Cloudflare Deployment
This guide covers deploying Corpus to Cloudflare Workers using D1 for metadata and R2 for data storage.
Prerequisites
- Cloudflare account with Workers enabled
- Wrangler CLI, installed with:

```sh
npm install -g wrangler
```

Setup
1. Create the D1 database and R2 bucket

```sh
wrangler d1 create corpus-db
wrangler r2 bucket create corpus-bucket
```

The d1 create command prints the database_id you will need in wrangler.toml.

2. Set up database migrations
Install Drizzle Kit as a dev dependency:

```sh
bun add -D drizzle-kit
```

Create drizzle.config.ts in your project root:

```ts
import { defineConfig } from 'drizzle-kit'

export default defineConfig({
  dialect: 'sqlite',
  driver: 'd1-http',
  // Schemas shipped with corpus
  schema: [
    './node_modules/@f0rbit/corpus/schema.js',
    './node_modules/@f0rbit/corpus/observations/schema.js',
  ],
  // wrangler d1 migrations apply reads from ./migrations by default
  out: './migrations',
})
```

Generate and apply migrations:
```sh
# Generate migration files from the corpus schemas
bunx drizzle-kit generate

# Apply the migrations to your D1 database
# (add --remote to target the deployed database rather than the local one)
wrangler d1 migrations apply corpus-db
```

3. Configure wrangler.toml
```toml
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[[d1_databases]]
binding = "CORPUS_DB"
database_name = "corpus-db"
database_id = "<your-database-id>"

[[r2_buckets]]
binding = "CORPUS_BUCKET"
bucket_name = "corpus-bucket"
```

4. Create your Worker
```ts
import { z } from 'zod'
import {
  create_corpus,
  create_cloudflare_backend,
  define_store,
  json_codec,
} from '@f0rbit/corpus/cloudflare'

const CacheSchema = z.object({
  key: z.string(),
  value: z.unknown(),
  ttl: z.number().optional(),
})

// Property names must match the bindings in wrangler.toml.
// D1Database and R2Bucket are Workers runtime types (e.g. from @cloudflare/workers-types).
interface Env {
  CORPUS_DB: D1Database
  CORPUS_BUCKET: R2Bucket
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const backend = create_cloudflare_backend({
      d1: env.CORPUS_DB,
      r2: env.CORPUS_BUCKET,
    })

    const corpus = create_corpus()
      .with_backend(backend)
      .with_store(define_store('cache', json_codec(CacheSchema)))
      .build()

    // Example: store a cache entry
    const result = await corpus.stores.cache.put({
      key: 'greeting',
      value: 'Hello from Cloudflare!',
    })

    if (!result.ok) {
      return new Response(JSON.stringify(result.error), { status: 500 })
    }

    return new Response(
      JSON.stringify({
        version: result.value.version,
        hash: result.value.content_hash,
      }),
      { headers: { 'Content-Type': 'application/json' } },
    )
  },
}
```

5. Deploy

```sh
wrangler deploy
```

SST Integration
If you’re using SST for infrastructure as code, Corpus provides helper functions:
```ts
import { createCorpusInfra } from '@f0rbit/corpus'

const corpus = createCorpusInfra('myapp')

const db = new sst.cloudflare.D1(corpus.database.name)
const bucket = new sst.cloudflare.Bucket(corpus.bucket.name)
// Creates resources: 'myappDb' and 'myappBucket'
```

See the SST Integration Guide for the full API reference, configuration options, and Drizzle migration examples.
Using the Cloudflare Entry Point
The @f0rbit/corpus/cloudflare entry point excludes the file backend (which requires Node.js APIs) for smaller bundle sizes in Workers:
```ts
// Option 1: smaller bundle, Workers-compatible
import { create_cloudflare_backend } from '@f0rbit/corpus/cloudflare'

// Option 2: full package with all backends
import { create_cloudflare_backend } from '@f0rbit/corpus'
```

Use one import or the other in a given module, not both.

Performance Tips
Batch Operations
When storing multiple items, consider using a layered backend with memory caching:
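```ts
// In-memory layer; its contents last only as long as the Worker isolate
const cache = create_memory_backend()
const cf = create_cloudflare_backend({ d1: env.CORPUS_DB, r2: env.CORPUS_BUCKET })

const backend = create_layered_backend({
  read: [cache, cf], // Check the memory cache first, then fall back to D1/R2
  write: [cache, cf], // Write to both layers
})
```

Content Deduplication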
Corpus automatically deduplicates content by hash. If you store the same data twice:
- Two metadata entries are created (different versions)
- Only one copy of the data is stored in R2
- The data_key in the metadata points to the shared blob
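To make the deduplication behavior concrete, here is a self-contained sketch of content-addressed storage. It assumes SHA-256 as the content hash and uses in-memory collections in place of D1 (metadata) and R2 (blobs). DedupStore, MetaEntry, and sha256_hex are hypothetical names for illustration, not part of the Corpus API, and Corpus itself may hash differently.

```typescript
// Hypothetical sketch: names are illustrative, not the Corpus API.
// Assumes SHA-256 as the content hash; Corpus may use a different scheme.

type MetaEntry = { version: number; data_key: string }

// Hex-encoded SHA-256 of a UTF-8 string via the Web Crypto API
// (available in Workers, Bun, and recent Node.js versions).
async function sha256_hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text))
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

class DedupStore {
  readonly blobs = new Map<string, string>() // stands in for R2: data_key -> content
  readonly meta: MetaEntry[] = [] // stands in for D1: one row per put

  async put(content: string): Promise<MetaEntry> {
    const data_key = await sha256_hex(content)
    // Write the blob only if this hash has not been stored yet
    if (!this.blobs.has(data_key)) this.blobs.set(data_key, content)
    // Every put still records a new metadata entry (a new version)
    const entry: MetaEntry = { version: this.meta.length + 1, data_key }
    this.meta.push(entry)
    return entry
  }

  // Resolve a version to its content through the shared data_key
  get(version: number): string | undefined {
    const entry = this.meta.find((m) => m.version === version)
    return entry ? this.blobs.get(entry.data_key) : undefined
  }
}
```

Putting 'hello' twice yields two metadata entries with different versions but the same data_key, and only one entry in blobs. Because the blob key is derived from the content itself, detecting a duplicate is a single key lookup.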
Minimize Cold Starts
Build the corpus once per Worker isolate and reuse it across requests, instead of rebuilding it on every fetch:
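```ts
// DataSchema is your own zod schema for the stored items
function build_corpus(env: Env) {
  const backend = create_cloudflare_backend({
    d1: env.CORPUS_DB,
    r2: env.CORPUS_BUCKET,
  })

  return create_corpus()
    .with_backend(backend)
    .with_store(define_store('data', json_codec(DataSchema)))
    .build()
}

// Module-scope cache: survives across requests served by the same isolate.
// Typing it from build_corpus gives the built corpus type, not the builder's build method.
let corpus: ReturnType<typeof build_corpus> | null = null

function getCorpus(env: Env) {
  if (!corpus) corpus = build_corpus(env)
  return corpus
}
```

Error Handling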
Always check the Result type from operations:
```ts
// version identifies the entry to read, e.g. the version returned by put()
const result = await corpus.stores.cache.get(version)

if (!result.ok) {
  switch (result.error.kind) {
    case 'not_found':
      return new Response('Not found', { status: 404 })
    case 'storage_error':
      console.error('D1/R2 error:', result.error.cause)
      return new Response('Storage error', { status: 500 })
    default:
      return new Response('Error', { status: 500 })
  }
}

return new Response(JSON.stringify(result.value.data), {
  headers: { 'Content-Type': 'application/json' },
})
```
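The checks above rely on the result being a discriminated union: testing ok narrows to either value or error, and switching on error.kind narrows further. The following is a minimal sketch of that pattern. CorpusError, Result, the 'other' catch-all kind, and the to_response helper are hypothetical, modeled only on the fields used above (ok, value, error.kind, error.cause); the library's real types may carry more variants or fields.

```typescript
// Hypothetical error shapes modeled on the fields used above; not the library's real types.
type CorpusError =
  | { kind: 'not_found' }
  | { kind: 'storage_error'; cause: unknown }
  | { kind: 'other'; message: string } // hypothetical catch-all for remaining kinds

// Discriminated on `ok`: checking it narrows to either `value` or `error`
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E }

// Exhaustive switch on `kind`; adding a variant without a case is a compile error
function status_for(error: CorpusError): number {
  switch (error.kind) {
    case 'not_found':
      return 404
    case 'storage_error':
      return 500
    case 'other':
      return 500
  }
}

// Map a result to an HTTP response in one place instead of at every call site
function to_response<T>(result: Result<T, CorpusError>): Response {
  if (!result.ok) {
    return new Response(result.error.kind, { status: status_for(result.error) })
  }
  return new Response(JSON.stringify(result.value), {
    headers: { 'Content-Type': 'application/json' },
  })
}
```

Centralizing the mapping in one helper keeps status codes consistent across routes, and the exhaustive switch makes a newly added error kind show up as a type error rather than silently falling into a default branch.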