Cloudflare Deployment

This guide covers deploying Corpus to Cloudflare Workers using D1 for metadata and R2 for data storage.

Prerequisites

  • Cloudflare account with Workers enabled
  • Wrangler CLI, installed with npm install -g wrangler

Setup

1. Create D1 database and R2 bucket

wrangler d1 create corpus-db
wrangler r2 bucket create corpus-bucket

Note the database_id that wrangler d1 create prints; you need it for wrangler.toml in step 3.

2. Set up database migrations

Install Drizzle Kit as a dev dependency:

bun add -D drizzle-kit

Create drizzle.config.ts in your project root:

import { defineConfig } from 'drizzle-kit'

export default defineConfig({
  dialect: 'sqlite',
  driver: 'd1-http',
  schema: [
    './node_modules/@f0rbit/corpus/schema.js',
    './node_modules/@f0rbit/corpus/observations/schema.js',
  ],
  out: './migrations',
})

Generate and apply migrations:

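# Generate migration files from the Corpus schemas
bunx drizzle-kit generate
# Apply migrations to your D1 database. This needs the [[d1_databases]] entry
# from step 3 in wrangler.toml. Recent Wrangler versions target the local
# database by default; add --remote to migrate the deployed one.
wrangler d1 migrations apply corpus-db
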
3. Configure wrangler.toml

name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[[d1_databases]]
binding = "CORPUS_DB"
database_name = "corpus-db"
database_id = "<your-database-id>"

[[r2_buckets]]
binding = "CORPUS_BUCKET"
bucket_name = "corpus-bucket"

4. Create your Worker

import { z } from 'zod'
import {
  create_corpus,
  create_cloudflare_backend,
  define_store,
  json_codec,
} from '@f0rbit/corpus/cloudflare'

const CacheSchema = z.object({
  key: z.string(),
  value: z.unknown(),
  ttl: z.number().optional(),
})

interface Env {
  CORPUS_DB: D1Database
  CORPUS_BUCKET: R2Bucket
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const backend = create_cloudflare_backend({
      d1: env.CORPUS_DB,
      r2: env.CORPUS_BUCKET,
    })

    const corpus = create_corpus()
      .with_backend(backend)
      .with_store(define_store('cache', json_codec(CacheSchema)))
      .build()

    // Example: store a cache entry
    const result = await corpus.stores.cache.put({
      key: 'greeting',
      value: 'Hello from Cloudflare!',
    })

    if (!result.ok) {
      return new Response(JSON.stringify(result.error), { status: 500 })
    }

    return new Response(JSON.stringify({
      version: result.value.version,
      hash: result.value.content_hash,
    }))
  },
}

5. Deploy

wrangler deploy

SST Integration

If you’re using SST for infrastructure as code, Corpus provides helper functions:

sst.config.ts
import { createCorpusInfra } from '@f0rbit/corpus'
const corpus = createCorpusInfra('myapp')
const db = new sst.cloudflare.D1(corpus.database.name)
const bucket = new sst.cloudflare.Bucket(corpus.bucket.name)
// Creates resources: 'myappDb' and 'myappBucket'

See the SST Integration Guide for the full API reference, configuration options, and Drizzle migration examples.

Using the Cloudflare Entry Point

The @f0rbit/corpus/cloudflare entry point excludes the file backend, which requires Node.js APIs, so Worker bundles stay smaller. Import from it in Worker code, as the example Worker in step 4 does.

Performance Tips

Batch Operations

When storing multiple items, consider using a layered backend with memory caching:

const cache = create_memory_backend()
// Binding names match the wrangler.toml shown in the setup steps
const cf = create_cloudflare_backend({ d1: env.CORPUS_DB, r2: env.CORPUS_BUCKET })
const backend = create_layered_backend({
  read: [cache, cf],  // Check cache first
  write: [cache, cf], // Write to both
})
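
To make the layering idea concrete, here is a minimal, dependency-free sketch. It does not use the Corpus backend interface: the `KV` interface, `memory`, `layered`, and the `reads` counter are hypothetical names invented for this illustration, and the calls are synchronous only to keep it short. Reads try each layer in order and stop at the first hit; writes go to every write layer.

```typescript
// Hypothetical minimal key-value layer; NOT the Corpus backend interface.
interface KV {
  get(key: string): string | undefined
  put(key: string, value: string): void
}

// In-memory layer that counts reads so the effect of layering is visible.
function memory(): KV & { reads: number } {
  const data = new Map<string, string>()
  const layer = {
    reads: 0,
    get(key: string): string | undefined {
      layer.reads++
      return data.get(key)
    },
    put(key: string, value: string): void {
      data.set(key, value)
    },
  }
  return layer
}

// Reads return the first hit in order; writes fan out to every write layer.
function layered(opts: { read: KV[]; write: KV[] }): KV {
  return {
    get(key) {
      for (const layer of opts.read) {
        const hit = layer.get(key)
        if (hit !== undefined) return hit
      }
      return undefined
    },
    put(key, value) {
      for (const layer of opts.write) layer.put(key, value)
    },
  }
}

const fast = memory()
const slow = memory() // stands in for the D1/R2-backed layer
const store = layered({ read: [fast, slow], write: [fast, slow] })

store.put('greeting', 'hello')
const greeting = store.get('greeting') // answered by `fast`; `slow` is never read
```

Because the write went to both layers, the later read is served from memory and the slower layer is not consulted at all, which is the saving the layered setup is after.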

Content Deduplication

Corpus automatically deduplicates content by hash. If you store the same data twice:

  • Two metadata entries are created (different versions)
  • Only one copy of the data is stored in R2
  • The data_key in metadata points to the shared blob
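
The three points above can be sketched with two in-memory maps. This is an illustration of content-addressed storage in general, not Corpus's implementation: `content_hash`, `put`, the numeric `version`, and the FNV-1a hash are all assumptions made for the example (a real store would use a cryptographic hash such as SHA-256).

```typescript
// Stand-in content hash (32-bit FNV-1a) so the sketch has no dependencies.
function content_hash(data: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < data.length; i++) {
    h ^= data.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16).padStart(8, '0')
}

// Hypothetical metadata row: one per put, pointing at a blob by its hash.
type Meta = { version: number; data_key: string }

const blobs = new Map<string, string>() // plays the role of R2
const metadata: Meta[] = []             // plays the role of D1

function put(data: string): Meta {
  const data_key = content_hash(data)
  // Only store the blob when no identical content has been stored before.
  if (!blobs.has(data_key)) blobs.set(data_key, data)
  const meta: Meta = { version: metadata.length + 1, data_key }
  metadata.push(meta)
  return meta
}

const first = put('{"key":"greeting"}')
const second = put('{"key":"greeting"}')
```

After the two calls, `metadata` holds two rows with different versions, `blobs` holds a single entry, and both rows carry the same `data_key`.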

Minimize Cold Starts

Create the corpus once per isolate and reuse it across requests:

// Build the corpus in a helper so its full type can be inferred.
function build_corpus(env: Env) {
  const backend = create_cloudflare_backend({
    d1: env.CORPUS_DB,
    r2: env.CORPUS_BUCKET,
  })
  return create_corpus()
    .with_backend(backend)
    .with_store(define_store('data', json_codec(DataSchema)))
    .build()
}

// Module scope survives across requests handled by the same isolate.
let corpus: ReturnType<typeof build_corpus> | null = null

function getCorpus(env: Env) {
  if (corpus === null) corpus = build_corpus(env)
  return corpus
}
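
If several values need this treatment, the reuse pattern can be factored into a small generic helper. The sketch below is one possible shape, not part of Corpus: `lazy`, `get_client`, and the `builds` counter are hypothetical names, and the object built here merely stands in for the corpus.

```typescript
// Hypothetical helper: build a value from the first `env` seen, then reuse it.
function lazy<E, T>(build: (env: E) => T): (env: E) => T {
  // Wrapper object so that a built value of null or undefined still counts as cached.
  let cached: { value: T } | null = null
  return (env) => {
    if (cached === null) cached = { value: build(env) }
    return cached.value
  }
}

// Stand-in for an expensive construction such as building the corpus.
let builds = 0
const get_client = lazy((env: { name: string }) => {
  builds++
  return { label: `client for ${env.name}` }
})

const a = get_client({ name: 'first request' })
const b = get_client({ name: 'second request' }) // reuses the value built above
```

The second call ignores its `env` argument, which is acceptable only when every request in the isolate receives equivalent bindings, as is the case for a single Worker's D1 and R2 bindings.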

Error Handling

Always check the Result type from operations:

const result = await corpus.stores.cache.get(version)

if (!result.ok) {
  switch (result.error.kind) {
    case 'not_found':
      return new Response('Not found', { status: 404 })
    case 'storage_error':
      console.error('D1/R2 error:', result.error.cause)
      return new Response('Storage error', { status: 500 })
    default:
      return new Response('Error', { status: 500 })
  }
}

return new Response(JSON.stringify(result.value.data))
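
When several handlers repeat this check, the mapping from error kind to HTTP status can live in one function. The sketch below uses hypothetical `Result` and `StoreError` types written for this example; only the `not_found` and `storage_error` kinds are taken from this guide, and the real Corpus types may carry more kinds and fields. `status_for` and `to_reply` are likewise invented names.

```typescript
// Hypothetical shapes for illustration; not the Corpus type definitions.
type StoreError =
  | { kind: 'not_found' }
  | { kind: 'storage_error'; cause: unknown }

type Result<T> = { ok: true; value: T } | { ok: false; error: StoreError }

// Map an error kind to an HTTP status in one place.
function status_for(error: StoreError): number {
  switch (error.kind) {
    case 'not_found':
      return 404
    case 'storage_error':
      return 500
  }
}

// Turn any Result into a status/body pair that a handler can return.
function to_reply<T>(result: Result<T>): { status: number; body: string } {
  if (result.ok === false) {
    return { status: status_for(result.error), body: result.error.kind }
  }
  return { status: 200, body: JSON.stringify(result.value) }
}

const missing = to_reply<string>({ ok: false, error: { kind: 'not_found' } })
const found = to_reply({ ok: true, value: { key: 'greeting' } })
```

A handler could then build its response with `new Response(reply.body, { status: reply.status })`, keeping the status mapping out of each route.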

See Also