Rate Limiting Guide
Protect your WebSocket server from abuse and control resource usage with atomic, distributed rate limiting. WS-Kit's adapter-first approach lets you seamlessly switch between single-instance and multi-pod deployments without changing your application code.
Concepts
Token Bucket Algorithm
Rate limiting uses the token bucket algorithm:
- Bucket: Each rate limit key (user, message type, etc.) has a bucket
- Tokens: The bucket contains tokens (initial = capacity)
- Consumption: Each message consumes 1 or more tokens
- Refill: Tokens are added over time at a constant rate (tokensPerSecond)
- Limit: If there are insufficient tokens, the request is blocked and a backoff time is computed
Example: Capacity=10, tokensPerSecond=1
- Initial: 10 tokens available
- After 1 second: 1 new token added (max 10)
- Each request consumes 1 token
- When empty, clients must wait ~1 second per token
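For intuition, here is a worked sketch of that refill math in TypeScript. It is illustrative only (not the library's internal implementation); the Bucket shape and consume signature are assumptions.
// Worked sketch of the refill math for capacity=10, tokensPerSecond=1 (illustrative only)
interface Bucket {
  tokens: number; // tokens currently available
  updatedAt: number; // last refill time in ms
}
function consume(bucket: Bucket, cost: number, now: number, capacity = 10, tokensPerSecond = 1) {
  // Refill based on elapsed time, never exceeding capacity
  const elapsedSec = (now - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSec * tokensPerSecond);
  bucket.updatedAt = now;
  if (bucket.tokens >= cost) {
    bucket.tokens -= cost;
    return { allowed: true, retryAfterMs: 0 };
  }
  // Blocked: report how long until enough tokens will have refilled
  const deficit = cost - bucket.tokens;
  return { allowed: false, retryAfterMs: Math.ceil((deficit / tokensPerSecond) * 1000) };
}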
Atomicity Guarantees
Rate limiting is atomic per adapter, ensuring no double-spending:
- Memory: Per-key FIFO mutex (single-instance safe)
- Redis: Lua script (multi-pod safe)
- Durable Objects: Single-threaded per shard (Cloudflare safe)
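A per-key FIFO mutex can be as small as a promise chain per key. The sketch below shows the idea (it is not the memory adapter's source; the usage helper is hypothetical):
// Each key gets its own promise chain, so consume() calls for the same key run one at a time, in order
class KeyedMutex {
  private tails = new Map<string, Promise<unknown>>();
  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Queue behind the previous caller; ignore its outcome so the chain keeps moving
    const result = previous.catch(() => {}).then(task);
    this.tails.set(key, result.catch(() => {}));
    return result;
  }
}
// Usage: await mutex.run("rl:public:user_123:SEND_MESSAGE", () => readModifyWriteBucket()); // hypothetical helper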
Server-Authoritative Time
Rate limiting always uses server time, never client time:
- Memory adapter: Date.now()
- Redis adapter: redis.call('TIME')
- Durable Objects: Server clock
This prevents clients from bypassing limits via clock manipulation.
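A likely shape for this, based on the clock option the memory adapter accepts (the interface name is an assumption): time comes from a server-side source that tests can override.
// Time always comes from a server-side clock that tests can inject
interface Clock {
  now(): number; // current server time in ms
}
const systemClock: Clock = { now: () => Date.now() };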
Quick Start
Single-Instance (Development)
import { z, createRouter } from "@ws-kit/zod";
import { serve } from "@ws-kit/bun";
import { rateLimit, keyPerUserPerType } from "@ws-kit/middleware";
import { memoryRateLimiter } from "@ws-kit/adapters/memory";
const router = createRouter();
// Apply rate limiting to all messages
const limiter = rateLimit({
limiter: memoryRateLimiter({
capacity: 200, // Max 200 tokens per bucket
tokensPerSecond: 100, // Add 100 tokens every second
}),
key: keyPerUserPerType, // Per-user per-message-type buckets
});
router.use(limiter);
serve(router, { port: 3000 });
Multi-Pod (Production with Redis)
import { createClient } from "redis";
import { rateLimit, keyPerUserPerType } from "@ws-kit/middleware";
import { redisRateLimiter } from "@ws-kit/adapters/redis";
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
const limiter = rateLimit({
limiter: redisRateLimiter(redisClient, {
capacity: 200,
tokensPerSecond: 100,
}),
key: keyPerUserPerType,
});
router.use(limiter);
When rate limited, clients receive a RESOURCE_EXHAUSTED error with a retryAfterMs backoff hint.
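On the client, honor that hint before retrying. The sketch below assumes an error envelope with type, code, and retryAfterMs fields; adjust it to your actual wire format.
// Client-side backoff sketch (error envelope shape is an assumption)
const ws = new WebSocket("wss://example.com/ws");
let lastSent: string | null = null;
function send(message: object) {
  lastSent = JSON.stringify(message);
  ws.send(lastSent);
}
ws.onmessage = (event) => {
  const msg = JSON.parse(String(event.data));
  if (msg.type === "ERROR" && msg.code === "RESOURCE_EXHAUSTED") {
    // Wait for the server's hinted backoff, then retry the last message once
    const delay = typeof msg.retryAfterMs === "number" ? msg.retryAfterMs : 1000;
    if (lastSent) setTimeout(() => ws.send(lastSent!), delay);
  }
};
// e.g. send({ type: "SEND_MESSAGE", payload: { text: "hi" } });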
Adapters
Memory Adapter
Use when: Single-instance deployment (dev, single Bun server, Node.js)
import { memoryRateLimiter } from "@ws-kit/adapters/memory";
const limiter = memoryRateLimiter({
capacity: 10,
tokensPerSecond: 1,
});
Features:
- Zero external dependencies
- Per-key FIFO mutex ensures atomicity
- Optional clock injection for testing
Limitations:
- In-memory only (not shared across processes)
- No automatic cleanup (suited to single-instance apps; use an external sweeper if idle keys accumulate)
Testing with injected clock:
const fakeTime = { current: Date.now() };
const limiter = memoryRateLimiter(
{ capacity: 10, tokensPerSecond: 1 },
{ clock: { now: () => fakeTime.current } },
);
// Advance time for deterministic testing
fakeTime.current += 5000; // Move forward 5 seconds
const result = await limiter.consume("user:1", 1);
// Bucket refilled 5 tokens, request succeeds
Redis Adapter
Use when: Multi-pod deployment or shared state across servers
import { createClient } from "redis";
import { redisRateLimiter } from "@ws-kit/adapters/redis";
const client = createClient({ url: process.env.REDIS_URL });
await client.connect();
const limiter = redisRateLimiter(client, {
capacity: 200,
tokensPerSecond: 100,
});
Features:
- Distributed: Works across multiple servers/pods
- Lua script ensures atomicity (single Redis operation)
- Automatic TTL: Keys expire after 2x refill window
- Connection pooling: Reuse single Redis connection
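For intuition, an atomic consume can be expressed as a single Lua script. The sketch below illustrates the shape only: field names, argument order, and the real script differ from the adapter's, and it assumes Redis 5+ so the script may call TIME before writing.
// Illustrative Lua token-bucket consume (not the adapter's actual script)
const TOKEN_BUCKET_LUA = `
  local capacity = tonumber(ARGV[1])
  local rate     = tonumber(ARGV[2])
  local cost     = tonumber(ARGV[3])
  local ttl_ms   = tonumber(ARGV[4])
  -- Server-authoritative time: TIME returns {seconds, microseconds}
  local t = redis.call('TIME')
  local now_ms = t[1] * 1000 + math.floor(t[2] / 1000)
  local state = redis.call('HMGET', KEYS[1], 'tokens', 'updated')
  local tokens  = tonumber(state[1]) or capacity
  local updated = tonumber(state[2]) or now_ms
  -- Refill, capped at capacity
  tokens = math.min(capacity, tokens + (now_ms - updated) / 1000 * rate)
  local allowed, retry_ms = 0, 0
  if tokens >= cost then
    tokens = tokens - cost
    allowed = 1
  else
    retry_ms = math.ceil((cost - tokens) / rate * 1000)
  end
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'updated', now_ms)
  redis.call('PEXPIRE', KEYS[1], ttl_ms)
  return { allowed, retry_ms }
`;
Because the read, refill, debit, and write happen inside one script, concurrent pods cannot interleave and double-spend tokens.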
Multiple policies with shared connection:
const cheap = redisRateLimiter(client, {
capacity: 200,
tokensPerSecond: 100,
prefix: "cheap:", // Separate key namespace
});
const expensive = redisRateLimiter(client, {
capacity: 10,
tokensPerSecond: 2,
prefix: "expensive:",
});
// Both share the same Redis connection, no overhead
Custom TTL:
const limiter = redisRateLimiter(
client,
{ capacity: 10, tokensPerSecond: 1 },
{ ttlMs: 120_000 }, // 2 minutes (default auto-calculated)
);
Cloudflare Durable Objects Adapter
Use when: Cloudflare Workers with persistent coordination needs
import { durableObjectRateLimiter } from "@ws-kit/adapters/cloudflare-do";
const limiter = durableObjectRateLimiter(env.RATE_LIMITER, {
capacity: 200,
tokensPerSecond: 100,
});
Features:
- Single-threaded per shard (atomic by design)
- Sharded across 128 DOs by default (configurable)
- Mark-and-sweep cleanup (hourly, 24h TTL)
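Conceptually, each key is hashed to a shard name and routed to that Durable Object, so all updates for a given key are serialized in one place. A sketch of the routing idea (the hash choice, shard-name format, and helper are assumptions; types come from @cloudflare/workers-types):
// Route a rate-limit key to one of N Durable Object shards (illustrative)
function shardStubFor(ns: DurableObjectNamespace, key: string, shards = 128): DurableObjectStub {
  // FNV-1a 32-bit hash for a stable shard index
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h = Math.imul(h ^ key.charCodeAt(i), 0x01000193) >>> 0;
  }
  const id = ns.idFromName(`rate-limiter:${h % shards}`);
  return ns.get(id); // the DO's single-threaded execution serializes all consumes for this shard
}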
Custom shard count:
const limiter = durableObjectRateLimiter(
env.RATE_LIMITER,
{ capacity: 200, tokensPerSecond: 100 },
{ shards: 256 }, // Use 256 shards instead of default 128
);
Key Functions
Rate limit keys determine the isolation boundary. Choose based on your fairness model.
keyPerUserPerType (Recommended)
One bucket per (tenant, user, message type). Prevents one operation from starving others.
import { keyPerUserPerType } from "@ws-kit/middleware";
const limiter = rateLimit({
limiter,
key: keyPerUserPerType,
});
// Key format: "rl:{tenantId}:{userId}:{type}"
// Examples:
// "rl:public:user_123:SEND_MESSAGE"
// "rl:acme:user_456:COMPUTE"When to use:
- Most applications
- Fair isolation across message types
- Typical cardinality: 5-30 message types × 10k users ≈ 50k-300k buckets (acceptable)
perUserKey (Lighter Footprint)
One bucket per (tenant, user). Use cost() to weight operations.
import { perUserKey } from "@ws-kit/middleware";
const limiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 200, tokensPerSecond: 100 }),
key: perUserKey,
cost: (ctx) => (ctx.type === "ExpensiveOp" ? 10 : 1),
});
// Key format: "rl:{tenantId}:{userId}"
// All message types share the same budget
When to use:
- High-type-count apps (100+ distinct message types)
- Memory-constrained deployments
- Acceptable to weight operations within shared budget
keyPerUserOrIpPerType (Future IP Support)
Per-user for authenticated, IP fallback for anonymous. Currently falls back to "anon" (IP not available at middleware layer).
import { keyPerUserOrIpPerType } from "@ws-kit/middleware";
const limiter = rateLimit({
limiter,
key: keyPerUserOrIpPerType,
});
// Key format: "rl:{tenantId}:{userId|ip|anon}:{type}"
// Note: IP is not available at the middleware layer; uses "anon" for all unauthenticated connections
When to use:
- Apps with primarily authenticated users
- Designed for future router-level integration (which has access to IP)
Custom Key Functions
Define custom logic for other isolation strategies:
const limiter = rateLimit({
limiter,
key: (ctx) => {
const userId = ctx.ws.data?.userId;
const tier = ctx.ws.data?.tier ?? "free";
return `rl:${tier}:${userId}:${ctx.type}`;
},
});
// Separate buckets per tier, allowing per-tier rate limits
Safe context fields for key functions:
- ctx.type: Message type
- ctx.id: Connection ID
- ctx.ip: Client IP (empty at middleware layer; use router integration)
- ctx.ws.data: Connection data from authenticate()
- ctx.meta.receivedAt: Server timestamp
Unsafe fields (not available before schema validation):
- ctx.payload: Not schema-validated yet
Cost Functions
Control token cost per message. Must return a positive integer.
Default Cost (1 Token per Message)
const limiter = rateLimit({
limiter,
key: keyPerUserPerType,
// cost defaults to 1
});
Weighted by Operation Type
const limiter = rateLimit({
limiter,
key: keyPerUserPerType,
cost: (ctx) => {
if (ctx.type === "Compute") return 10; // Expensive
if (ctx.type === "Query") return 1; // Cheap
return 2; // Default
},
});
Weighted by User Tier
Use separate limiters for separate tiers instead of variable cost:
// Free tier: stricter limit
const freeLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 10, tokensPerSecond: 1 }),
key: keyPerUserPerType,
});
// Premium tier: generous limit
const premiumLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 1000, tokensPerSecond: 500 }),
key: keyPerUserPerType,
});
// Route-specific middleware selection
router.use((ctx, next) => {
if (ctx.ws.data?.isPremium) {
return premiumLimiter(ctx, next);
}
return freeLimiter(ctx, next);
});
Validation (Runtime Checks)
Cost must be a positive integer. Non-integers or non-positive values are rejected:
// ✅ Valid: positive integers
cost: (ctx) => 1;
cost: (ctx) => (ctx.type === "Expensive" ? 5 : 1);
// ❌ Invalid: rejected with INVALID_ARGUMENT error
cost: (ctx) => 0.5; // Non-integer
cost: (ctx) => 0; // Zero
cost: (ctx) => -1; // Negative
cost: (ctx) => Math.random(); // Unpredictable
Multiple Policies
Run independent limiters with different policies on the same connection.
Different Budgets for Different Operations
import { rateLimit, keyPerUserPerType } from "@ws-kit/middleware";
import { memoryRateLimiter } from "@ws-kit/adapters/memory";
// Cheap operations: generous limit
const cheapLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 200, tokensPerSecond: 100 }),
key: keyPerUserPerType,
cost: (ctx) => 1,
});
// Expensive operations: strict limit
const expensiveLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 10, tokensPerSecond: 2 }),
key: keyPerUserPerType,
cost: (ctx) => 5,
});
router.use(cheapLimiter);
router.use(expensiveLimiter);
When rate limited by either policy, clients get RESOURCE_EXHAUSTED with the appropriate retryAfterMs.
Multi-Pod with Shared Connection
import { redisRateLimiter } from "@ws-kit/adapters/redis";
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
// Multiple limiters from same client (zero overhead)
const fast = redisRateLimiter(redisClient, {
capacity: 500,
tokensPerSecond: 250,
prefix: "fast:",
});
const slow = redisRateLimiter(redisClient, {
capacity: 10,
tokensPerSecond: 1,
prefix: "slow:",
});
router.use(rateLimit({ limiter: fast, key: keyPerUserPerType }));
router.use(rateLimit({ limiter: slow, key: keyPerUserPerType }));
Testing
Deterministic Testing (Memory Adapter)
Use clock injection for time-travel testing without real delays:
import { test } from "bun:test";
import { memoryRateLimiter } from "@ws-kit/adapters/memory";
test("rate limit refill", async () => {
const fakeTime = { current: Date.now() };
const limiter = memoryRateLimiter(
{ capacity: 10, tokensPerSecond: 1 },
{ clock: { now: () => fakeTime.current } },
);
// Exhaust tokens
for (let i = 0; i < 10; i++) {
const result = await limiter.consume("user:1", 1);
expect(result.allowed).toBe(true);
}
// 11th request fails
let result = await limiter.consume("user:1", 1);
expect(result.allowed).toBe(false);
expect(result.retryAfterMs).toBeGreaterThan(0);
// Advance time by 5 seconds
fakeTime.current += 5000;
// 5 new tokens added, request succeeds
result = await limiter.consume("user:1", 1);
expect(result.allowed).toBe(true);
expect(result.remaining).toBe(4);
});
Integration Testing
Use real middleware with test utilities:
import { z, message, createRouter } from "@ws-kit/zod";
import { rateLimit, keyPerUserPerType } from "@ws-kit/middleware";
import { memoryRateLimiter } from "@ws-kit/adapters/memory";
test("rate limit integration", async () => {
const SendMsg = message("SEND", { text: z.string() });
const router = createRouter<{ userId: string }>();
const limiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 2, tokensPerSecond: 1 }),
key: keyPerUserPerType,
});
router.use(limiter);
router.on(SendMsg, (ctx) => {
ctx.send(SendMsg, { text: "ok" });
});
// Create mock connection
const mockContext = {
type: "SEND",
ws: { data: { userId: "user:1" } },
// ... other context fields
};
// First 2 messages succeed
await limiter(mockContext, () => Promise.resolve());
await limiter(mockContext, () => Promise.resolve());
// 3rd message fails (rate limited)
await expect(limiter(mockContext, () => Promise.resolve())).rejects.toThrow(
"Rate limit exceeded",
);
});
Common Patterns
Per-Route Rate Limiting
Apply different limits to specific message types:
const globalLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 100, tokensPerSecond: 50 }),
key: keyPerUserPerType,
});
const strictLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 5, tokensPerSecond: 1 }),
key: keyPerUserPerType,
});
router.use(globalLimiter);
// Apply stricter limit to expensive operation
router.on(ExpensiveOp, (ctx, next) => {
// This runs after global limit, applying additional restriction
// Can use middleware composition for selective application
return next();
});
Tiered Rate Limits
Different limits per subscription tier:
const freeLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 20, tokensPerSecond: 5 }),
key: keyPerUserPerType,
});
const premiumLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 200, tokensPerSecond: 100 }),
key: keyPerUserPerType,
});
const proLimiter = rateLimit({
limiter: memoryRateLimiter({ capacity: 2000, tokensPerSecond: 1000 }),
key: keyPerUserPerType,
});
router.use((ctx, next) => {
const tier = ctx.ws.data?.tier ?? "free";
if (tier === "pro") {
return proLimiter(ctx, next);
} else if (tier === "premium") {
return premiumLimiter(ctx, next);
}
return freeLimiter(ctx, next);
});
Metrics and Observability
serve(router, {
port: 3000,
onLimitExceeded(info) {
if (info.type === "rate") {
metrics.increment("rate_limit.exceeded", {
client_id: info.clientId,
cost: info.observed,
capacity: info.limit,
retryable: info.retryAfterMs !== null,
});
logger.warn("Rate limit exceeded", {
clientId: info.clientId,
observed: info.observed,
limit: info.limit,
retryAfterMs: info.retryAfterMs,
});
}
},
});
Troubleshooting
"Rate limit exceeded" errors in tests
Problem: Tests are failing with rate limit errors unexpectedly.
Solution: Use clock injection with memory adapter for deterministic testing:
const fakeTime = { current: Date.now() };
const limiter = memoryRateLimiter(policy, {
clock: { now: () => fakeTime.current },
});
// Advance time as needed
fakeTime.current += 1000; // Add 1 second
IP-based rate limiting not working
Problem: All unauthenticated users share "anon" bucket regardless of IP.
Reason: IP is not available at the middleware layer, which runs post-validation. The design proposal places IP access at step 3 (pre-validation).
Solution Options:
- Use authentication (recommended) — Rate limit by user ID
- Use custom key function with other identifiers (connection ID, session)
- Wait for router-level rate limiting integration (future)
// Workaround: Use connection ID per-type
key: (ctx) => `rl:${ctx.id}:${ctx.type}`;
// Note: This is per-connection, not per-IP (won't prevent distributed attacks)
Redis key growth unbounded
Problem: Rate limit keys in Redis keep growing.
Solution: Redis adapter automatically sets TTL (2x refill window, minimum 60s). Idle keys expire automatically:
const limiter = redisRateLimiter(client, {
capacity: 10,
tokensPerSecond: 1,
// TTL auto-calculated: max(2*10/1*1000, 60000) = 60000ms = 60s
});
// Custom TTL if needed
const limiter = redisRateLimiter(
client,
{ capacity: 10, tokensPerSecond: 1 },
{ ttlMs: 120_000 }, // 2 minutes
);
Cost > capacity always fails
Problem: Operations with cost > capacity always return retryAfterMs: null (non-retryable).
Reason: This is by design. If cost exceeds capacity, the operation can never succeed, even with infinite time.
Solution: Increase capacity or decrease cost:
// Before (impossible):
// limiter: { capacity: 5, tokensPerSecond: 1 }
// cost: (ctx) => ctx.type === "Expensive" ? 10 : 1
// ❌ Expensive operations never succeed
// After (possible):
// limiter: { capacity: 20, tokensPerSecond: 10 }
// cost: (ctx) => ctx.type === "Expensive" ? 10 : 1
// ✅ Expensive operations can now succeed (sustained rate: roughly one per second)
Inconsistent limits across pods
Problem: Each pod has separate memory limits; users see different limits.
Solution: Use Redis adapter for distributed coordination:
// ❌ Wrong: Each pod has separate limits
const limiter = memoryRateLimiter({ capacity: 100, tokensPerSecond: 50 });
// ✅ Correct: Shared across all pods via Redis
const limiter = redisRateLimiter(redisClient, {
capacity: 100,
tokensPerSecond: 50,
});