# Redis Backend Specification
This document defines Redis-specific implementation requirements that extend the common interface specification.
## 🚫 CRITICAL: Never Delete Fence Counters

Fence counter keys (`{prefix}:fence:{key}`) MUST NEVER be deleted or assigned a TTL. Deleting fence counters breaks monotonicity guarantees and violates fencing safety. Cleanup operations MUST only target lock data keys (the main lock and reverse index), never fence counters.
## Document Structure

This specification uses a normative vs rationale pattern:

- Requirements sections contain MUST/SHOULD/MAY/NEVER statements defining the contract
- Rationale & Notes sections provide background, design decisions, and operational guidance
## Dual-Key Storage Pattern

### Requirements

- Main lock key: `{keyPrefix}:{key}` stores JSON lock data
- Index key: `{keyPrefix}:id:{lockId}` stores the full lock key for reverse lookup (ADR-013)
- Both keys MUST have identical TTL for consistency
- Use `redis.call('SET', key, data, 'PX', ttlMs)` for atomic set-with-TTL
- Storage Key Generation: MUST call `makeStorageKey()` from common utilities (see Storage Key Generation)
- Backend-specific limit: `EFFECTIVE_KEY_BUDGET_BYTES = 1000` (chosen for predictable memory/ops headroom; not a Redis hard limit)
- Reserve Bytes Requirement: Redis operations MUST reserve 26 bytes for derived keys when calling `makeStorageKey()`:
  - Formula: `":id:"` (4 bytes) + lockId (22 bytes) = 26 bytes
  - Purpose: Ensures derived keys like `${prefix}:id:${lockId}` fit within the effective budget
### Rationale & Notes
Why dual-key pattern: Enables fast reverse lookup (lockId → key) required for release/extend operations. Single-key approaches would require scanning or secondary indexes.
Why identical TTL: Prevents orphaned index keys. If index outlives lock, lookup returns stale data. If lock outlives index, release/extend fail incorrectly.
Why 26-byte reserve: Redis constructs multiple keys from base (main lock + reverse index). Reserve ensures all derived keys fit within the effective budget when base key is at maximum length.
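The reserve math above is simple to verify. A sketch, assuming the 22-byte lockId length stated in the formula:

```typescript
// Sketch of the reserve-byte arithmetic from the Requirements above.
// The 22-byte lockId length is taken from this spec's formula.
const ID_INFIX = ":id:"; // infix used to build the reverse-index key
const LOCK_ID_BYTES = 22;
const RESERVE_BYTES = ID_INFIX.length + LOCK_ID_BYTES; // 4 + 22 = 26

// With the backend budget of 1000 bytes, the base key may use at most:
const EFFECTIVE_KEY_BUDGET_BYTES = 1000;
const maxBaseKeyBytes = EFFECTIVE_KEY_BUDGET_BYTES - RESERVE_BYTES; // 974

console.log(RESERVE_BYTES, maxBaseKeyBytes); // 26 974
```

Any base key at or under 974 bytes therefore yields a derived index key that still fits within the 1000-byte effective budget.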
## Script Caching for Performance

### Requirements

- Primary approach: Use ioredis `defineCommand()` for automatic EVALSHA caching
- Fallback: Graceful fallback to `redis.eval()` for test mocks
- Scripts are defined once during backend initialization for optimal performance
- ioredis handles SCRIPT LOAD + EVALSHA + NOSCRIPT error recovery automatically
### Rationale & Notes
Why defineCommand: ioredis manages the entire caching lifecycle automatically. First call loads script via SCRIPT LOAD, subsequent calls use EVALSHA. If Redis restarts and loses scripts, ioredis automatically reloads via NOSCRIPT error handling.
Why fallback to eval: Test mocks often don't implement full Redis command set. Fallback enables unit testing without external dependencies.
Performance impact: EVALSHA reduces network overhead from ~1KB (full script) to ~40 bytes (SHA hash). At 10K ops/sec, saves ~9.6MB/sec network bandwidth.
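The defined-command-with-fallback flow can be sketched as below. The client type is a minimal stand-in for ioredis, and the command/script names are illustrative, not the library's actual surface:

```typescript
type ScriptArg = string | number;

// Minimal stand-in for an ioredis client: defined commands appear as methods,
// and eval() is always available as the mock-friendly fallback.
interface MinimalRedis {
  eval(lua: string, numKeys: number, ...args: ScriptArg[]): Promise<unknown>;
  [command: string]: unknown;
}

async function runScript(
  client: MinimalRedis,
  name: string, // command name that defineCommand() would have registered
  lua: string, // script body, only used on the fallback path
  numKeys: number,
  ...args: ScriptArg[]
): Promise<unknown> {
  const defined = client[name];
  if (typeof defined === "function") {
    // Real ioredis path: cached EVALSHA with automatic NOSCRIPT recovery
    return (defined as (...a: ScriptArg[]) => Promise<unknown>).apply(client, args);
  }
  // Test-mock path: plain EVAL keeps unit tests free of external dependencies
  return client.eval(lua, numKeys, ...args);
}
```

With a mock exposing only `eval`, `runScript` transparently takes the fallback branch, which is exactly what enables dependency-free unit tests.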
## Lua Scripts for Atomicity

### Requirements

- ALL mutating operations MUST use Lua scripts
- Scripts are centralized in `redis/scripts.ts` with descriptive comments
- Use `cjson.decode()` / `cjson.encode()` for JSON handling in Lua
- Script return codes MUST follow operation-specific semantics (see Script Return Code Semantics for the complete mapping)
- Scripts signal lock contention via return codes; the backend throws LockError for Redis errors
- Pass all required data as KEYS and ARGV to avoid closure issues
### Rationale & Notes
Why Lua scripts: Redis Lua scripts execute atomically. Provides ACID guarantees without complex client-side transaction logic.
Why centralized scripts: Single source of truth. Prevents script duplication and drift across operations.
Why KEYS/ARGV pattern: Lua closures over external variables can cause subtle bugs. Explicit parameters make data flow visible and testable.
## Explicit Ownership Verification (ADR-003)

### Requirements

ALL Redis scripts MUST include explicit ownership verification:

```lua
-- After loading lock data
if data.lockId ~= lockId then return 0 end -- Ownership mismatch
```

This verification is MANDATORY even when using atomic scripts.
### Rationale & Notes
Why required despite atomicity: Defense-in-depth. While atomic scripts prevent most race conditions, explicit verification guards against:
- Stale reverse mappings: Cleanup race conditions where index key exists but points to wrong lock
- Cross-backend consistency: Both Redis and Firestore must implement identical ownership checking
- Security requirement: Prevents wrong-lock mutations in all scenarios (ADR-003 compliance)
Edge case example: Index key survives TTL cleanup due to timing window. Without explicit verification, release could affect unrelated lock that reused the same key.
## Time Authority & Liveness Predicate

### Requirements

MUST use the unified liveness predicate from `common/time-predicates.ts`:

```typescript
import {
  isLive,
  calculateRedisServerTimeMs,
  TIME_TOLERANCE_MS,
} from "../common/time-predicates.js";

const serverTimeMs = calculateRedisServerTimeMs(await redis.call("TIME"));
const live = isLive(storedExpiresAtMs, serverTimeMs, TIME_TOLERANCE_MS);
```

Time Authority Model: Redis uses server time via `redis.call('TIME')` (ADR-005).
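The conversion `calculateRedisServerTimeMs` performs can be sketched as follows — `TIME` replies with `[seconds, microseconds]` as strings, and the spec's canonical formula reduces that to milliseconds (the function name here is illustrative):

```typescript
// Redis TIME returns [unix seconds, microseconds within the second] as strings.
function toServerTimeMs(timeReply: [string, string]): number {
  const [sec, usec] = timeReply;
  // Matches the canonical Lua formula: time[1] * 1000 + floor(time[2] / 1000)
  return Number(sec) * 1000 + Math.floor(Number(usec) / 1000);
}

console.log(toServerTimeMs(["1700000000", "123456"])); // 1700000000123
```

Millisecond timestamps of this magnitude sit well below 2^53 - 1, so plain JavaScript numbers are safe here (unlike fence values, which this spec keeps as strings).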
### Rationale & Notes
Why server time: All lock operations use a single authoritative time source, eliminating client clock skew issues.
Server Time Reliability:
- Single source of truth: All clients query the same Redis server time for consistency
- No NTP requirements: Client clock accuracy is irrelevant for lock operations
- Predictable behavior: Lock liveness checks are deterministic across all clients
- High consistency: Eliminates race conditions caused by multi-client clock skew (unlike Firestore's client-time model)
Unified Tolerance: See TIME_TOLERANCE_MS in interface.md for normative tolerance specification.
Operational Considerations: See Time Authority Tradeoffs for:
- When to choose Redis vs Firestore based on time authority requirements
- Pre-production checklists and production monitoring guidance
- Failure scenarios and mitigation strategies for server time authority
- When Redis server time might fail (e.g., Redis cluster clock sync issues)
## Backend Capabilities and Type Safety

### Requirements

Redis backends MUST declare their specific capabilities for enhanced type safety:

```typescript
interface RedisCapabilities extends BackendCapabilities {
  backend: "redis"; // Backend type discriminant
  supportsFencing: true; // Redis always provides fencing tokens
  timeAuthority: "server"; // Uses Redis server time
}

const redisBackend: LockBackend<RedisCapabilities> = createRedisBackend(config);
```

### Rationale & Notes
Ergonomic Usage: Since Redis always supports fencing, TypeScript provides compile-time guarantees:

```typescript
const result = await redisBackend.acquire({ key: "resource", ttlMs: 30000 });
if (result.ok) {
  result.fence; // No assertion needed - TypeScript knows this exists
}
```

Type discriminant benefits: Enables pattern matching and type-safe backend switching in generic code.
## Script Implementation Patterns

NORMATIVE IMPLEMENTATION: See `redis/scripts.ts` for the canonical Lua scripts with inline documentation.

### Script Characteristics
Scripts implement these atomic operation patterns:
- Acquire: Expiry check → fence increment → dual-key write with identical TTL
- Release/Extend: Reverse lookup (ADR-013) → ownership verification (ADR-003) → mutation
- Lookup: Atomic multi-key read with ownership verification
All scripts use:

- Canonical time calculation: `time[1] * 1000 + math.floor(time[2] / 1000)` (see `calculateRedisServerTimeMs()`)
- Unified liveness predicate: the `isLive()` pattern from `common/time-predicates.ts`
- 15-digit fence formatting: `string.format("%015d", redis.call('INCR', fenceKey))` for Lua precision safety
### Script Return Code Semantics

Backend operations use these standardized return codes for internal condition tracking:

Acquire:

- `0` → contention (lock held by another process)
- `[1, fence, expiresAtMs]` → success with fence token and authoritative server-time expiry

Release:

- `1` → success
- `0` → ownership mismatch (ADR-003)
- `-1` → not found
- `-2` → expired

Extend:

- `[1, newExpiresAtMs]` → success with authoritative server-time expiry
- `0` → ownership mismatch (ADR-003)
- `-1` → not found
- `-2` → expired
### Rationale & Notes
Why centralized in redis/scripts.ts: Single source of truth prevents script duplication and drift. See module for KEYS/ARGV signatures and complete implementation details.
Why return codes: Enables cheap internal condition tracking for telemetry without additional I/O. Public API simplifies to { ok: boolean } for release/extend operations.
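A sketch of how a backend might fold the acquire return codes above into a result object (the type names are illustrative, not the library's exact public types):

```typescript
type AcquireScriptReturn = 0 | [1, string, number]; // 0 = contention

type AcquireOutcome =
  | { ok: false; reason: "locked" }
  | { ok: true; lockId: string; fence: string; expiresAtMs: number };

function mapAcquireReturn(ret: AcquireScriptReturn, lockId: string): AcquireOutcome {
  if (ret === 0) return { ok: false, reason: "locked" };
  const [, fence, expiresAtMs] = ret; // [1, fence, expiresAtMs] on success
  return { ok: true, lockId, fence, expiresAtMs };
}
```

Because the script already returns the authoritative server-time expiry, no extra round-trip or client-side clock math is needed when building the success result.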
## Error Handling

### Requirements

MUST follow the common spec ErrorMappingStandard.

Key Redis mappings:

- ServiceUnavailable: Connection errors (`ECONNRESET`, `ENOTFOUND`, `ECONNREFUSED`)
- AuthFailed: `NOAUTH`, `WRONGPASS`, `NOPERM`
- InvalidArgument: `WRONGTYPE`, `SYNTAX`, `INVALID`
- NetworkTimeout: Client/operation timeouts
- Aborted: Operation cancelled via AbortSignal
Implementation Pattern:

```typescript
import {
  isLive,
  calculateRedisServerTimeMs,
} from "../common/time-predicates.js";

// Release/extend operations use script return codes
const scriptReturnCode = await redis.evalsha(/* script */);

// Public API: simplified boolean result
const success = scriptReturnCode === 1;

// Internal detail tracking (best-effort, for decorator consumption if telemetry enabled)
const detail = !success
  ? scriptReturnCode === -2
    ? "expired"
    : "not-found"
  : undefined;

return { ok: success };
```

Release/Extend Script Return Codes:

- `1` → success
- `0` → ownership mismatch (ADR-003) → internal: "not-found"
- `-1` → never existed/cleaned up → internal: "not-found"
- `-2` → deterministically observed expired → internal: "expired"
### Rationale & Notes
Why return codes instead of error strings: More efficient. Numbers are cheaper to parse than strings in Lua/JSON.
Why track internal details: Enables rich telemetry when decorator is enabled, without cluttering public API.
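One possible shape for the error mapping above — the pattern lists mirror the Requirements, while `classifyRedisError` itself is a hypothetical helper, not part of the library:

```typescript
type ErrorCode = "ServiceUnavailable" | "AuthFailed" | "InvalidArgument" | "Unknown";

// Classify a raw Redis/ioredis error message into a LockError code.
function classifyRedisError(message: string): ErrorCode {
  if (/ECONNRESET|ENOTFOUND|ECONNREFUSED/.test(message)) return "ServiceUnavailable";
  if (/NOAUTH|WRONGPASS|NOPERM/.test(message)) return "AuthFailed";
  if (/WRONGTYPE|SYNTAX|INVALID/.test(message)) return "InvalidArgument";
  // Timeouts and aborts are detected from the client/AbortSignal, not the message
  return "Unknown";
}
```

In practice timeout and abort conditions come from the client library or an `AbortSignal` rather than from Redis error text, which is why they do not appear in the pattern matching.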
## TTL Management

### Requirements

- Use milliseconds directly with PX: `ttlMs`
- Use the Redis `PX` option, not separate `EXPIRE` calls
- Cleanup Configuration: Optional `cleanupInIsLocked: boolean` (default: `false`) - when enabled, allows fire-and-forget cleanup in the isLocked operation
- CRITICAL: Cleanup MUST ONLY delete lock data keys (main lock key and reverse index key), NEVER fence counter keys
- Configuration Validation: The backend MUST validate at initialization that the `keyPrefix` configuration does not create overlap between the lock data and fence counter namespaces
- If a misconfiguration could result in fence counter deletion, the backend MUST throw `LockError("InvalidArgument")` with a descriptive message
### Rationale & Notes
Why PX not EXPIRE: Single atomic operation. SET key value PX ttl is atomic; SET then EXPIRE creates race window.
Why validate fence counter namespace: Prevents catastrophic bugs where cleanup accidentally deletes fence counters, breaking monotonicity guarantees.
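A hypothetical sketch of such an initialization-time guard; the actual overlap rule is defined by the backend, and `assertSafeKeyPrefix` with its specific check is illustrative only:

```typescript
// Hypothetical check: reject prefixes that would let lock-data cleanup
// patterns reach into the "{prefix}:fence:" namespace. The rule shown here
// is an illustrative assumption, not the backend's actual validation logic.
function assertSafeKeyPrefix(keyPrefix: string): void {
  if (keyPrefix.includes(":fence")) {
    // Real code would throw LockError("InvalidArgument"); a plain Error
    // keeps this sketch self-contained.
    throw new Error("InvalidArgument: keyPrefix overlaps the fence counter namespace");
  }
}

assertSafeKeyPrefix("syncguard"); // ok
```

Failing fast at initialization, rather than during cleanup, keeps the catastrophic case (fence counter deletion) unreachable at runtime.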
## Operation-Specific Behavior

### Acquire Operation Requirements

Direct script return mapping (not a semantic helper):

| Script Return | Backend Result |
| --- | --- |
| `0` | `{ ok: false, reason: "locked" }` (contention) |
| `[1, fence, expiresAtMs]` | `{ ok: true, lockId, expiresAtMs, fence }` (success) |

- MUST return authoritative expiresAtMs: Computed from the Redis server time authority (`redis.call('TIME')`) to ensure consistency and accurate heartbeat scheduling. No client-side approximation is allowed (see ADR-010).
- System Errors: The backend throws `LockError` for Redis connection/command failures
- Single-attempt operations: Redis backends perform single attempts only; retry logic is handled by the lock() helper
### Release Operation Requirements

- LockId Validation: MUST call `validateLockId(lockId)` and throw `LockError("InvalidArgument")` on malformed input
- MUST implement TOCTOU protection via atomic Lua scripts. Return simplified `{ ok: boolean }` results. Track internal details cheaply when available for potential telemetry decorator consumption.
### Extend Operation Requirements

- LockId Validation: MUST call `validateLockId(lockId)` and throw `LockError("InvalidArgument")` on malformed input
- MUST return authoritative expiresAtMs: Computed from the Redis server time authority (`redis.call('TIME')`) to ensure consistency and accurate heartbeat scheduling. No client-side approximation is allowed (see ADR-010).
- MUST implement TOCTOU protection via atomic Lua scripts. TTL semantics: replaces the remaining TTL entirely (`now + ttlMs`).
### IsLocked Operation Requirements

- Use Case: Simple boolean checks (prefer `lookup()` for diagnostics)
- Locked/Unlocked: The backend returns `true`/`false` based on key existence and expiry
- Read-Only by Default: Cleanup is disabled by default to maintain pure read semantics
- Optional Cleanup: When `cleanupInIsLocked: true` is configured, MAY perform fire-and-forget cleanup following common spec guidelines
- System Errors: The backend throws `LockError` for Redis failures
### Lookup Operation Requirements

Runtime Validation: MUST validate inputs before any I/O operations:

- Key mode: Call `normalizeAndValidateKey(key)` and fail fast on invalid keys
- LockId mode: Call `validateLockId(lockId)` and throw `LockError("InvalidArgument")` on malformed input
Key Lookup Mode:
- Implementation: Direct access to the main lock key: `redis.call('GET', lockKey)`
- Complexity: O(1) direct access (single operation)
- Atomicity: Single GET operation (inherently atomic)
- Performance: Direct key-value access, sub-millisecond latency
LockId Lookup Mode:
- Implementation: MUST use atomic Lua script that reads reverse index and main lock in single operation
- Complexity: Multi-step (reverse mapping + verification)
- Atomicity: MUST be atomic via Lua script to prevent TOCTOU races
- Performance: Atomic script execution, consistent sub-millisecond performance
Common Requirements:

- Ownership Verification: The script MUST verify `data.lockId === lockId` after parsing lock data
- Expiry Check: Parse the JSON and apply the `isLive()` predicate using `calculateRedisServerTimeMs()` and `TIME_TOLERANCE_MS`
- Data Transformation Requirement: The Lua script returns the full JSON lockData. The TypeScript lookup method MUST parse it, compute keyHash and lockIdHash using `hashKey()`, and return a sanitized `LockInfo<C>`
- ⚠️ FORBIDDEN - Raw Data Pass-Through: The TypeScript layer MUST sanitize all data before returning. Raw Lua JSON MUST NEVER surface through the public API
- Return Value: Return `null` if the key doesn't exist or is expired; return `LockInfo<C>` for live locks (MUST include `fence`)
### Rationale & Notes
Why atomic lookup script: Multi-key reads need atomicity. Without script, lock could expire between reading index and reading main key.
Why sanitize in TypeScript: Lua optimizes for atomicity, TypeScript optimizes for security. Clean separation of concerns.
## Implementation Architecture

### Requirements

- Backend creation: `createRedisBackend()` defines commands via `redis.defineCommand()`
- Operations: Use defined commands (e.g., `redis.acquireLock()`) when available
- Test compatibility: Falls back to `redis.eval()` for mocked Redis instances
- Fence Type: The Redis backend uses the `string` fence type to avoid JSON precision loss beyond 2^53 - 1
### Performance Characteristics

- Sub-millisecond latency
- 25k+ ops/sec with cached scripts
- Direct key-value access provides consistently fast operations
- `lookup` implementation: Required - supports both key and lockId lookup patterns
### Rationale & Notes
Why defineCommand at creation: ioredis caches script SHAs globally per client. Defining at initialization ensures EVALSHA available for all subsequent calls.
Why string fence type: JavaScript numbers lose precision beyond 2^53-1. Strings preserve full 15-digit fence values without precision loss.
## Configuration Options

### Requirements

```typescript
interface RedisBackendConfig {
  keyPrefix?: string; // Default: "syncguard"
  cleanupInIsLocked?: boolean; // Default: false
  // ... other Redis-specific options
}

// Consistent behavior with unified tolerance
const redisBackend = createRedisBackend(); // Uses TIME_TOLERANCE_MS

// Add telemetry if needed
const observed = withTelemetry(redisBackend, {
  onEvent: (e) => console.log(e),
  includeRaw: false,
});
```

Unified tolerance: See TIME_TOLERANCE_MS in interface.md for the normative specification.
### Rationale & Notes
Why default prefix "syncguard": Namespace collision prevention. Allows multiple libraries to coexist in same Redis instance.
Why cleanupInIsLocked optional: Read-only expectation by default. Cleanup opt-in preserves predictable behavior.
## Key Naming

### Requirements

All key types MUST use `makeStorageKey()` from common utilities with the backend-specific effective key budget (`EFFECTIVE_KEY_BUDGET_BYTES = 1000`) and a 26-byte reserve:

Main lock: `baseKey = makeStorageKey(config.keyPrefix, key, 1000, 26)`

- Reserve: 26 bytes for derived keys (index keys)

Index key: ``makeStorageKey(config.keyPrefix, `id:${lockId}`, 1000, 26)``

- Reserve: 26 bytes (same reserve for consistency)

Fence key: MUST use the Two-Step Fence Key Derivation Pattern:

```typescript
const baseKey = makeStorageKey(config.keyPrefix, normalizedKey, 1000, 26);
const fenceKey = makeStorageKey(
  config.keyPrefix,
  `fence:${baseKey}`,
  1000,
  26,
);
```

- Reserve: 26 bytes (ensures fence keys don't exceed the budget when derived from the base)

Default prefix: `"syncguard"`

Reserve bytes constant: 26
### Rationale & Notes
Why two-step fence derivation: Guarantees 1:1 mapping between user keys and fence counters. See interface.md for complete rationale.
## Required Lock Data Structure

### Requirements

```typescript
interface LockData {
  lockId: string; // For ownership verification
  expiresAtMs: number; // Millisecond timestamp
  acquiredAtMs: number; // Millisecond timestamp
  key: string; // Original user key
  fence: string; // Monotonic fencing token (15-digit zero-padded string)
}
```

### Rationale & Notes
Why include key in lock data: Debugging and telemetry. Allows reconstruction of full lock state from main key alone.
Why fence as string: Preserves precision. JavaScript/Lua number precision limits don't affect string representation.
## Fencing Token Implementation

### Requirements

```lua
-- Fence Counter Increment Pattern (within acquire script):
local fenceKey = KEYS[3] -- Pre-constructed via two-step pattern

-- Format immediately as 15-digit string for guaranteed Lua precision safety
local fence = string.format("%015d", redis.call('INCR', fenceKey))

-- Store fence in lock data JSON and return with AcquireResult
```

Required Implementation Details:

- Fence Key Generation: MUST use the Two-Step Fence Key Derivation Pattern from interface.md
- Atomicity: `INCR` MUST be called within the same Lua script as lock acquisition
- Persistence: Fence counters survive Redis restarts (no TTL on fence keys)
- Monotonicity: Each successful `acquire()` increments the counter, ensuring strict ordering
- Storage: Store the fence value in `LockData.fence` and return it in `AcquireResult.fence`
- Format: 15-digit zero-padded decimal strings for lexicographic ordering and JSON safety
- Overflow Enforcement (ADR-004): The backend MUST parse the returned fence value and throw `LockError("Internal")` if fence > `FENCE_THRESHOLDS.MAX`; MUST log warnings via `logFenceWarning()` when fence > `FENCE_THRESHOLDS.WARN`. Canonical threshold values are defined in `common/constants.ts`.
### Rationale & Notes
Why format in Lua: Guarantees precision safety. Lua's 53-bit precision accommodates 15-digit decimals without loss.
Why no TTL on fence keys: Monotonicity requires persistence. Deleting fence counter would allow reuse, violating safety guarantees.
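A sketch of the ADR-004 enforcement step described above; the threshold values here are placeholders — the canonical ones live in `common/constants.ts`:

```typescript
// Placeholder thresholds for illustration only.
const FENCE_WARN = 900_000_000_000_000n;
const FENCE_MAX = 999_999_999_999_999n; // largest 15-digit value

// BigInt avoids the 2^53 - 1 precision cliff when parsing the fence string.
function checkFence(fence: string): "ok" | "warn" | "overflow" {
  const value = BigInt(fence);
  if (value > FENCE_MAX) return "overflow"; // backend would throw LockError("Internal")
  if (value > FENCE_WARN) return "warn"; // backend would call logFenceWarning()
  return "ok";
}
```

Parsing through `BigInt` (rather than `Number`) keeps the comparison exact for the full 15-digit range and beyond, which matters precisely at the overflow boundary.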
## Fence Key Lifecycle and Memory Considerations

### Requirements

CRITICAL: Fence keys are intentionally persistent and MUST NOT have a TTL or be deleted:

```lua
-- ❌ NEVER do this - breaks monotonicity guarantee
redis.call('DEL', fenceKey) -- Violates fencing safety
redis.call('EXPIRE', fenceKey, ttl) -- Violates fencing safety
```

### Rationale & Notes
Memory Growth: Fence counters accumulate over Redis instance lifetime. This is correct behavior for fencing safety.
- Bounded key spaces: Most applications use predictable lock keys → minimal memory impact
- Unbounded key spaces: Applications generating unlimited unique keys → fence keys grow indefinitely
- Mitigation: For unbounded scenarios, consider key normalization or application-level limits
Operational Guidance:
- Monitor the fence key count via `redis-cli --scan --pattern "syncguard:fence:*" | wc -l`
- Each fence key is ~50-100 bytes (key name + 8-byte counter)
- 1M fence keys ≈ 50-100MB of memory (typically acceptable)
When to be concerned: If your application generates >10M unique lock keys annually, evaluate key design patterns.
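The memory estimate above is easy to sanity-check (using the ~75-byte midpoint of the per-key range):

```typescript
// Back-of-envelope check of the "1M fence keys ≈ 50-100MB" figure above.
const bytesPerFenceKey = 75; // midpoint of the ~50-100 byte range
const fenceKeys = 1_000_000;
const megabytes = (fenceKeys * bytesPerFenceKey) / (1024 * 1024);
console.log(megabytes.toFixed(1)); // ≈ 71.5
```

Scaling linearly, even the 10M-key "evaluate your key design" threshold lands around 0.7GB, which is why the guidance treats it as a review trigger rather than a hard failure.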
## Testing Strategy

### Requirements

- Unit tests: Mock Redis with an `eval` method, no external dependencies
- Integration tests: Real Redis instance, validates `defineCommand()` caching
- Performance tests: Benchmark latency, throughput, and script caching benefits
- Behavioral compliance testing: Unit tests MUST verify the backend imports and uses `isLive()` and `calculateRedisServerTimeMs()` from `common/time-predicates.ts`
- Cross-backend consistency: Integration tests MUST verify identical outcomes given the same tolerance values between Redis and other backends
### Rationale & Notes
Why unit tests with mocks: Fast feedback loop. No external dependencies for basic correctness checks.
Why integration tests with real Redis: Validates script caching, network behavior, actual atomicity guarantees.
Why cross-backend tests: Ensures API consistency. Users should get identical behavior regardless of backend choice (accounting for time authority differences).