
Redis Backend Specification

This document defines Redis-specific implementation requirements that extend the common interface specification.


🚫 CRITICAL: Never Delete Fence Counters

Fence counter keys ({prefix}:fence:{key}) MUST NEVER be deleted or assigned a TTL. Deleting fence counters breaks monotonicity guarantees and violates fencing safety. Cleanup operations MUST only target lock data keys (main lock and reverse index), never fence counters.


Document Structure

This specification uses a normative vs rationale pattern:

  • Requirements sections contain MUST/SHOULD/MAY/NEVER statements defining the contract
  • Rationale & Notes sections provide background, design decisions, and operational guidance

Dual-Key Storage Pattern

Requirements

  • Main lock key: {keyPrefix}:{key} stores JSON lock data
  • Index key: {keyPrefix}:id:{lockId} stores full lockKey for reverse lookup (ADR-013)

  • Both keys MUST have identical TTL for consistency
  • Use redis.call('SET', key, data, 'PX', ttlMs) for atomic set-with-TTL
  • Storage Key Generation: MUST call makeStorageKey() from common utilities (see Storage Key Generation)
  • Backend-specific limit: EFFECTIVE_KEY_BUDGET_BYTES = 1000 (chosen for predictable memory/ops headroom; not a Redis hard limit)
  • Reserve Bytes Requirement: Redis operations MUST reserve 26 bytes for derived keys when calling makeStorageKey():
    • Formula: ":id:" (4 bytes) + lockId (22 bytes) = 26 bytes
    • Purpose: Ensures derived keys like ${prefix}:id:${lockId} fit within the effective budget

Rationale & Notes

Why dual-key pattern: Enables fast reverse lookup (lockId → key) required for release/extend operations. Single-key approaches would require scanning or secondary indexes.

Why identical TTL: Prevents orphaned index keys. If index outlives lock, lookup returns stale data. If lock outlives index, release/extend fail incorrectly.

Why 26-byte reserve: Redis constructs multiple keys from base (main lock + reverse index). Reserve ensures all derived keys fit within the effective budget when base key is at maximum length.
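The reserve arithmetic above can be checked with a few lines of TypeScript. This is a minimal sketch, assuming the reserve is treated as headroom subtracted from the effective budget; the helper names `maxBaseKeyBytes` and `indexKeyBytes` are hypothetical and are not part of the common utilities, and the real behavior of makeStorageKey() is defined elsewhere.

```typescript
// Constants taken from the requirements above.
const EFFECTIVE_KEY_BUDGET_BYTES = 1000;
const INDEX_INFIX = ":id:"; // 4 bytes (ASCII)
const LOCK_ID_BYTES = 22; // lockId length stated in the reserve formula

// Reserve needed so a derived index key still fits: 4 + 22 = 26.
const RESERVE_BYTES = INDEX_INFIX.length + LOCK_ID_BYTES;

// Hypothetical helper: largest base key that leaves room for the reserve.
function maxBaseKeyBytes(budget: number, reserve: number): number {
  if (reserve >= budget) {
    throw new Error("reserve must be smaller than the key budget");
  }
  return budget - reserve;
}

// Hypothetical helper: byte length of an index key, assuming the prefix is
// ASCII so that string length equals byte length.
function indexKeyBytes(prefix: string, lockIdBytes: number): number {
  return prefix.length + INDEX_INFIX.length + lockIdBytes;
}
```

With the default prefix "syncguard", an index key occupies 35 bytes, and under the stated assumption a base key may use up to 974 bytes.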


Script Caching for Performance

Requirements

  • Primary approach: Use ioredis defineCommand() for automatic EVALSHA caching
  • Fallback: Graceful fallback to redis.eval() for test mocks
  • Scripts defined once during backend initialization for optimal performance
  • ioredis handles SCRIPT LOAD + EVALSHA + NOSCRIPT error recovery automatically

Rationale & Notes

Why defineCommand: ioredis manages the entire caching lifecycle automatically. First call loads script via SCRIPT LOAD, subsequent calls use EVALSHA. If Redis restarts and loses scripts, ioredis automatically reloads via NOSCRIPT error handling.

Why fallback to eval: Test mocks often don't implement full Redis command set. Fallback enables unit testing without external dependencies.
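One possible shape for the defined-command-or-eval dispatch is sketched below. It assumes a structural client type (`ScriptCapableClient`) and a helper name (`invokeScript`) that are both hypothetical; the only ioredis behavior relied on is that a command registered with defineCommand() becomes a method taking keys followed by arguments, and that eval() takes the script, the key count, then keys and arguments.

```typescript
// Hypothetical structural type: real ioredis clients and simple test mocks
// both satisfy it. Commands added via defineCommand() appear as extra methods.
interface ScriptCapableClient {
  eval(script: string, numKeys: number, ...args: Array<string | number>): unknown;
  [command: string]: unknown;
}

// Prefer the pre-defined command (EVALSHA-cached by ioredis); fall back to
// plain eval() when the client, e.g. a mock, does not expose it.
function invokeScript(
  client: ScriptCapableClient,
  commandName: string,
  lua: string,
  keys: string[],
  args: Array<string | number>,
): unknown {
  const callArgs: Array<string | number> = [];
  for (let i = 0; i < keys.length; i++) callArgs.push(keys[i]);
  for (let i = 0; i < args.length; i++) callArgs.push(args[i]);

  const defined = client[commandName];
  if (typeof defined === "function") {
    return (defined as (...a: Array<string | number>) => unknown).apply(client, callArgs);
  }
  return client.eval(lua, keys.length, ...callArgs);
}
```

With a real client the returned value is a Promise; the sketch passes it through unchanged so callers can await it.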

Performance impact: EVALSHA reduces network overhead from ~1KB (full script) to ~40 bytes (SHA hash). At 10K ops/sec, saves ~9.6MB/sec network bandwidth.
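The bandwidth figure follows from simple arithmetic, reproduced here as a small sketch. The function name is hypothetical, and the byte counts are the approximate values quoted above rather than measurements.

```typescript
// Bytes saved per second when each call sends a SHA instead of the script body.
function savedBytesPerSecond(
  scriptBytes: number,
  shaBytes: number,
  opsPerSec: number,
): number {
  return (scriptBytes - shaBytes) * opsPerSec;
}

// (1000 - 40) bytes * 10,000 ops/sec = 9,600,000 bytes/sec, about 9.6 MB/sec.
const savedAtTenThousandOps = savedBytesPerSecond(1000, 40, 10000);
```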


Lua Scripts for Atomicity

Requirements

  • ALL mutating operations MUST use Lua scripts
  • Scripts centralized in redis/scripts.ts with descriptive comments
  • Use cjson.decode()/cjson.encode() for JSON handling in Lua
  • Script return codes MUST follow operation-specific semantics (see Script Return Code Semantics for complete mapping)
  • Scripts handle lock contention via return codes, backend throws LockError for Redis errors
  • Pass all required data as KEYS and ARGV to avoid closure issues

Rationale & Notes

Why Lua scripts: Redis Lua scripts execute atomically, and no other command interleaves with a running script. This provides atomicity and isolation without complex client-side transaction logic.

Why centralized scripts: Single source of truth. Prevents script duplication and drift across operations.

Why KEYS/ARGV pattern: Lua closures over external variables can cause subtle bugs. Explicit parameters make data flow visible and testable.


Explicit Ownership Verification (ADR-003)

Requirements

ALL Redis scripts MUST include explicit ownership verification:

lua
-- After loading lock data
if data.lockId ~= lockId then return 0 end  -- Ownership mismatch

This verification is MANDATORY even when using atomic scripts.

Rationale & Notes

Why required despite atomicity: Defense-in-depth. While atomic scripts prevent most race conditions, explicit verification guards against:

  • Stale reverse mappings: Cleanup race conditions where index key exists but points to wrong lock
  • Cross-backend consistency: Both Redis and Firestore must implement identical ownership checking
  • Security requirement: Prevents wrong-lock mutations in all scenarios (ADR-003 compliance)

Edge case example: Index key survives TTL cleanup due to timing window. Without explicit verification, release could affect unrelated lock that reused the same key.
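The stale-index edge case can be modeled in TypeScript with an in-memory store standing in for Redis. This is an illustrative sketch only: `releaseSketch` and the plain-object store are hypothetical, the expiry check (-2) is omitted for brevity, and the canonical logic lives in the Lua scripts.

```typescript
// In-memory stand-in for Redis string keys (assumption for illustration).
type Store = Record<string, string>;

// Mirrors the release flow: reverse lookup, load lock data, verify ownership,
// then delete both keys. Return codes follow the release semantics in this spec.
function releaseSketch(store: Store, indexKey: string, lockId: string): number {
  const lockKey = store[indexKey];
  if (lockKey === undefined) return -1; // no reverse mapping

  const raw = store[lockKey];
  if (raw === undefined) return -1; // main lock key is gone

  const data = JSON.parse(raw) as { lockId: string };
  if (data.lockId !== lockId) return 0; // explicit ownership check

  delete store[lockKey];
  delete store[indexKey];
  return 1;
}
```

If a stale index entry for lock A points at a key now held by lock B, the call for A returns 0 and B's lock data is left untouched.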


Time Authority & Liveness Predicate

Requirements

MUST use unified liveness predicate from common/time-predicates.ts:

typescript
import {
  isLive,
  calculateRedisServerTimeMs,
  TIME_TOLERANCE_MS,
} from "../common/time-predicates.js";

const serverTimeMs = calculateRedisServerTimeMs(await redis.call("TIME"));
const live = isLive(storedExpiresAtMs, serverTimeMs, TIME_TOLERANCE_MS);

Time Authority Model: Redis uses server time via redis.call('TIME') (ADR-005).

Rationale & Notes

Why server time: All lock operations use a single authoritative time source, eliminating client clock skew issues.

Server Time Reliability:

  • Single source of truth: All clients query the same Redis server time for consistency
  • No NTP requirements: Client clock accuracy is irrelevant for lock operations
  • Predictable behavior: Lock liveness checks are deterministic across all clients
  • High consistency: Eliminates race conditions caused by multi-client clock skew (unlike Firestore's client-time model)

Unified Tolerance: See TIME_TOLERANCE_MS in interface.md for normative tolerance specification.

Operational Considerations: See Time Authority Tradeoffs for:

  • When to choose Redis vs Firestore based on time authority requirements
  • Pre-production checklists and production monitoring guidance
  • Failure scenarios and mitigation strategies for server time authority
  • When Redis server time might fail (e.g., Redis cluster clock sync issues)

Backend Capabilities and Type Safety

Requirements

Redis backends MUST declare their specific capabilities for enhanced type safety:

typescript
interface RedisCapabilities extends BackendCapabilities {
  backend: "redis"; // Backend type discriminant
  supportsFencing: true; // Redis always provides fencing tokens
  timeAuthority: "server"; // Uses Redis server time
}

const redisBackend: LockBackend<RedisCapabilities> = createRedisBackend(config);

Rationale & Notes

Ergonomic Usage: Since Redis always supports fencing, TypeScript provides compile-time guarantees:

typescript
const result = await redisBackend.acquire({ key: "resource", ttlMs: 30000 });
if (result.ok) {
  result.fence; // No assertion needed - TypeScript knows this exists
}

Type discriminant benefits: Enables pattern matching and type-safe backend switching in generic code.
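A short sketch of how the discriminant might be used in generic code follows. The `BackendCapabilities` base shape and the second `ExampleCapabilities` member are assumptions made for illustration; only `RedisCapabilities` mirrors the declaration above.

```typescript
// Assumed base shape; the real interface is defined in the common spec.
interface BackendCapabilities {
  backend: string;
  supportsFencing: boolean;
  timeAuthority: "server" | "client";
}

interface RedisCapabilities extends BackendCapabilities {
  backend: "redis";
  supportsFencing: true;
  timeAuthority: "server";
}

// Hypothetical second backend, present only to make the union non-trivial.
interface ExampleCapabilities extends BackendCapabilities {
  backend: "example";
  supportsFencing: false;
  timeAuthority: "client";
}

type KnownCapabilities = RedisCapabilities | ExampleCapabilities;

// Switching on `backend` narrows the type in each branch.
function describeBackend(c: KnownCapabilities): string {
  switch (c.backend) {
    case "redis":
      return "redis: fencing tokens always available, " + c.timeAuthority + " time";
    case "example":
      return "example: no fencing, " + c.timeAuthority + " time";
    default: {
      const unreachable: never = c; // compile-time exhaustiveness check
      return unreachable;
    }
  }
}
```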


Script Implementation Patterns

NORMATIVE IMPLEMENTATION: See redis/scripts.ts for canonical Lua scripts with inline documentation.

Script Characteristics

Scripts implement these atomic operation patterns:

  • Acquire: Expiry check → fence increment → dual-key write with identical TTL
  • Release/Extend: Reverse lookup (ADR-013) → ownership verification (ADR-003) → mutation
  • Lookup: Atomic multi-key read with ownership verification

All scripts use:

  • Canonical time calculation: time[1] * 1000 + math.floor(time[2] / 1000) (see calculateRedisServerTimeMs())
  • Unified liveness predicate: isLive() pattern from common/time-predicates.ts
  • 15-digit fence formatting: string.format("%015d", redis.call('INCR', fenceKey)) for Lua precision safety
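The canonical time calculation listed above translates directly to TypeScript. The sketch below is not the library's calculateRedisServerTimeMs(); the function name is hypothetical, and it assumes the TIME reply arrives as two decimal strings (seconds, then microseconds within the second).

```typescript
// Convert a Redis TIME reply [seconds, microseconds] into epoch milliseconds.
// Note that Lua indexes the same reply as time[1] and time[2].
function serverTimeMsFromTimeReply(reply: [string, string]): number {
  const seconds = parseInt(reply[0], 10);
  const micros = parseInt(reply[1], 10);
  if (isNaN(seconds) || isNaN(micros)) {
    throw new Error("unexpected TIME reply");
  }
  // Whole seconds to ms, plus microseconds truncated to whole ms.
  return seconds * 1000 + Math.floor(micros / 1000);
}
```

For example, a reply of ["1700000000", "123456"] yields 1700000000123.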

Script Return Code Semantics

Backend operations use these standardized return codes for internal condition tracking:

Acquire:

  • 0 → contention (lock held by another process)
  • [1, fence, expiresAtMs] → success with fence token and authoritative server-time expiry

Release:

  • 1 → success
  • 0 → ownership mismatch (ADR-003)
  • -1 → not found
  • -2 → expired

Extend:

  • [1, newExpiresAtMs] → success with authoritative server-time expiry
  • 0 → ownership mismatch (ADR-003)
  • -1 → not found
  • -2 → expired

Rationale & Notes

Why centralized in redis/scripts.ts: Single source of truth prevents script duplication and drift. See module for KEYS/ARGV signatures and complete implementation details.

Why return codes: Enables cheap internal condition tracking for telemetry without additional I/O. Public API simplifies to { ok: boolean } for release/extend operations.


Error Handling

Requirements

MUST follow common spec ErrorMappingStandard.

Key Redis mappings:

  • ServiceUnavailable: Connection errors (ECONNRESET, ENOTFOUND, ECONNREFUSED)
  • AuthFailed: NOAUTH, WRONGPASS, NOPERM
  • InvalidArgument: WRONGTYPE, SYNTAX, INVALID
  • NetworkTimeout: Client/operation timeouts
  • Aborted: Operation cancelled via AbortSignal
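The mappings above could be implemented along these lines. This is a sketch under stated assumptions: the `ErrorLike` shape, the `classifyRedisError` name, prefix matching on the Redis error message, the "timed out" and `ETIMEDOUT` timeout heuristics, the `AbortError` name check, and the "Internal" fallback are all illustrative choices, not the normative ErrorMappingStandard.

```typescript
type LockErrorCode =
  | "ServiceUnavailable"
  | "AuthFailed"
  | "InvalidArgument"
  | "NetworkTimeout"
  | "Aborted"
  | "Internal";

// Minimal view of a thrown error (assumed shape).
interface ErrorLike {
  name?: string;
  code?: string;
  message?: string;
}

function startsWithAny(text: string, prefixes: string[]): boolean {
  for (let i = 0; i < prefixes.length; i++) {
    if (text.indexOf(prefixes[i]) === 0) return true;
  }
  return false;
}

function classifyRedisError(err: ErrorLike): LockErrorCode {
  const code = err.code || "";
  const message = err.message || "";
  if (err.name === "AbortError") return "Aborted";
  if (code === "ECONNRESET" || code === "ENOTFOUND" || code === "ECONNREFUSED") {
    return "ServiceUnavailable";
  }
  if (code === "ETIMEDOUT" || /timed out/i.test(message)) return "NetworkTimeout";
  // Redis error replies begin with an error-code word such as NOAUTH or WRONGTYPE.
  if (startsWithAny(message, ["NOAUTH", "WRONGPASS", "NOPERM"])) return "AuthFailed";
  if (startsWithAny(message, ["WRONGTYPE", "SYNTAX", "INVALID"])) return "InvalidArgument";
  return "Internal"; // assumed fallback for unrecognized failures
}
```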

Implementation Pattern:

typescript
import {
  isLive,
  calculateRedisServerTimeMs,
} from "../common/time-predicates.js";

// Release/extend operations use script return codes
const scriptReturnCode = await redis.evalsha(/* script */);

// Public API: simplified boolean result
const success = scriptReturnCode === 1;

// Internal detail tracking (best-effort, for decorator consumption if telemetry enabled)
const detail = !success
  ? scriptReturnCode === -2
    ? "expired"
    : "not-found"
  : undefined;

return { ok: success };

Release/Extend Script Return Codes:

  • 1 → success
  • 0 → ownership mismatch (ADR-003) → internal: "not-found"
  • -1 → never existed/cleaned up → internal: "not-found"
  • -2 → deterministically observed expired → internal: "expired"

Rationale & Notes

Why return codes instead of error strings: More efficient. Numbers are cheaper to parse than strings in Lua/JSON.

Why track internal details: Enables rich telemetry when decorator is enabled, without cluttering public API.


TTL Management

Requirements

  • Use milliseconds directly with PX: ttlMs
  • Use Redis PX option, not separate EXPIRE calls
  • Cleanup Configuration: Optional cleanupInIsLocked: boolean (default: false) - when enabled, allows fire-and-forget cleanup in isLocked operation
    • CRITICAL: Cleanup MUST ONLY delete lock data keys (main lock key and reverse index key), NEVER fence counter keys
    • Configuration Validation: Backend MUST validate at initialization that keyPrefix configuration does not create overlap between lock data and fence counter namespaces
    • If misconfiguration could result in fence counter deletion, backend MUST throw LockError("InvalidArgument") with descriptive message

Rationale & Notes

Why PX not EXPIRE: Single atomic operation. SET key value PX ttl is atomic; SET then EXPIRE creates race window.

Why validate fence counter namespace: Prevents catastrophic bugs where cleanup accidentally deletes fence counters, breaking monotonicity guarantees.
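A guard of the following kind is one way to keep cleanup away from fence counters. It is a minimal sketch: the helper names are hypothetical, the `{prefix}:fence:` layout is taken from the warning at the top of this document, and a plain Error stands in for LockError("InvalidArgument").

```typescript
// True when a storage key lives in the fence counter namespace.
function isFenceKey(keyPrefix: string, storageKey: string): boolean {
  return storageKey.indexOf(keyPrefix + ":fence:") === 0;
}

// Validate a cleanup batch before issuing any DEL; fail loudly rather than
// silently skipping, since a fence key here indicates a configuration bug.
function assertCleanupTargets(keyPrefix: string, keys: string[]): string[] {
  for (let i = 0; i < keys.length; i++) {
    if (isFenceKey(keyPrefix, keys[i])) {
      throw new Error(
        "InvalidArgument: cleanup must never target fence counter key " + keys[i],
      );
    }
  }
  return keys;
}
```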


Operation-Specific Behavior

Acquire Operation Requirements

Direct script return mapping (not semantic helper):

Script Return | Backend Result
--- | ---
0 | { ok: false, reason: "locked" } (contention)
[1, fence, expiresAtMs] | { ok: true, lockId, expiresAtMs, fence } (success)

  • MUST return authoritative expiresAtMs: Computed from Redis server time authority (redis.call('TIME')) to ensure consistency and accurate heartbeat scheduling. No client-side approximation allowed (see ADR-010).
  • System Errors: Backend throws LockError for Redis connection/command failures
  • Single-attempt operations: Redis backends perform single attempts only; retry logic is handled by the lock() helper
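The direct mapping for acquire might look like the sketch below. The `AcquireResult` type and `mapAcquireReply` name are assumptions for illustration, and throwing on an unrecognized reply is a design choice of this sketch rather than a stated requirement.

```typescript
type AcquireResult =
  | { ok: true; lockId: string; expiresAtMs: number; fence: string }
  | { ok: false; reason: "locked" };

// Translate the raw script reply into the backend result.
// `lockId` is generated by the caller before the script runs.
function mapAcquireReply(reply: unknown, lockId: string): AcquireResult {
  if (reply === 0) {
    return { ok: false, reason: "locked" }; // contention
  }
  if (Array.isArray(reply) && reply.length === 3 && reply[0] === 1) {
    return {
      ok: true,
      lockId: lockId,
      expiresAtMs: Number(reply[2]), // authoritative server-time expiry
      fence: String(reply[1]), // kept as a string, never parsed for storage
    };
  }
  throw new Error("unexpected acquire script reply");
}
```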

Release Operation Requirements

  • LockId Validation: MUST call validateLockId(lockId) and throw LockError("InvalidArgument") on malformed input
  • MUST implement TOCTOU Protection via atomic Lua scripts. Return simplified { ok: boolean } results. Track internal details cheaply when available for potential telemetry decorator consumption.

Extend Operation Requirements

  • LockId Validation: MUST call validateLockId(lockId) and throw LockError("InvalidArgument") on malformed input
  • MUST return authoritative expiresAtMs: Computed from Redis server time authority (redis.call('TIME')) to ensure consistency and accurate heartbeat scheduling. No client-side approximation allowed (see ADR-010).
  • MUST implement TOCTOU Protection via atomic Lua scripts. TTL semantics: replaces remaining TTL entirely (now + ttlMs).
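The "replaces remaining TTL" rule is easy to get wrong, so here is a small worked sketch. The function and type names are hypothetical; in the real backend the expiry is computed inside the Lua script from redis.call('TIME').

```typescript
type ExtendResult = { ok: true; expiresAtMs: number } | { ok: false };

// New expiry is measured from server "now", not from the previous expiry.
function extendedExpiryMs(serverNowMs: number, ttlMs: number): number {
  return serverNowMs + ttlMs;
}

// Map the extend script reply to the simplified public result.
function mapExtendReply(reply: unknown): ExtendResult {
  if (Array.isArray(reply) && reply.length === 2 && reply[0] === 1) {
    return { ok: true, expiresAtMs: Number(reply[1]) };
  }
  if (reply === 0 || reply === -1 || reply === -2) {
    return { ok: false }; // mismatch, not found, or expired
  }
  throw new Error("unexpected extend script reply");
}
```

A lock acquired at t = 0 with a 30,000 ms TTL and extended at t = 20,000 ms with the same TTL expires at 50,000 ms, not 60,000 ms.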

IsLocked Operation Requirements

  • Use Case: Simple boolean checks (prefer lookup() for diagnostics)
  • Locked/Unlocked: Backend returns true/false based on key existence and expiry
  • Read-Only by Default: Cleanup disabled by default to maintain pure read semantics
  • Optional Cleanup: When cleanupInIsLocked: true configured, MAY perform fire-and-forget cleanup following common spec guidelines
  • System Errors: Backend throws LockError for Redis failures

Lookup Operation Requirements

Runtime Validation: MUST validate inputs before any I/O operations:

  • Key mode: Call normalizeAndValidateKey(key) and fail fast on invalid keys
  • LockId mode: Call validateLockId(lockId) and throw LockError("InvalidArgument") on malformed input

Key Lookup Mode:

  • Implementation: Direct access to main lock key: redis.call('GET', lockKey)
  • Complexity: O(1) direct access (single operation)
  • Atomicity: Single GET operation (inherently atomic)
  • Performance: Direct key-value access, sub-millisecond latency

LockId Lookup Mode:

  • Implementation: MUST use atomic Lua script that reads reverse index and main lock in single operation
  • Complexity: Multi-step (reverse mapping + verification)
  • Atomicity: MUST be atomic via Lua script to prevent TOCTOU races
  • Performance: Atomic script execution, consistent sub-millisecond performance

Common Requirements:

  • Ownership Verification: Script MUST verify data.lockId === lockId after parsing lock data
  • Expiry Check: Parse JSON and apply isLive() predicate using calculateRedisServerTimeMs() and TIME_TOLERANCE_MS
  • Data Transformation Requirement: Lua script returns full JSON lockData. TypeScript lookup method MUST parse, compute keyHash and lockIdHash using hashKey(), and return sanitized LockInfo<C>
  • ⚠️ FORBIDDEN: Raw Data Pass-Through: TypeScript layer MUST sanitize all data before returning. Raw Lua JSON MUST NEVER surface through public API
  • Return Value: Return null if key doesn't exist or is expired; return LockInfo<C> for live locks (MUST include fence)

Rationale & Notes

Why atomic lookup script: Multi-key reads need atomicity. Without script, lock could expire between reading index and reading main key.

Why sanitize in TypeScript: Lua optimizes for atomicity, TypeScript optimizes for security. Clean separation of concerns.
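The sanitization step could be sketched as follows. The `LockInfoSketch` shape is an assumption (the real LockInfo<C> is defined in the common spec), and the hash function is injected as a parameter because the actual hashKey() implementation is not reproduced here.

```typescript
// Stored structure, as defined in Required Lock Data Structure.
interface LockData {
  lockId: string;
  expiresAtMs: number;
  acquiredAtMs: number;
  key: string;
  fence: string;
}

// Assumed public shape: hashes replace the raw key and lockId.
interface LockInfoSketch {
  keyHash: string;
  lockIdHash: string;
  expiresAtMs: number;
  acquiredAtMs: number;
  fence: string;
}

// Parse the raw JSON returned by the Lua script and copy only safe fields.
function sanitizeLockData(
  rawJson: string,
  hashKey: (value: string) => string,
): LockInfoSketch {
  const data = JSON.parse(rawJson) as LockData;
  return {
    keyHash: hashKey(data.key),
    lockIdHash: hashKey(data.lockId),
    expiresAtMs: data.expiresAtMs,
    acquiredAtMs: data.acquiredAtMs,
    fence: data.fence,
  };
}
```

Building the result field by field, rather than spreading the parsed object, means a field added to the stored JSON later cannot leak through the public API by accident.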


Implementation Architecture

Requirements

  • Backend creation: createRedisBackend() defines commands via redis.defineCommand()
  • Operations: Use defined commands (e.g., redis.acquireLock()) when available
  • Test compatibility: Falls back to redis.eval() for mocked Redis instances
  • Fence Type: Redis backend uses string fence type to avoid JSON precision loss beyond 2^53-1

Performance Characteristics

  • Sub-millisecond latency
  • 25k+ ops/sec with cached scripts
  • Direct key-value access provides consistently fast operations
  • lookup Implementation: Required - supports both key and lockId lookup patterns

Rationale & Notes

Why defineCommand at creation: ioredis caches script SHAs per client instance. Defining commands at initialization ensures EVALSHA is available for all subsequent calls on that client.

Why string fence type: JavaScript numbers lose precision beyond 2^53-1. Strings preserve full 15-digit fence values without precision loss.


Configuration Options

Requirements

typescript
interface RedisBackendConfig {
  keyPrefix?: string; // Default: "syncguard"
  cleanupInIsLocked?: boolean; // Default: false
  // ... other Redis-specific options
}

// Consistent behavior with unified tolerance
const redisBackend = createRedisBackend(); // Uses TIME_TOLERANCE_MS

// Add telemetry if needed
const observed = withTelemetry(redisBackend, {
  onEvent: (e) => console.log(e),
  includeRaw: false,
});

Unified tolerance: See TIME_TOLERANCE_MS in interface.md for normative specification

Rationale & Notes

Why default prefix "syncguard": Namespace collision prevention. Allows multiple libraries to coexist in same Redis instance.

Why cleanupInIsLocked optional: Read-only expectation by default. Cleanup opt-in preserves predictable behavior.


Key Naming

Requirements

All key types MUST use makeStorageKey() from common utilities with backend-specific effective key budget (EFFECTIVE_KEY_BUDGET_BYTES = 1000) and 26-byte reserve:

  • Main lock: baseKey = makeStorageKey(config.keyPrefix, key, 1000, 26)

    • Reserve: 26 bytes for derived keys (index keys)
  • Index key: makeStorageKey(config.keyPrefix, `id:${lockId}`, 1000, 26)

    • Reserve: 26 bytes (same reserve for consistency)
  • Fence key: MUST use Two-Step Fence Key Derivation Pattern:

    typescript
    const baseKey = makeStorageKey(config.keyPrefix, normalizedKey, 1000, 26);
    const fenceKey = makeStorageKey(
      config.keyPrefix,
      `fence:${baseKey}`,
      1000,
      26,
    );
    • Reserve: 26 bytes (ensures fence keys don't exceed budget when derived from base)
  • Default prefix: "syncguard"

  • Reserve bytes constant: 26

Rationale & Notes

Why two-step fence derivation: Guarantees 1:1 mapping between user keys and fence counters. See interface.md for complete rationale.


Required Lock Data Structure

Requirements

typescript
interface LockData {
  lockId: string; // For ownership verification
  expiresAtMs: number; // Millisecond timestamp
  acquiredAtMs: number; // Millisecond timestamp
  key: string; // Original user key
  fence: string; // Monotonic fencing token (15-digit zero-padded string)
}

Rationale & Notes

Why include key in lock data: Debugging and telemetry. Allows reconstruction of full lock state from main key alone.

Why fence as string: Preserves precision. JavaScript/Lua number precision limits don't affect string representation.


Fencing Token Implementation

Requirements

lua
-- Fence Counter Increment Pattern (within acquire script):
local fenceKey = KEYS[3]  -- Pre-constructed via two-step pattern
-- Format immediately as 15-digit string for guaranteed Lua precision safety
local fence = string.format("%015d", redis.call('INCR', fenceKey))
-- Store fence in lock data JSON and return with AcquireResult

Required Implementation Details:

  • Fence Key Generation: MUST use Two-Step Fence Key Derivation Pattern from interface.md
  • Atomicity: INCR MUST be called within the same Lua script as lock acquisition
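  • Persistence: Fence keys carry no TTL, so counters are never removed by expiry; surviving a Redis restart additionally depends on Redis persistence (RDB or AOF) being enabled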
  • Monotonicity: Each successful acquire() increments the counter, ensuring strict ordering
  • Storage: Store the fence value in LockData.fence and return in AcquireResult.fence
  • Format: 15-digit zero-padded decimal strings for lexicographic ordering and JSON safety
  • Overflow Enforcement (ADR-004): Backend MUST parse returned fence value and throw LockError("Internal") if fence > FENCE_THRESHOLDS.MAX; MUST log warnings via logFenceWarning() when fence > FENCE_THRESHOLDS.WARN. Canonical threshold values defined in common/constants.ts.
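The overflow enforcement step might be structured as in this sketch. The threshold values are deliberately passed in as parameters because the canonical numbers live in common/constants.ts and are not reproduced here; the function name is hypothetical, a plain Error stands in for LockError("Internal"), and the `warn` callback stands in for logFenceWarning().

```typescript
interface FenceThresholds {
  WARN: number;
  MAX: number;
}

// Validate a fence string returned by the acquire script.
function enforceFenceLimits(
  fence: string,
  thresholds: FenceThresholds,
  warn: (message: string) => void,
): string {
  if (!/^\d{15}$/.test(fence)) {
    throw new Error("Internal: malformed fence value " + fence);
  }
  // 15 decimal digits stay below 2^53, so this parse is exact.
  const value = parseInt(fence, 10);
  if (value > thresholds.MAX) {
    throw new Error("Internal: fence counter exceeded maximum");
  }
  if (value > thresholds.WARN) {
    warn("fence counter approaching limit: " + fence);
  }
  return fence;
}
```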

Rationale & Notes

Why format in Lua: Guarantees precision safety. Lua's 53-bit precision accommodates 15-digit decimals without loss.

Why no TTL on fence keys: Monotonicity requires persistence. Deleting fence counter would allow reuse, violating safety guarantees.
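To illustrate why zero-padding matters for ordering, the sketch below reproduces the 15-digit format in TypeScript and compares fences as plain strings. Both function names are hypothetical; in the backend the formatting happens inside the Lua script.

```typescript
// Format a counter as a 15-digit zero-padded decimal string.
function formatFence(counter: number): string {
  if (counter < 0 || Math.floor(counter) !== counter || counter > 999999999999999) {
    throw new Error("fence out of 15-digit range");
  }
  let digits = String(counter);
  while (digits.length < 15) {
    digits = "0" + digits;
  }
  return digits;
}

// With fixed-width padding, string comparison matches numeric comparison.
function compareFences(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}
```

Unpadded, "9" sorts after "10" as a string; padded to 15 digits, 9 correctly sorts before 10.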


Fence Key Lifecycle and Memory Considerations

Requirements

CRITICAL: Fence keys are intentionally persistent and MUST NOT have TTL or be deleted:

lua
-- ❌ NEVER do this - breaks monotonicity guarantee
redis.call('DEL', fenceKey)  -- Violates fencing safety
redis.call('EXPIRE', fenceKey, ttl)  -- Violates fencing safety

Rationale & Notes

Memory Growth: Fence counters accumulate over Redis instance lifetime. This is correct behavior for fencing safety.

  • Bounded key spaces: Most applications use predictable lock keys → minimal memory impact
  • Unbounded key spaces: Applications generating unlimited unique keys → fence keys grow indefinitely
  • Mitigation: For unbounded scenarios, consider key normalization or application-level limits

Operational Guidance:

  • Monitor fence key count via redis-cli --scan --pattern "syncguard:fence:*" | wc -l
  • Each fence key is ~50-100 bytes (key name + 8-byte counter)
  • 1M fence keys ≈ 50-100MB memory (typically acceptable)

When to be concerned: If your application generates >10M unique lock keys annually, evaluate key design patterns.
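The memory estimates above are straightforward multiplication, shown here as a sketch for capacity planning. The function name is hypothetical, and the per-key byte figures are the rough estimates quoted above, not measured Redis overhead.

```typescript
// Rough memory footprint of fence keys, in bytes.
function fenceMemoryBytes(keyCount: number, bytesPerKey: number): number {
  return keyCount * bytesPerKey;
}

// 1M keys at 50-100 bytes each: 50,000,000 to 100,000,000 bytes (50-100 MB).
const oneMillionLow = fenceMemoryBytes(1000000, 50);
const oneMillionHigh = fenceMemoryBytes(1000000, 100);

// 10M new keys per year at the upper estimate: about 1 GB of growth per year.
const tenMillionPerYear = fenceMemoryBytes(10000000, 100);
```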


Testing Strategy

Requirements

  • Unit tests: Mock Redis with eval method, no external dependencies
  • Integration tests: Real Redis instance, validates defineCommand() caching
  • Performance tests: Benchmarks latency, throughput, and script caching benefits
  • Behavioral compliance testing: Unit tests MUST verify backend imports and uses isLive() and calculateRedisServerTimeMs() from common/time-predicates.ts
  • Cross-backend consistency: Integration tests MUST verify identical outcomes given same tolerance values between Redis and other backends

Rationale & Notes

Why unit tests with mocks: Fast feedback loop. No external dependencies for basic correctness checks.

Why integration tests with real Redis: Validates script caching, network behavior, actual atomicity guarantees.

Why cross-backend tests: Ensures API consistency. Users should get identical behavior regardless of backend choice (accounting for time authority differences).