Core Concepts
Understand how distributed locks work in SyncGuard.
Architecture & Design Decisions
Curious about why things work this way? See specs/adrs.md for architectural decision records explaining the rationale behind key design choices.
Lock Lifecycle
Every lock follows a three-phase lifecycle:
- Acquire: Request exclusive access with a unique key
- Execute: Run your critical section while holding the lock
- Release: Free the lock for others (or let the TTL expire)
// Automatic lifecycle management
await lock(
  async () => {
    // Execute phase (critical section)
  },
  { key: "resource:123" },
);
// Manual lifecycle control
const result = await backend.acquire({ key: "resource:123", ttlMs: 30000 });
if (result.ok) {
  try {
    // Execute phase
  } finally {
    await backend.release({ lockId: result.lockId });
  }
}
Contention Handling
If another process holds the lock, acquisition returns { ok: false, reason: "locked" }
(manual mode) or retries automatically (auto mode).
Crash Safety
Locks expire via TTL even if your process crashes. No manual cleanup required.
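The TTL mechanism can be illustrated with a minimal in-memory sketch (all names here are hypothetical, not SyncGuard's implementation; real backends such as Redis or Firestore enforce expiry server-side). The key idea: expiry is checked lazily on acquire, so a lock whose holder crashed is simply overwritten once its TTL passes.

```typescript
// Minimal in-memory sketch of TTL-based crash safety (illustrative only).
type Entry = { lockId: string; expiresAtMs: number };

const locks = new Map<string, Entry>();
let nextId = 0;

function acquire(key: string, ttlMs: number, nowMs: number) {
  const existing = locks.get(key);
  // A lock is only "held" while its TTL has not elapsed
  if (existing && existing.expiresAtMs > nowMs) {
    return { ok: false as const, reason: "locked" as const };
  }
  // Expired or absent: take over without any explicit cleanup step
  const entry = { lockId: `lock-${nextId++}`, expiresAtMs: nowMs + ttlMs };
  locks.set(key, entry);
  return { ok: true as const, lockId: entry.lockId };
}
```

Acquiring the same key after the TTL elapses succeeds even though the previous holder never called release.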
Lock Keys
Keys identify the resource being locked. Choose keys that uniquely represent your protected operation:
Good keys ✅
`payment:${paymentId}` // Unique per payment
`job:daily-report:${date}` // Unique per day
`deploy:${environment}` // One deploy at a time per env
`webhook:${eventId}` // Idempotent webhook handling
Bad keys ❌
"lock" // Too generic, serializes everything
`user:${userId}` // Too broad, blocks all user operations
Math.random().toString() // Random keys defeat the purpose
Key constraints:
- Maximum 512 bytes after UTF-8 encoding
- Automatically normalized to NFC form
- Hashed when prefixed keys exceed backend limits (transparent to you)
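The first two constraints can be sketched as a small validation helper (normalizeKey and MAX_KEY_BYTES are hypothetical names for illustration, not SyncGuard API):

```typescript
// Sketch of the documented key constraints: NFC-normalize first,
// then enforce the 512-byte limit on the UTF-8 encoding.
const MAX_KEY_BYTES = 512;

function normalizeKey(key: string): string {
  const normalized = key.normalize("NFC");
  const byteLength = new TextEncoder().encode(normalized).length;
  if (byteLength > MAX_KEY_BYTES) {
    throw new Error(`Key exceeds ${MAX_KEY_BYTES} bytes after UTF-8 encoding`);
  }
  return normalized;
}
```

Note that the byte limit applies after encoding, so multi-byte characters count for more than one byte each.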
Namespacing
Backend prefixes prevent cross-app collisions:
- Redis: "syncguard:payment:123"
- Firestore: document ID in the "locks" collection
Ownership & Lock IDs
Every lock gets a unique lockId (a 22-character base64url string with 128 bits of entropy). Only the owner can release or extend the lock.
const result = await backend.acquire({ key: "resource:123", ttlMs: 30000 });
if (result.ok) {
  const { lockId } = result;
  // Only this lockId can release/extend this specific lock
  await backend.extend({ lockId, ttlMs: 30000 });
  await backend.release({ lockId });
}
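For illustration, an ID with these properties could be generated like this (a sketch, not necessarily SyncGuard's generator): 16 random bytes encoded as unpadded base64url come out to exactly 22 characters.

```typescript
import { randomBytes } from "node:crypto";

// 128 bits (16 bytes) of entropy; Node's "base64url" encoding omits
// padding, so the result is always 22 characters.
function generateLockId(): string {
  return randomBytes(16).toString("base64url");
}
```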
Why lock IDs matter:
- Idempotency: the same lockId can't accidentally release someone else's lock
- Concurrent safety: two processes acquiring locks on the same key get different lock IDs
- Explicit ownership: operations require proof of ownership via lockId
Checking ownership: use the helper functions for better discoverability:
import { owns, getById } from "syncguard";
// Quick boolean check - simple and clear
if (await owns(backend, lockId)) {
  console.log("Still own the lock");
}
// Detailed info - includes expiration and fence tokens
const info = await getById(backend, lockId);
if (info) {
  console.log(`Expires in ${info.expiresAtMs - Date.now()}ms`);
  console.log(`Fence token: ${info.fence}`);
}
TTL & Expiration
Locks expire automatically after ttlMs milliseconds. This prevents orphaned locks when processes crash.
Choosing TTL:
// Short critical sections (default: 30s)
await lock(workFn, { key: "quick-task", ttlMs: 30000 });
// Long-running batch jobs
await lock(workFn, { key: "daily-report", ttlMs: 300000 }); // 5 minutes
Guidelines:
- Too short ⚠️: Lock expires mid-work → potential race conditions
- Too long ⚠️: Crashed processes block others for longer
- Sweet spot ✅: 2-3x your expected work duration
Extending locks (for work that takes longer than expected):
const result = await backend.acquire({ key: "batch:report", ttlMs: 60000 });
if (result.ok) {
  try {
    await processFirstBatch();
    // Extend lock before it expires
    const extended = await backend.extend({
      lockId: result.lockId,
      ttlMs: 60000, // Reset to 60s from now
    });
    if (!extended.ok) {
      throw new Error("Lost lock ownership");
    }
    await processSecondBatch();
  } finally {
    await backend.release({ lockId: result.lockId });
  }
}
TTL Replacement Behavior
extend() replaces the TTL entirely; it doesn't add to the remaining time. Extending with ttlMs: 60000 resets expiration to 60 seconds from now, not from the original acquisition.
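A worked contrast of the two possible semantics, using hypothetical helpers (SyncGuard's extend() follows the "replace" behavior):

```typescript
// What extend() does: expiry is measured from now
function extendReplace(nowMs: number, ttlMs: number): number {
  return nowMs + ttlMs;
}

// What extend() does NOT do: stacking onto the previous expiry
function extendAdd(prevExpiresAtMs: number, ttlMs: number): number {
  return prevExpiresAtMs + ttlMs;
}
```

With now = 10s and a previous expiry at 15s, extending by 60s yields an expiry at 70s under replace semantics, not 75s.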
Heartbeat pattern (for very long-running work):
const result = await backend.acquire({ key: "long-task", ttlMs: 60000 });
if (!result.ok) throw new Error("Failed to acquire lock");
const { lockId } = result;
// Extend every 30s (half the TTL)
let lostOwnership = false;
const heartbeat = setInterval(async () => {
  const extended = await backend.extend({ lockId, ttlMs: 60000 });
  if (!extended.ok) {
    // Throwing here would be an unhandled rejection inside setInterval,
    // so record the loss and stop the heartbeat instead
    lostOwnership = true;
    clearInterval(heartbeat);
  }
}, 30000);
try {
  await doLongRunningWork();
  if (lostOwnership) throw new Error("Lost lock ownership");
} finally {
  clearInterval(heartbeat);
  await backend.release({ lockId });
}
Retry Strategy
When locks are contended, the lock() helper retries automatically using exponential backoff with jitter.
Default retry behavior:
await lock(workFn, {
  key: "resource:123",
  maxRetries: 10, // Try up to 10 times (default)
  retryDelayMs: 100, // Start with 100ms delay (default)
  timeoutMs: 5000, // Give up after 5s total (default)
});
How it works:
- First attempt fails β wait 100ms
- Second attempt fails β wait ~200ms (Β± jitter)
- Third attempt fails β wait ~400ms (Β± jitter)
- Continue doubling until success or timeout
Jitter Prevents Thundering Herd
Jitter (50% randomization) prevents all processes from retrying simultaneously:
- Without jitter: 10 processes retry at exactly 100ms, 200ms, 400ms...
- With jitter: processes spread out between 50-150ms, 100-300ms, 200-600ms...
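The schedule above can be sketched as a pure delay function (illustrative, not SyncGuard's source; the defaults mirror retryDelayMs: 100 and the 50% jitter described):

```typescript
// Exponential backoff with jitter: the base delay doubles each attempt,
// then 50% randomization spreads it over [0.5x, 1.5x].
function retryDelayMs(attempt: number, baseMs = 100, jitter = 0.5): number {
  const exponential = baseMs * 2 ** attempt; // 100, 200, 400, ...
  const factor = 1 - jitter + Math.random() * 2 * jitter; // 0.5..1.5
  return exponential * factor;
}
```

Attempt 0 lands in 50-150ms, attempt 1 in 100-300ms, attempt 2 in 200-600ms, matching the ranges listed above.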
Custom retry strategies:
// More patient (higher contention tolerance)
await lock(workFn, {
  key: "hot-resource",
  maxRetries: 20,
  timeoutMs: 10000,
});
// Less patient (fail fast)
await lock(workFn, {
  key: "quick-check",
  maxRetries: 3,
  timeoutMs: 1000,
});
// No retries (single attempt)
const result = await backend.acquire({ key: "resource:123", ttlMs: 30000 });
if (!result.ok) {
  // Handle contention immediately
}
When acquisition fails:
try {
  await lock(workFn, { key: "resource:123" });
} catch (error) {
  if (error instanceof LockError && error.code === "AcquisitionTimeout") {
    // Exceeded timeoutMs after all retries
    console.log("Resource too contended, try again later");
  }
}
Ownership Checking
Diagnostic Use Only
Ownership checks are for diagnostics, UI, and monitoring, NOT correctness guards. Never use check → mutate patterns. Correctness relies on the atomic ownership verification built into the release() and extend() operations.
Recommended approach: use the helper functions for clarity and discoverability.
Check if a resource is locked:
import { getByKey } from "syncguard";
const info = await getByKey(backend, "resource:123");
if (info) {
  console.log(`Locked until ${new Date(info.expiresAtMs)}`);
  console.log(`Fence token: ${info.fence}`);
} else {
  console.log("Resource is available");
}
Check if you still own a lock:
import { owns, getById } from "syncguard";
// Simple boolean check
const stillOwned = await owns(backend, lockId);
if (!stillOwned) {
  throw new Error("Lost lock ownership");
}
// Or get detailed information
const info = await getById(backend, lockId);
if (info) {
  console.log(
    `Still own the lock, expires in ${info.expiresAtMs - Date.now()}ms`,
  );
}
Helper Functions vs Direct Method
The helpers (getByKey, getById, owns) provide better discoverability and clearer intent than calling backend.lookup() directly. They're the recommended approach for lock diagnostics. For advanced cases, you can still use backend.lookup({ key }) or backend.lookup({ lockId }) directly.
Security Note
Helpers return sanitized data with hashed keys/lockIds by default. For debugging with raw values, use the getByKeyRaw() or getByIdRaw() helpers.
When to use ownership checking:
- ✅ Diagnostics: "Why is this resource locked?"
- ✅ Monitoring: Track lock expiration times
- ✅ Conditional logic: "Should I wait or skip?"
- ✅ UI display: Show lock status to users
When NOT to use ownership checking:
- ❌ Pre-checking before extend() or release() (operations are idempotent)
- ❌ Gating mutations (use fencing tokens instead, see Fencing Tokens)
- ❌ Polling for lock availability (use retry logic in lock() instead)