Production-Ready Circuit Breakers with Redis and Node.js
In Building Circuit Breaker Pattern in Node.js, I showed how to implement an in-memory circuit breaker. It works great for single-server applications: when a third-party API starts failing, the circuit breaker trips, stops hammering the failing service, and gives it time to recover.
But what happens when you scale to multiple servers? Each server has its own circuit breaker, tracking its own failure counts. Server A's circuit might be open (protecting against failures) while Server B's is still closed (still hitting the failing API). The circuit breaker's job is to detect system-wide problems and react accordingly. In-memory state breaks this when you have multiple processes.
Here's how to solve it with Redis-backed distributed circuit breakers.
When In-Memory Circuit Breakers Aren't Enough
The in-memory implementation from Part 1 stored state in a JavaScript Map:
// Works for single-server apps
const circuitState = new Map<string, { failures: number; state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' }>();
This breaks in multi-server deployments because:
- No shared state: Each server tracks failures independently. A service might be completely down, but if failures are spread across 10 servers, no individual circuit trips.
- Wasted retries: When the circuit opens on Server A, Server B doesn't know and keeps hammering the failing service. With 10 servers, one open circuit still lets through 90% of the traffic you wanted to block.
- Inconsistent user experience: Some users hit servers with open circuits (fast failures) while others hit servers with closed circuits (slow timeouts).
The solution: move circuit state to Redis. Now all servers share the same view of each service's health.
Distributed Circuit Breaker Architecture
A Redis-backed circuit breaker uses Redis as a shared state store. When any server detects failures, all servers immediately know about it. When a circuit opens on one server, it opens everywhere.
State transitions: The circuit breaker has three states, just like Part 1:
- CLOSED: Normal operation, requests flow through
- OPEN: Failure threshold exceeded, all requests fail fast
- HALF_OPEN: Testing if the service has recovered
Redis data structure:
circuit:{service_name}:state → "CLOSED" | "OPEN" | "HALF_OPEN"
circuit:{service_name}:failures → Integer (failure count)
circuit:{service_name}:opened_at → Timestamp (when circuit opened)
circuit:{service_name}:test_lock → Lock for HALF_OPEN testing
When any server increments the failure count past the threshold, it sets the state to OPEN. All other servers immediately see this when they check circuit state before making requests.
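For example, once one instance trips the breaker for a service registered as medusa-api (the name used in the usage example below), any other instance sees it with a plain read of the shared keys, using the same ioredis client set up in the next section:
// From any other server, a quick read shows the tripped circuit
const state = await redis.get('circuit:medusa-api:state'); // "OPEN"
const failures = await redis.get('circuit:medusa-api:failures'); // e.g. "5"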
Implementation: Redis-Backed Circuit Breaker
Here's the production implementation I use on NeedThisDone.com and client projects:
// lib/circuit-breaker-redis.ts
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL!);
interface CircuitBreakerOptions {
failureThreshold: number; // Number of failures before opening
successThreshold: number; // Successes needed to close from half-open
timeout: number; // How long to stay open (ms)
}
export class RedisCircuitBreaker {
constructor(
private serviceName: string,
private options: CircuitBreakerOptions = {
failureThreshold: 5,
successThreshold: 2,
timeout: 60000, // 1 minute
}
) {}
  // Built in a getter so the key names are computed after serviceName is assigned;
  // a class-field initializer can run before parameter properties under ES2022
  // class-field semantics.
  private get keys() {
    return {
      state: `circuit:${this.serviceName}:state`,
      failures: `circuit:${this.serviceName}:failures`,
      openedAt: `circuit:${this.serviceName}:opened_at`,
      testLock: `circuit:${this.serviceName}:test_lock`,
    };
  }
async execute<T>(operation: () => Promise<T>): Promise<T> {
// Check circuit state
    let state = await this.getState();
    if (state === 'OPEN') {
      // Check if timeout has passed
      const openedAt = await redis.get(this.keys.openedAt);
      if (openedAt && Date.now() - parseInt(openedAt) > this.options.timeout) {
        await this.transitionToHalfOpen();
        // Fall through to the HALF_OPEN branch below so this request also
        // has to acquire the test lock before probing the service
        state = 'HALF_OPEN';
      } else {
        throw new Error(`Circuit breaker OPEN for ${this.serviceName}`);
      }
    }
if (state === 'HALF_OPEN') {
// Only one request at a time in half-open state
const acquired = await this.acquireTestLock();
if (!acquired) {
throw new Error(`Circuit breaker testing, request rejected for ${this.serviceName}`);
}
}
try {
const result = await operation();
await this.onSuccess();
return result;
} catch (error) {
await this.onFailure();
throw error;
}
}
private async getState(): Promise<'CLOSED' | 'OPEN' | 'HALF_OPEN'> {
const state = await redis.get(this.keys.state);
return (state as 'CLOSED' | 'OPEN' | 'HALF_OPEN') || 'CLOSED';
}
private async transitionToHalfOpen(): Promise<void> {
// Use Lua script for atomic state transition
await redis.eval(
`
if redis.call("get", KEYS[1]) == "OPEN" then
redis.call("set", KEYS[1], "HALF_OPEN")
redis.call("del", KEYS[2])
return 1
end
return 0
`,
2,
this.keys.state,
this.keys.testLock
);
}
private async acquireTestLock(): Promise<boolean> {
const result = await redis.set(this.keys.testLock, '1', 'EX', 10, 'NX');
return result === 'OK';
}
private async onSuccess(): Promise<void> {
const state = await this.getState();
if (state === 'HALF_OPEN') {
// Check if we've hit success threshold
const successes = await redis.incr(`${this.keys.state}:successes`);
if (successes >= this.options.successThreshold) {
// Close circuit - use Lua for atomicity
await redis.eval(
`
redis.call("set", KEYS[1], "CLOSED")
redis.call("del", KEYS[2], KEYS[3], KEYS[4])
return 1
`,
4,
this.keys.state,
this.keys.failures,
this.keys.openedAt,
`${this.keys.state}:successes`
);
}
// Release test lock
await redis.del(this.keys.testLock);
}
}
private async onFailure(): Promise<void> {
const state = await this.getState();
if (state === 'HALF_OPEN') {
// Failure in half-open state reopens circuit
await redis.eval(
`
redis.call("set", KEYS[1], "OPEN")
redis.call("set", KEYS[2], ARGV[1])
redis.call("del", KEYS[3])
return 1
`,
3,
this.keys.state,
this.keys.openedAt,
this.keys.testLock,
Date.now().toString()
);
return;
}
// Increment failure count
const failures = await redis.incr(this.keys.failures);
if (failures >= this.options.failureThreshold) {
// Open circuit
await redis.eval(
`
redis.call("set", KEYS[1], "OPEN")
redis.call("set", KEYS[2], ARGV[1])
return 1
`,
2,
this.keys.state,
this.keys.openedAt,
Date.now().toString()
);
}
}
}
Usage:
// Wrap any external API call
import { RedisCircuitBreaker } from '@/lib/circuit-breaker-redis';
const medusaBreaker = new RedisCircuitBreaker('medusa-api', {
failureThreshold: 5,
successThreshold: 2,
timeout: 60000,
});
async function fetchProducts() {
return medusaBreaker.execute(async () => {
const response = await fetch(`${process.env.MEDUSA_URL}/store/products`);
if (!response.ok) throw new Error('Medusa API error');
return response.json();
});
}
Why Lua scripts? Redis Lua scripts execute atomically. The state transition from OPEN to HALF_OPEN involves checking state, updating it, and deleting keys—all of which must happen together. Without Lua, another server could read stale state mid-transition.
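One refinement worth considering (not shown above) is registering these scripts once with ioredis's defineCommand, so the script body isn't resent on every eval call. A minimal sketch, reusing the open-circuit script from onFailure() (the command name openCircuit is just an example):
// Register the Lua script once at startup; ioredis exposes it as a method
redis.defineCommand('openCircuit', {
  numberOfKeys: 2,
  lua: `
    redis.call("set", KEYS[1], "OPEN")
    redis.call("set", KEYS[2], ARGV[1])
    return 1
  `,
});

// Equivalent to the eval() call in onFailure(), without resending the script
await (redis as any).openCircuit(
  'circuit:medusa-api:state',
  'circuit:medusa-api:opened_at',
  Date.now().toString()
);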
Monitoring and Alerting
Production circuit breakers need observability. You want to know when circuits open, how often they're tripping, and what's causing failures.
Prometheus-style metrics:
// lib/circuit-breaker-metrics.ts
export class CircuitBreakerMetrics {
private counters = {
successes: 0,
failures: 0,
rejections: 0, // Requests blocked by open circuit
};
recordSuccess(serviceName: string) {
this.counters.successes++;
    // Export to Prometheus/Datadog/CloudWatch, using serviceName as the metric label
}
recordFailure(serviceName: string) {
this.counters.failures++;
}
recordRejection(serviceName: string) {
this.counters.rejections++;
}
  async getMetrics() {
    const total = this.counters.successes + this.counters.failures;
    return {
      ...this.counters,
      errorRate: total > 0 ? this.counters.failures / total : 0,
    };
  }
}
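The counters above are in-memory per process; to actually scrape them, most Node setups export them through something like prom-client with the service name as a label. A rough sketch, assuming prom-client is your metrics library (metric names are illustrative):
// lib/metrics-export.ts
import { Counter, register } from 'prom-client';

const circuitOutcomes = new Counter({
  name: 'circuit_breaker_outcomes_total',
  help: 'Circuit breaker outcomes, labeled by service and outcome',
  labelNames: ['service', 'outcome'], // outcome: success | failure | rejection
});

export function recordOutcome(service: string, outcome: 'success' | 'failure' | 'rejection') {
  circuitOutcomes.inc({ service, outcome });
}

// Serve this from a /metrics route for Prometheus to scrape
export async function renderMetrics(): Promise<string> {
  return register.metrics();
}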
Add metrics to circuit breaker:
async execute<T>(operation: () => Promise<T>): Promise<T> {
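  // Abridged version of the execute() method above, showing only where the
  // metrics hooks go; `metrics` is a shared CircuitBreakerMetrics instance.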
const state = await this.getState();
if (state === 'OPEN') {
metrics.recordRejection(this.serviceName);
throw new Error(`Circuit breaker OPEN for ${this.serviceName}`);
}
try {
const result = await operation();
metrics.recordSuccess(this.serviceName);
return result;
} catch (error) {
metrics.recordFailure(this.serviceName);
throw error;
}
}
Alerting strategy: Set up alerts when:
- Circuit opens (immediate notification—something is down)
- Error rate exceeds 10% for 5 minutes (service is degraded)
- Circuit flaps (opens/closes repeatedly—unstable service)
I use Discord webhooks for circuit breaker alerts. When a circuit opens, I get notified immediately and can investigate before customers complain.
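The webhook call itself is small; a sketch along these lines, triggered from the OPEN transition, is enough (DISCORD_WEBHOOK_URL is assumed to be set in the environment):
// Fire-and-forget Discord notification when a circuit opens
async function alertCircuitOpen(serviceName: string, failures: number): Promise<void> {
  const webhookUrl = process.env.DISCORD_WEBHOOK_URL;
  if (!webhookUrl) return;

  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      content: `Circuit breaker OPEN for ${serviceName} (${failures} failures)`,
    }),
  }).catch((err) => console.error('Failed to send Discord alert', err));
}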
Graceful Degradation Strategies
When a circuit opens, fail fast—but fail gracefully. Don't just throw errors at users. Give them a degraded but functional experience.
Strategy 1: Cached fallbacks
async function fetchProducts() {
try {
return await medusaBreaker.execute(fetchFromMedusa);
} catch (error) {
console.warn('Circuit open, using cached products');
return await getCachedProducts();
}
}
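getCachedProducts can be as simple as a Redis-backed copy of the last good response; a minimal sketch (cache key and TTL are illustrative), with the cache refreshed on every successful fetch:
const PRODUCT_CACHE_KEY = 'cache:products';

// Refresh the fallback copy whenever the live fetch succeeds
async function cacheProducts(products: unknown): Promise<void> {
  await redis.set(PRODUCT_CACHE_KEY, JSON.stringify(products), 'EX', 3600); // 1 hour
}

async function getCachedProducts(): Promise<unknown> {
  const cached = await redis.get(PRODUCT_CACHE_KEY);
  return cached ? JSON.parse(cached) : [];
}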
Strategy 2: Feature flags
async function getRecommendations(userId: string) {
const recommendationsEnabled = await redis.get('feature:recommendations');
if (!recommendationsEnabled || !(await isRecommendationServiceHealthy())) {
// Circuit is open or feature disabled—skip recommendations
return [];
}
return recommendationBreaker.execute(() => fetchRecommendations(userId));
}
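isRecommendationServiceHealthy can simply read the circuit state already stored in Redis; a sketch, assuming the recommendation breaker is registered under the name 'recommendations':
async function isRecommendationServiceHealthy(): Promise<boolean> {
  const state = await redis.get('circuit:recommendations:state');
  // No key yet means the circuit has never tripped, which counts as healthy
  return state !== 'OPEN';
}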
Strategy 3: Default responses
async function getProductReviews(productId: string) {
try {
return await reviewServiceBreaker.execute(() => fetchReviews(productId));
} catch (error) {
// Circuit open—return empty state instead of breaking the page
return { reviews: [], averageRating: null, count: 0 };
}
}
The goal: when a circuit opens, users should notice minimal disruption. Maybe they don't see personalized recommendations, but the rest of the app works fine.
Production Checklist
Before deploying Redis-backed circuit breakers to production, verify:
1. Redis High Availability: Use Redis Sentinel or Redis Cluster. If Redis goes down, your circuit breakers stop working. Sentinel provides automatic failover when the primary Redis instance fails.
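With ioredis, pointing the client at Sentinel is a configuration change rather than a code change; a sketch with placeholder hosts and master name:
import Redis from 'ioredis';

// ioredis asks the sentinels for the current primary and follows failovers
const redis = new Redis({
  sentinels: [
    { host: 'sentinel-1', port: 26379 },
    { host: 'sentinel-2', port: 26379 },
    { host: 'sentinel-3', port: 26379 },
  ],
  name: 'mymaster', // master name from sentinel.conf
});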
2. Connection Reuse: Reuse Redis connections. Creating a new connection for every request adds latency. ioredis multiplexes commands over a single connection, so create the client once at module scope and share it:
const redis = new Redis({
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT || '6379'),
maxRetriesPerRequest: 3,
enableReadyCheck: true,
lazyConnect: true,
});
3. Timeout Tuning: Set appropriate circuit breaker timeouts for each service. Critical services might get 30-second timeouts (recover quickly). Non-critical services might get 5-minute timeouts (don't waste resources retrying).
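In practice that's just different options per breaker; for example (service names and numbers here are illustrative):
// Critical dependency: trip fast, retest after 30 seconds
const paymentsBreaker = new RedisCircuitBreaker('payments-api', {
  failureThreshold: 3,
  successThreshold: 2,
  timeout: 30000, // 30 seconds
});

// Non-critical dependency: tolerate more noise, stay open for 5 minutes
const recommendationsBreaker = new RedisCircuitBreaker('recommendations', {
  failureThreshold: 10,
  successThreshold: 2,
  timeout: 300000, // 5 minutes
});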
4. Key Expiration: Set TTLs on Redis keys to prevent stale circuit state from persisting forever:
await redis.set(this.keys.state, 'OPEN', 'EX', 3600); // Expire after 1 hour
5. Logging: Log every state transition. When circuits open in production, you need context:
console.log({
level: 'warn',
message: 'Circuit breaker opened',
serviceName: this.serviceName,
failures,
threshold: this.options.failureThreshold,
timestamp: new Date().toISOString(),
});
6. Testing: Test circuit breaker behavior under load. Use tools like k6 or Artillery to simulate failure scenarios and verify circuits open/close correctly across multiple servers.
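A first pass at this doesn't need to be elaborate; a rough k6 sketch (the URL is a placeholder, and you'd run it while deliberately taking the upstream service down):
// load-test.js (run with: k6 run load-test.js)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = { vus: 50, duration: '2m' };

export default function () {
  // Hit an endpoint that goes through the circuit breaker
  const res = http.get('https://staging.example.com/api/products');

  check(res, {
    // After the upstream is killed, responses should become fast failures
    // instead of hanging until the upstream timeout
    'responds in under 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}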
Need Backend Reliability Help?
Building production-grade circuit breakers is one part of a larger reliability strategy. If you need help with distributed systems, error handling, or scaling Node.js backends, I can help.
I've built these patterns for NeedThisDone.com and client projects. The code handles real production traffic and has prevented outages when third-party services failed.
See examples of reliability work on the portfolio page, or check out the services page for what I offer.
Related reading:
- Building Circuit Breaker Pattern in Node.js - Part 1 of this series, covering in-memory circuit breakers
- Request Deduplication: Preventing Double Submissions - Another production reliability pattern using Redis