EAAPL-INT007 — AI Circuit Breaker

Status: Mature
Tags: circuit-breaker, high-availability, llm, medium-complexity
Version: 2.0.0
Last Updated: 2026-06-12


1. Pattern Identity

| Field | Value |
|---|---|
| Pattern ID | EAAPL-INT007 |
| Name | AI Circuit Breaker |
| Category | Integration |
| Maturity | Mature |
| Complexity | Medium |
| Related Patterns | EAAPL-MAG001 · EAAPL-MAG002 · EAAPL-INT008 |

2. Executive Summary

The AI Circuit Breaker pattern adapts the classic circuit breaker from distributed systems engineering to the failure characteristics of AI service dependencies. A standard circuit breaker opens on HTTP error rate (including timeouts). AI services introduce failure modes that such a breaker misses: a service that consistently returns HTTP 200 but takes 45 seconds per call; a service that responds within its latency SLA but at 10× the normal cost per call; a service whose responses consistently fail downstream quality validation (hallucination spikes). Each of these is a functional failure from the calling system's perspective, yet none is detected by an HTTP-error-only circuit breaker. This pattern defines AI-specific open conditions, fallback strategies suited to AI workloads, and a half-open recovery protocol that validates all circuit conditions before closing. Multi-provider resilience, meaning routing to a secondary model or provider when the primary circuit opens, is a first-class feature rather than an afterthought.


3. Problem Statement

3.1 Context

Enterprise AI systems that depend on external LLM APIs (OpenAI, Anthropic, Google, Azure OpenAI) or internal model serving infrastructure face a different failure surface than traditional API dependencies. LLM providers experience: sudden latency spikes that do not correlate with error rates; cost anomalies (pricing changes, quota misconfiguration) that produce correct responses at 10× expected cost; safety policy updates that cause previously successful prompts to return refusals; degraded model quality during A/B testing or model updates that is invisible in HTTP metrics. A system with no circuit breaker on its AI dependency will: spend runaway budget during a cost anomaly; time out user requests during a latency spike; produce degraded outputs during a quality regression, potentially for hours before detection.

3.2 Forces in Tension

  • Sensitivity vs. false positives. A circuit breaker that trips too easily will open on transient anomalies and force the system into fallback mode unnecessarily. One that is not sensitive enough will fail to protect against genuine degradation.
  • Fallback quality vs. fallback availability. The ideal fallback (a different LLM provider) may not be available or may have different capabilities. Simpler fallbacks (cached responses, rule-based) degrade user experience significantly.
  • Recovery aggressiveness vs. stability. Closing the circuit quickly allows the system to return to normal operation but risks re-opening immediately if the underlying issue persists.
  • Per-model vs. per-provider. A circuit breaker per model enables fine-grained failure isolation but multiplies circuit breaker management complexity. A per-provider breaker is simpler but less precise.

3.3 Failure Modes Without This Pattern

Without an AI circuit breaker, a latency spike causes cascading failures as upstream requests queue up waiting for the AI service. A cost anomaly drains the monthly budget in hours. A quality regression goes undetected for the duration of normal monitoring cycles (hours to days). A safety policy change causes sudden, unexpected refusals with no graceful degradation path.


4. Solution

4.1 Circuit States

ARCHITECTURE DIAGRAM
flowchart TD
  subgraph States["Circuit States"]
    A[CLOSED Normal Operation]
    B[OPEN Requests Blocked]
    C[HALF-OPEN Probe Mode]
  end
  subgraph Transitions["State Transitions"]
    D{Failure Threshold Exceeded?}
    E{Recovery Timeout Elapsed?}
    F{Probe Succeeds on ALL Conditions?}
  end
  A --> D
  D -->|yes| B
  D -->|no| A
  B --> E
  E -->|yes| C
  E -->|no| B
  C --> F
  F -->|yes| A
  F -->|no| B
  style A fill:#d1fae5,stroke:#10b981
  style B fill:#fee2e2,stroke:#ef4444
  style C fill:#f3e8ff,stroke:#a855f7
  style D fill:#f3e8ff,stroke:#a855f7
  style E fill:#f3e8ff,stroke:#a855f7
  style F fill:#f3e8ff,stroke:#a855f7
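
The transitions in the diagram can be written as a pure function, which keeps the state machine easy to unit test. This is a minimal sketch: the `CircuitEvent` names are hypothetical labels for the outcomes of the three decision nodes and are not defined by the pattern itself.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Hypothetical event names, one per outcome of the three decision nodes.
type CircuitEvent =
  | "THRESHOLD_EXCEEDED"
  | "THRESHOLD_OK"
  | "RECOVERY_TIMEOUT_ELAPSED"
  | "RECOVERY_TIMEOUT_PENDING"
  | "PROBES_PASSED"
  | "PROBE_FAILED";

function nextState(state: CircuitState, event: CircuitEvent): CircuitState {
  switch (state) {
    case "CLOSED":
      return event === "THRESHOLD_EXCEEDED" ? "OPEN" : "CLOSED";
    case "OPEN":
      return event === "RECOVERY_TIMEOUT_ELAPSED" ? "HALF_OPEN" : "OPEN";
    case "HALF_OPEN":
      if (event === "PROBES_PASSED") return "CLOSED";
      if (event === "PROBE_FAILED") return "OPEN";
      return "HALF_OPEN"; // still collecting probe results
  }
}
```

Events that do not apply to the current state leave it unchanged, so the function is safe to call with any state and event pair.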

4.2 Multi-Provider Fallback Flow

ARCHITECTURE DIAGRAM
flowchart TD
  subgraph Request["Incoming Request"]
    A[AI Service Request]
  end
  subgraph Primary["Primary Provider"]
    B{Primary Circuit OPEN?}
    C[Route to Primary LLM]
    D{Request Succeeds?}
  end
  subgraph Fallback["Fallback Chain"]
    E[Route to Secondary LLM]
    F{Secondary Circuit OPEN?}
    G[Cached Response Fallback]
    H[Rule-Based Fallback]
    I[Human Queue]
  end
  subgraph Outcome["Outcome"]
    J[Success Response]
    K[Degraded Response with Disclaimer]
    L[Graceful Error]
  end
  A --> B
  B -->|no| C --> D
  B -->|yes| E
  D -->|yes| J
  D -->|no, update circuit| B
  E --> F
  F -->|no| D
  F -->|yes| G
  G --> K
  G -->|cache miss| H
  H --> K
  H -->|no matching rule| I
  I --> L
  style A fill:#dbeafe,stroke:#3b82f6
  style B fill:#f3e8ff,stroke:#a855f7
  style C fill:#f0fdf4,stroke:#22c55e
  style D fill:#f3e8ff,stroke:#a855f7
  style E fill:#f0fdf4,stroke:#22c55e
  style F fill:#f3e8ff,stroke:#a855f7
  style G fill:#fef9c3,stroke:#eab308
  style H fill:#fef9c3,stroke:#eab308
  style I fill:#fee2e2,stroke:#ef4444
  style J fill:#d1fae5,stroke:#10b981
  style K fill:#d1fae5,stroke:#10b981
  style L fill:#fee2e2,stroke:#ef4444

5. Structure

5.1 Component Catalogue

| Component | Responsibility | Technology Options |
|---|---|---|
| Circuit State Store | Persists current state for each circuit | Redis (preferred), in-memory with cluster sync |
| Metric Collector | Tracks error rate, latency, cost rate, quality score | Prometheus, Datadog, OpenTelemetry |
| Threshold Evaluator | Checks if any circuit-open condition is met | Policy function evaluated on metric read |
| Probe Request Executor | Issues test request in HALF-OPEN state | Same request path as production, with test payload |
| Fallback Registry | Maps primary circuit to ordered fallback chain | Configuration file or service registry |
| Alert Publisher | Notifies on-call when circuit opens or closes | PagerDuty, Slack, OpsGenie |

5.2 Circuit Configuration Schema

{
  "circuitId": "openai-gpt4o-production",
  "provider": "openai",
  "model": "gpt-4o",
  "openConditions": {
    "errorRatePercent": { "threshold": 5, "windowSeconds": 60 },
    "p99LatencyMs": { "threshold": 30000, "windowSeconds": 300 },
    "costRateUSDPerHour": { "threshold": 50.0, "windowSeconds": 3600 },
    "qualityScoreBelow": { "threshold": 0.70, "windowSeconds": 600 },
    "safetyRefusalRatePercent": { "threshold": 10, "windowSeconds": 300 }
  },
  "halfOpenAfterSeconds": 60,
  "probeRequestsRequired": 3,
  "fallbackChain": [
    { "type": "model", "circuitId": "anthropic-claude-3-haiku-production" },
    { "type": "cache", "maxAgeSeconds": 3600 },
    { "type": "rule-based", "handlerId": "simple-rule-fallback" },
    { "type": "human-queue", "queueId": "ai-fallback-review" }
  ]
}
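
A configuration like the one above is easier to trust if it is typed and checked at load time. The sketch below is one possible shape, not a prescribed one: the `validateConfig` function and its specific checks are assumptions added for illustration.

```typescript
interface Condition { threshold: number; windowSeconds: number }

type FallbackStep =
  | { type: "model"; circuitId: string }
  | { type: "cache"; maxAgeSeconds: number }
  | { type: "rule-based"; handlerId: string }
  | { type: "human-queue"; queueId: string };

interface CircuitConfig {
  circuitId: string;
  provider: string;
  model: string;
  openConditions: Record<string, Condition>;
  halfOpenAfterSeconds: number;
  probeRequestsRequired: number;
  fallbackChain: FallbackStep[];
}

// Returns human-readable problems; an empty list means the config passed these checks.
function validateConfig(c: CircuitConfig): string[] {
  const problems: string[] = [];
  for (const [name, cond] of Object.entries(c.openConditions)) {
    if (!(cond.threshold > 0)) problems.push(`${name}: threshold must be positive`);
    if (!(cond.windowSeconds > 0)) problems.push(`${name}: windowSeconds must be positive`);
  }
  if (c.halfOpenAfterSeconds <= 0) problems.push("halfOpenAfterSeconds must be positive");
  if (c.probeRequestsRequired < 1) problems.push("probeRequestsRequired must be at least 1");
  // A circuit that falls back to itself would loop straight back into the failing provider.
  if (c.fallbackChain.some((s) => s.type === "model" && s.circuitId === c.circuitId)) {
    problems.push("fallbackChain must not route back to the same circuit");
  }
  return problems;
}
```

Running the check at startup surfaces a misconfigured threshold before it silently disables one of the open conditions.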

6. Behaviour

6.1 AI-Specific Open Conditions

A standard circuit breaker opens on error rate alone. The AI circuit breaker opens on any of five conditions, each with its own threshold and measurement window:

Error rate. HTTP 4xx/5xx, timeouts, connection errors. Same as a standard circuit breaker. Threshold: 5% over a 60-second window.

P99 latency spike. An LLM call that takes 30 seconds does not produce an error — it produces a slow response. From the user's perspective, this is as bad as an error. A latency-only spike is a common symptom of provider capacity issues before they manifest as errors. Threshold: p99 > 30s over a 5-minute window.

Cost rate spike. If the provider changes pricing, if a prompt change caused token explosion, or if a quota misconfiguration is causing unbounded spend, the circuit breaker must detect this and halt requests to prevent budget exhaustion. Track cost per hour using the token count from response metadata. Threshold: > $50/hour sustained for 1 hour.

Quality score degradation. If downstream quality validation (LLM-as-judge or task-specific scorer) begins failing at an elevated rate, the model may be undergoing a quality regression (A/B test, model update, prompt cache corruption). Track the rolling quality score. Threshold: quality score < 0.70 over a 10-minute window.

Safety refusal rate. If the provider's safety filters are flagging a high proportion of requests (e.g., due to a system prompt update), the circuit should open so that the issue can be investigated without continuing to send requests that will be refused. Threshold: > 10% refusal rate over 5 minutes.
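
The cost rate condition depends on turning token counts from response metadata into a USD-per-hour figure. The sketch below shows one way to do that over a sliding window. The `UsageSample` and `Pricing` shapes are hypothetical, and real per-token prices must come from your provider's current price list.

```typescript
interface UsageSample { timestampMs: number; inputTokens: number; outputTokens: number }

// Hypothetical pricing shape: USD per one million tokens, split by direction.
interface Pricing { inputUSDPerMTok: number; outputUSDPerMTok: number }

function costRateUSDPerHour(
  samples: UsageSample[],
  pricing: Pricing,
  nowMs: number,
  windowSeconds: number,
): number {
  const windowStartMs = nowMs - windowSeconds * 1000;
  let usd = 0;
  for (const s of samples) {
    // Only count calls inside the window (windowStartMs, nowMs].
    if (s.timestampMs <= windowStartMs || s.timestampMs > nowMs) continue;
    usd += (s.inputTokens / 1e6) * pricing.inputUSDPerMTok;
    usd += (s.outputTokens / 1e6) * pricing.outputUSDPerMTok;
  }
  // Scale the spend observed in the window to an hourly rate.
  return usd * (3600 / windowSeconds);
}
```

With the one-hour window from the configuration schema the scaling factor is 1, so the result is simply the spend over the last hour.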

6.2 CLOSED State

In the CLOSED state, all requests are routed to the primary provider normally. The metric collector continuously tracks all five dimensions. When any single threshold is breached, the circuit transitions to OPEN.

Threshold evaluation. Evaluate thresholds over a sliding window rather than fixed epoch windows. Fixed windows suffer from a "burst-at-window-boundary" false negative: a burst of failures that straddles two adjacent windows can stay below the threshold in each window individually even though it exceeds the threshold over a continuous interval of the same length.
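
A sliding window can be sketched with a timestamped event list that is pruned on read. The class below is illustrative only: it keeps events in process memory and assumes queries arrive with non-decreasing timestamps, whereas a production version would likely live in the circuit state store.

```typescript
class SlidingWindowErrorRate {
  private events: { atMs: number; failed: boolean }[] = [];

  constructor(private readonly windowSeconds: number) {}

  record(atMs: number, failed: boolean): void {
    this.events.push({ atMs, failed });
  }

  // Error rate in percent over (nowMs - window, nowMs]; 0 when no calls were seen.
  errorRatePercent(nowMs: number): number {
    const cutoffMs = nowMs - this.windowSeconds * 1000;
    // Drop events that have slid out of the window.
    this.events = this.events.filter((e) => e.atMs > cutoffMs);
    if (this.events.length === 0) return 0;
    const failures = this.events.filter((e) => e.failed).length;
    return (failures / this.events.length) * 100;
  }
}
```

Because the window is anchored to the query time, failures at second 55 and second 65 are counted together, which a pair of fixed 60-second windows would split.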

6.3 OPEN State

In the OPEN state, no requests are routed to the primary provider. All incoming requests are immediately routed to the fallback chain without attempting the primary. This is critical: a circuit breaker that still attempts the primary in OPEN state provides no protection against latency spikes.

Fallback chain execution. The circuit breaker walks down the configured fallback chain in order:

  1. Secondary model. If the secondary model's circuit is CLOSED, route to it. The secondary model's circuit is independently evaluated.
  2. Cache. If a cached response exists for a semantically similar request and is within the configured max age, return it with a [CACHED RESPONSE] disclaimer.
  3. Rule-based fallback. If the request can be answered by a deterministic rule (e.g., a FAQ lookup), execute the rule and return the result with a degraded-mode disclaimer.
  4. Human queue. If all above options are exhausted, enqueue the request for human processing and return a "We are experiencing high demand, your request will be processed shortly" response.
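
The chain walk above can be sketched as a loop over handlers that stops at the first one able to serve the request. The handler signature and the convention that `null` means "cannot serve" are assumptions made for this example.

```typescript
type FallbackType = "model" | "cache" | "rule-based" | "human-queue";

interface FallbackResponse { servedBy: FallbackType; body: string }

// A handler resolves to null when it cannot serve (circuit open, cache miss, no matching rule).
type FallbackHandler = (prompt: string) => Promise<string | null>;

async function runFallbackChain(
  prompt: string,
  chain: FallbackType[],
  handlers: Partial<Record<FallbackType, FallbackHandler>>,
): Promise<FallbackResponse | null> {
  for (const type of chain) {
    const handler = handlers[type];
    if (!handler) continue; // no handler registered for this step
    try {
      const body = await handler(prompt);
      if (body !== null) return { servedBy: type, body };
    } catch {
      // A failing handler must not break the chain; fall through to the next step.
    }
  }
  return null; // chain exhausted: the caller returns a graceful error
}
```

Returning `servedBy` alongside the body lets the caller attach the matching disclaimer and record which fallback type handled the request.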

6.4 HALF-OPEN State

After the configured recovery timeout, the circuit transitions from OPEN to HALF-OPEN. In HALF-OPEN:

  • A limited number of probe requests (configurable; default: 3) are routed to the primary provider.
  • Each probe request is evaluated against ALL five circuit conditions (error rate, latency, cost, quality, safety refusal), not just the condition that caused the circuit to open.
  • All required probes (three by default) must succeed on all conditions before the circuit closes.
  • If any probe fails any condition, the circuit immediately transitions back to OPEN with a doubled recovery timeout (exponential backoff on recovery attempts).

Why evaluate all conditions in half-open? The condition that caused the circuit to open may have resolved, but a different condition may have degraded in the interim. A half-open check that only validates the original open condition misses this scenario.
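
The probe rules can be sketched as a small tracker that receives one pass/fail flag per condition for each probe. The class name and the shape of its input are hypothetical. Resetting the timeout after a successful close is left out for brevity.

```typescript
interface ProbeOutcome {
  next: "HALF_OPEN" | "CLOSED" | "OPEN";
  recoveryTimeoutMs: number;
}

class HalfOpenTracker {
  private passed = 0;

  constructor(
    private readonly probesRequired: number,
    private recoveryTimeoutMs: number,
    private readonly maxTimeoutMs: number,
  ) {}

  // passedByCondition holds one flag per circuit condition, e.g. { latency: true, cost: false }.
  recordProbe(passedByCondition: Record<string, boolean>): ProbeOutcome {
    const allPassed = Object.values(passedByCondition).every(Boolean);
    if (!allPassed) {
      // Any failed condition reopens the circuit with a doubled (capped) recovery timeout.
      this.passed = 0;
      this.recoveryTimeoutMs = Math.min(this.recoveryTimeoutMs * 2, this.maxTimeoutMs);
      return { next: "OPEN", recoveryTimeoutMs: this.recoveryTimeoutMs };
    }
    this.passed += 1;
    const next = this.passed >= this.probesRequired ? "CLOSED" : "HALF_OPEN";
    return { next, recoveryTimeoutMs: this.recoveryTimeoutMs };
  }
}
```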

6.5 Per-Model Circuit Breakers

Deploy one circuit breaker per model-environment combination (e.g., openai-gpt4o-production, anthropic-claude-3-5-sonnet-production, anthropic-claude-3-haiku-production). This enables:

  • Precise failure isolation: a gpt-4o outage does not open the claude-sonnet circuit.
  • Cost-tier routing: when a frontier model circuit opens, route to a cheaper model rather than to cache or human queue.
  • Independent recovery: each circuit recovers on its own timeline.

7. Implementation Guide

7.1 Step-by-Step

Step 1 — Instrument the AI client. Wrap your LLM client (Anthropic SDK, OpenAI SDK, etc.) so that every call records: timestamp, model, input/output token counts, latency, response status, and quality score (if available). These metrics feed the circuit breaker's threshold evaluator.

Step 2: Choose a circuit state store. Use Redis with an expiring key per circuit. The key stores: current state (CLOSED/OPEN/HALF_OPEN), open timestamp, metrics summary (rolling window values for each condition), probe request count. The Redis TTL ensures stale circuit state is cleared automatically; set it longer than the maximum recovery timeout so that an OPEN entry does not expire before the circuit reaches HALF-OPEN.

Step 3 — Implement the threshold evaluator. A function that reads the rolling metric window from Redis and returns { open: true, reason: "LATENCY_SPIKE", details: "p99=42s > threshold=30s" } or { open: false }. Run this after every LLM call completes.

Step 4 — Implement the fallback chain. Build a registry of fallback handlers keyed by type (model, cache, rule-based, human-queue). The circuit breaker executor walks the chain until a handler succeeds.

Step 5 — Implement the half-open probe. In HALF-OPEN state, the first N incoming requests (not a special probe request — real incoming requests act as probes) are handled with full metric tracking. If N consecutive requests pass all thresholds, close the circuit.

Step 6 — Wire alerts. Publish circuit state change events (CLOSED→OPEN, OPEN→HALF-OPEN, HALF-OPEN→CLOSED, HALF-OPEN→OPEN) to your alerting system. On-call engineers should be notified immediately when a circuit opens.
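
The threshold evaluator from Step 3 can be sketched as a pure function over the rolling metrics. This version takes flat threshold values for brevity rather than the nested schema, and every reason code other than `LATENCY_SPIKE` is a hypothetical name chosen for the example.

```typescript
interface CircuitMetrics {
  errorRatePercent: number;
  p99LatencyMs: number;
  costRateUSDPerHour: number;
  qualityScore: number;
  safetyRefusalRatePercent: number;
}

// Flat thresholds for brevity; in the schema each value sits under openConditions.<name>.threshold.
interface Thresholds {
  errorRatePercent: number;
  p99LatencyMs: number;
  costRateUSDPerHour: number;
  qualityScoreBelow: number;
  safetyRefusalRatePercent: number;
}

type Evaluation = { open: false } | { open: true; reason: string; details: string };

function evaluateThresholds(m: CircuitMetrics, t: Thresholds): Evaluation {
  // Each entry: [breached?, reason code, human-readable details].
  const checks: [boolean, string, string][] = [
    [m.errorRatePercent > t.errorRatePercent, "ERROR_RATE",
      `errorRate=${m.errorRatePercent}% > threshold=${t.errorRatePercent}%`],
    [m.p99LatencyMs > t.p99LatencyMs, "LATENCY_SPIKE",
      `p99=${m.p99LatencyMs / 1000}s > threshold=${t.p99LatencyMs / 1000}s`],
    [m.costRateUSDPerHour > t.costRateUSDPerHour, "COST_SPIKE",
      `cost=$${m.costRateUSDPerHour}/h > threshold=$${t.costRateUSDPerHour}/h`],
    [m.qualityScore < t.qualityScoreBelow, "QUALITY_DEGRADATION",
      `quality=${m.qualityScore} < threshold=${t.qualityScoreBelow}`],
    [m.safetyRefusalRatePercent > t.safetyRefusalRatePercent, "SAFETY_REFUSALS",
      `refusals=${m.safetyRefusalRatePercent}% > threshold=${t.safetyRefusalRatePercent}%`],
  ];
  for (const [breached, reason, details] of checks) {
    if (breached) return { open: true, reason, details };
  }
  return { open: false };
}
```

The first breached condition wins here; returning all breached conditions instead would be a reasonable alternative when alerts should list every cause.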

7.2 Code Skeleton (TypeScript)

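type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

interface CircuitMetrics {
  errorRatePercent: number;
  p99LatencyMs: number;
  costRateUSDPerHour: number;
  qualityScore: number;
  safetyRefusalRatePercent: number;
}

// The types below mirror the configuration schema in Section 5.2.
interface Condition { threshold: number; windowSeconds: number }
interface FallbackStep { type: "model" | "cache" | "rule-based" | "human-queue"; [option: string]: unknown }
interface CircuitConfig {
  circuitId: string;
  openConditions: {
    errorRatePercent: Condition;
    p99LatencyMs: Condition;
    costRateUSDPerHour: Condition;
    qualityScoreBelow: Condition;
    safetyRefusalRatePercent: Condition;
  };
  probeRequestsRequired: number;
  fallbackChain: FallbackStep[];
}
interface FallbackResult { servedBy: FallbackStep["type"]; body: unknown }

// Placeholders for your own clients; only the calls used below are declared.
declare const redis: { set(key: string, value: string, opts: { ex: number }): Promise<unknown> };
declare const alertPublisher: {
  publish(e: { event: string; circuitId: string; timestamp: Date }): Promise<void>;
};

abstract class AICircuitBreaker {
  private state: CircuitState = "CLOSED";
  private probeSuccessCount = 0;
  private recoveryTimeoutMs = 60_000;

  // Hooks left to the implementer: metric recording, rolling-window reads, fallback execution.
  protected abstract recordSuccess(config: CircuitConfig, latencyMs: number, result: unknown): Promise<void>;
  protected abstract recordFailure(config: CircuitConfig, latencyMs: number): Promise<void>;
  protected abstract getMetrics(circuitId: string): Promise<CircuitMetrics>;
  protected abstract executeFallback(chain: FallbackStep[]): Promise<FallbackResult>;

  async call<T>(request: () => Promise<T>, config: CircuitConfig): Promise<T | FallbackResult> {
    if (this.state === "OPEN") {
      // Never attempt the primary while OPEN; go straight to the fallback chain.
      return this.executeFallback(config.fallbackChain);
    }

    const start = Date.now();
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      // The latency threshold doubles as the per-request timeout.
      const result = await Promise.race([
        request(),
        new Promise<never>((_, reject) => {
          timer = setTimeout(() => reject(new Error("TIMEOUT")), config.openConditions.p99LatencyMs.threshold);
        }),
      ]);

      await this.recordSuccess(config, Date.now() - start, result);

      // A successful call can still breach the latency, cost, quality, or refusal conditions.
      const metrics = await this.getMetrics(config.circuitId);
      if (this.shouldOpen(metrics, config)) {
        await this.open(config.circuitId);
      } else if (this.state === "HALF_OPEN" && ++this.probeSuccessCount >= config.probeRequestsRequired) {
        await this.close(config.circuitId);
      }
      return result;
    } catch (error) {
      await this.recordFailure(config, Date.now() - start);
      const metrics = await this.getMetrics(config.circuitId);

      // A failed probe reopens the circuit immediately; otherwise open only on a threshold breach.
      if (this.state === "HALF_OPEN" || this.shouldOpen(metrics, config)) {
        await this.open(config.circuitId);
        return this.executeFallback(config.fallbackChain);
      }
      throw error;
    } finally {
      clearTimeout(timer); // do not leave the timeout timer running after the race settles
    }
  }

  private shouldOpen(metrics: CircuitMetrics, config: CircuitConfig): boolean {
    return (
      metrics.errorRatePercent > config.openConditions.errorRatePercent.threshold ||
      metrics.p99LatencyMs > config.openConditions.p99LatencyMs.threshold ||
      metrics.costRateUSDPerHour > config.openConditions.costRateUSDPerHour.threshold ||
      metrics.qualityScore < config.openConditions.qualityScoreBelow.threshold ||
      metrics.safetyRefusalRatePercent > config.openConditions.safetyRefusalRatePercent.threshold
    );
  }

  private async open(circuitId: string): Promise<void> {
    this.state = "OPEN";
    this.probeSuccessCount = 0;
    await redis.set(`circuit:${circuitId}:state`, "OPEN", { ex: 3600 });
    await alertPublisher.publish({ event: "CIRCUIT_OPENED", circuitId, timestamp: new Date() });
    setTimeout(() => void this.transitionToHalfOpen(circuitId), this.recoveryTimeoutMs);
    // Exponential backoff: the next reopen waits twice as long, capped at one hour.
    this.recoveryTimeoutMs = Math.min(this.recoveryTimeoutMs * 2, 3_600_000);
  }

  private async transitionToHalfOpen(circuitId: string): Promise<void> {
    this.state = "HALF_OPEN";
    this.probeSuccessCount = 0; // probes are counted from zero on every recovery attempt
    await redis.set(`circuit:${circuitId}:state`, "HALF_OPEN", { ex: 3600 });
    await alertPublisher.publish({ event: "CIRCUIT_HALF_OPEN", circuitId, timestamp: new Date() });
  }

  private async close(circuitId: string): Promise<void> {
    this.state = "CLOSED";
    this.recoveryTimeoutMs = 60_000; // reset backoff
    await redis.set(`circuit:${circuitId}:state`, "CLOSED", { ex: 3600 });
    await alertPublisher.publish({ event: "CIRCUIT_CLOSED", circuitId, timestamp: new Date() });
  }
}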

8. Observability

8.1 Circuit State Dashboard

A dedicated dashboard per circuit showing:

  • Current state (CLOSED / OPEN / HALF-OPEN) with time in current state
  • Rolling metrics for all five conditions vs. their thresholds (time series)
  • Circuit state transition history (timeline)
  • Fallback activation frequency and type distribution
  • Recovery attempt success rate (HALF-OPEN → CLOSED vs. HALF-OPEN → OPEN)

8.2 Key Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Circuit open duration | Total time in OPEN state per hour | > 10 min/hr on any circuit |
| Fallback activation rate | % of requests served by fallback | > 5% over 1 hour |
| Recovery success rate | % of HALF-OPEN transitions that lead to CLOSE | < 60% (underlying issue persists) |
| Cost rate | USD per hour per model circuit | Above circuit open condition threshold |
| Quality score trend | Rolling quality score per circuit | Downward trend over 24h |

9. Cost Governance

  • Cost spike protection. The cost rate condition is the primary cost governance mechanism. Configure cost thresholds based on your monthly budget: costRateThreshold = (monthlyBudget × 0.20) / 24 (limit any single day's spend to 20% of monthly budget, expressed as hourly rate).
  • Fallback cost tracking. Secondary model fallbacks also incur cost. Track fallback cost separately. If fallback cost exceeds a threshold, alert — it means the primary circuit has been open for an extended period and the fallback is being used as a primary.
  • Circuit breaker overhead. The circuit breaker's metric collection and threshold evaluation adds approximately 1–2ms per request. This overhead is negligible relative to LLM call latency.
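
The cost threshold formula above is simple enough to check by hand. As a worked example with a hypothetical monthly budget of $6,000, the hourly threshold comes out at (6000 × 0.20) / 24 = $50/hour, which matches the value used in the configuration schema. The function name below is an assumption for illustration.

```typescript
// maxDailyShare is the fraction of the monthly budget that a single day may consume.
function costRateThresholdUSDPerHour(monthlyBudgetUSD: number, maxDailyShare = 0.2): number {
  return (monthlyBudgetUSD * maxDailyShare) / 24;
}

const threshold = costRateThresholdUSDPerHour(6000);
console.log(threshold.toFixed(2));
```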

10. Security Considerations

10.1 Circuit State Store Security

The circuit state store (Redis) must be accessible only by authorised circuit breaker components. An attacker who can write to the circuit state store can open or close circuits at will, either preventing AI service access (DoS) or suppressing protection (allowing a degraded service to serve requests). Apply authentication and encryption in transit to the Redis connection.

10.2 Fallback Response Integrity

Cached fallback responses must be validated before serving. A stale cached response may contain personally identifiable information from a different user's request. Ensure the cache key includes the tenant/user context so that cached responses are never served to a different user than the one who generated them.
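
One way to enforce this is to build the tenant and user into the cache key itself, so a lookup for one user cannot return another user's entry. The sketch below assumes an exact-match cache on a normalised prompt; a semantic cache would need the same scoping applied to its index. The function name and key layout are hypothetical.

```typescript
function fallbackCacheKey(tenantId: string, userId: string, circuitId: string, prompt: string): string {
  // Normalise whitespace and case so trivially different prompts share an entry.
  const normalised = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  // Encode each part so a ":" inside an identifier cannot shift the field boundaries.
  return [tenantId, userId, circuitId, normalised].map((part) => encodeURIComponent(part)).join(":");
}
```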

10.3 Alert Flooding

A rapidly oscillating circuit (opening and closing repeatedly due to a borderline issue) generates a flood of alerts. Implement alert deduplication: suppress additional CIRCUIT_OPENED alerts for the same circuit within a 15-minute window after the first alert. This prevents on-call fatigue while still ensuring the initial alert reaches the on-call engineer.
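
The deduplication rule can be sketched as a map from circuit and event type to the time the last alert was sent. The class name is hypothetical, and the in-memory map is an assumption; with several breaker instances the timestamps would need to live in shared storage.

```typescript
class AlertDeduplicator {
  private lastSentMs = new Map<string, number>();

  constructor(private readonly suppressWindowMs: number = 15 * 60 * 1000) {}

  // Returns true when the alert should be sent, false when it is suppressed as a repeat.
  shouldSend(circuitId: string, event: string, nowMs: number): boolean {
    const key = `${circuitId}:${event}`;
    const last = this.lastSentMs.get(key);
    if (last !== undefined && nowMs - last < this.suppressWindowMs) return false;
    this.lastSentMs.set(key, nowMs);
    return true;
  }
}
```

The window is measured from the first alert that was actually sent, so suppressed repeats do not extend it.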


11. Failure Modes and Mitigations

| Failure Mode | Detection | Mitigation |
|---|---|---|
| Circuit opens on transient blip | Frequent short-duration open events | Increase minimum window size; require threshold breach for 3 consecutive intervals |
| Recovery timeout too short (oscillating circuit) | HALF-OPEN → OPEN transition rate above 40% | Exponential backoff on recovery timeout (see Section 7.2) |
| Fallback chain exhausted | All fallback options return error | Return graceful error to user; alert on-call that no fallback is available |
| Circuit state store unavailable | Circuit breaker cannot read/write state | Fail open (route to primary provider) when state store is unavailable; alert immediately |
| Quality score metric unavailable | Quality condition cannot be evaluated | Fall back to evaluating remaining 4 conditions only; alert on missing quality metric |
| Cost rate spike not detected | Cost per hour grows gradually below hourly threshold | Add a cumulative daily cost alert independent of the circuit breaker |

12. Compliance and Governance

12.1 Business Continuity

The circuit breaker is a business continuity control. Document it in your business continuity plan (BCP) as the mechanism for handling AI service provider outages. Include the fallback chain, expected degraded-mode behaviour, and recovery procedures.

12.2 SLA Reporting

While a circuit is open, requests are served by the fallback chain. Document the fallback response quality characteristics so that SLA reporting can reflect: "During period X, responses were served by secondary model / cache / rule-based fallback due to primary provider degradation." This transparency is required for regulated use cases where SLA terms are contractual.

12.3 Model Risk Management

For financial services applications, the circuit breaker demonstrates that the organisation has controls to detect and respond to model quality degradation (SR 11-7 requirement). The quality score condition provides evidence of ongoing model quality monitoring. Retain circuit state transition logs for the duration of the model's production life.


13. Testing Strategy

13.1 Unit Tests

  • Threshold evaluator: for each of the five conditions, inject a metric value above and below the threshold; assert the correct open/close decision.
  • State machine: assert all six state transitions (see Section 4.1) occur for the correct inputs.
  • Exponential backoff: assert that consecutive HALF-OPEN → OPEN transitions double the recovery timeout up to the maximum.

13.2 Integration Tests

  • Simulate a latency spike: inject a mock LLM client that returns responses after 35 seconds; assert the circuit opens after p99 latency threshold is breached; assert fallback is activated.
  • Simulate a cost spike: mock LLM client reporting 100K tokens per call; assert the circuit opens when hourly cost threshold is breached.
  • Simulate quality degradation: mock LLM client returning quality score 0.60; assert circuit opens after quality threshold is breached.
  • Recovery flow: open the circuit; wait for recovery timeout; assert circuit transitions to HALF-OPEN; send 3 successful probe requests; assert circuit closes.

13.3 Chaos Engineering Tests

  • Kill the Redis circuit state store; assert the circuit breaker fails open (routes to primary) and generates an alert.
  • Force the secondary model circuit open simultaneously with the primary; assert the circuit breaker correctly falls through to the cache fallback.

14. Variants and Extensions

14.1 Graduated Circuit Breaker

Rather than binary OPEN/CLOSED, implement a graduated throttle: at 50% of the error threshold, route 10% of traffic to fallback; at 75%, route 30%; at 100%, route all traffic to fallback (full OPEN). Reduces abrupt degradation at the circuit open boundary.
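
The graduated steps can be sketched as a function from the current error rate to the fraction of traffic sent to fallback, plus a per-request coin flip. The function names are hypothetical, and the random source is injectable so the routing decision can be tested deterministically.

```typescript
// Fraction of traffic to send to fallback, given how close the error rate is to its threshold.
function fallbackFraction(errorRatePercent: number, thresholdPercent: number): number {
  const ratio = errorRatePercent / thresholdPercent;
  if (ratio >= 1) return 1; // full OPEN
  if (ratio >= 0.75) return 0.3;
  if (ratio >= 0.5) return 0.1;
  return 0; // fully CLOSED
}

// Per-request decision; pass a fixed random source in tests.
function routeToFallback(fraction: number, random: () => number = Math.random): boolean {
  return random() < fraction;
}
```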

14.2 Model Quality Scoring Integration

Integrate your LLM-as-judge quality scorer directly into the circuit breaker metric pipeline. After each response is quality-scored, feed the score to the circuit breaker's quality condition. This makes the circuit breaker responsive to subtle quality regressions that do not manifest as errors or latency spikes.

14.3 Predictive Circuit Opening

Use time-series anomaly detection on the five metric streams. Rather than reacting to threshold breaches, predict when a threshold will be breached based on current trends and open the circuit proactively. Reduces the number of failed requests that occur between when degradation begins and when the threshold is breached.
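
As a deliberately simple stand-in for full anomaly detection, the sketch below fits a least-squares line to recent samples of a metric and estimates how long until the line crosses the threshold. It is a linear trend projection only, and the function name and `Sample` shape are hypothetical. It applies to metrics that breach upward (error rate, latency, cost, refusals); for the quality score, pass negated values and a negated threshold.

```typescript
interface Sample { tSeconds: number; value: number }

// Returns seconds until the fitted trend reaches the threshold, 0 if it already has,
// or null when there is too little data or the trend is flat or falling.
function projectBreachInSeconds(samples: Sample[], threshold: number): number | null {
  const n = samples.length;
  if (n < 2) return null;
  const meanT = samples.reduce((sum, p) => sum + p.tSeconds, 0) / n;
  const meanV = samples.reduce((sum, p) => sum + p.value, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.tSeconds - meanT) * (p.value - meanV);
    den += (p.tSeconds - meanT) ** 2;
  }
  if (den === 0) return null; // all samples share one timestamp
  const slope = num / den;
  const lastT = samples[n - 1]!.tSeconds;
  const fittedNow = meanV + slope * (lastT - meanT);
  if (fittedNow >= threshold) return 0;
  if (slope <= 0) return null;
  return (threshold - fittedNow) / slope;
}
```

A caller might open the circuit proactively when the projected time to breach falls below the time it takes to drain in-flight requests.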


15. Trade-off Analysis

| Dimension | AI Circuit Breaker | Error-Only Circuit Breaker | No Circuit Breaker |
|---|---|---|---|
| Latency spike protection | Yes | No | No |
| Cost spike protection | Yes | No | No |
| Quality degradation protection | Yes | No | No |
| Implementation complexity | Moderate | Low | None |
| False positive risk | Moderate (5 conditions) | Low (1 condition) | None |
| Fallback chain required | Yes | Yes | No |

16. Known Implementations

| Organisation Type | Use Case | Condition That Saved Them | Reported Outcome |
|---|---|---|---|
| Global retailer | Product recommendation API | Latency spike (p99 went to 45s) | Circuit opened in 4 min; fallback served 100% of requests; $0 SLA penalty |
| Financial services | Loan underwriting AI | Cost rate spike (prompt bug caused 10× token usage) | Circuit opened; $12K budget overspend prevented |
| Healthcare | Clinical note summarisation | Quality score degradation during provider model update | Circuit opened 40 min before provider announced maintenance |
| Legal tech | Contract review API | Safety refusal spike after provider policy update | Circuit opened; team notified; prompt updated within 2 hours |

17. Related Patterns

| Pattern ID | Name | Relationship |
|---|---|---|
| EAAPL-MAG001 | Multi-Agent Orchestration | Circuit breaker applied per specialist agent in the dispatch layer |
| EAAPL-MAG002 | Supervisor Agent | Supervisor uses circuit breaker per worker type |
| EAAPL-INT008 | Bidirectional AI Sync | Circuit breaker protects AI processing step in the sync pipeline |

18. References

  1. Nygard, M.T., "Release It! Design and Deploy Production-Ready Software," 2nd ed., Pragmatic Bookshelf, 2018
  2. Fowler, M., "CircuitBreaker Pattern" — martinfowler.com/bliki/CircuitBreaker.html
  3. Gartner, "Resilience Patterns for Enterprise AI Services," 2025 (ID: G00820456)
  4. AWS, "Implementing the Circuit Breaker Pattern with AWS Lambda and DynamoDB," 2024
  5. Netflix Technology Blog, "Making the Netflix API More Resilient," 2012 — netflixtechblog.com
  6. OpenAI API Status Page — status.openai.com (reference for real-world AI provider failure modes)
  7. Anthropic API Documentation: Rate Limits and Error Handling — docs.anthropic.com/api/errors
  8. NIST SP 800-204: Security Strategies for Microservices-based Application Systems
  9. SR 11-7: Guidance on Model Risk Management — Section III: Model Validation
  10. Microsoft Azure Well-Architected Framework: Resiliency — learn.microsoft.com/azure/well-architected/reliability