EAAPL-MDL002 — Shadow Model Deployment

Attribute	Value
Pattern ID	EAAPL-MDL002
Name	Shadow Model Deployment
Maturity	Proven
Complexity	High
Tags	`model-risk` `observability` `high-availability` `high-complexity`
Last Reviewed	2026-06-12
Owner	Enterprise AI Architecture Practice

1. Executive Summary

Shadow model deployment allows an organisation to validate a new AI model under real production conditions (full traffic load, real user inputs, live context) without exposing users to the new model's outputs. Production traffic is mirrored asynchronously to the shadow model; the shadow computes a response, which is stored and compared to the production response but never served to the user. This removes the principal risk of model upgrades: discovering that a new model behaves differently only after users experience it.

For CIOs, shadow deployment is a mandatory risk control before promoting any model upgrade in a regulated or customer-facing context. For CTOs, it provides statistically grounded promotion criteria based on real traffic rather than offline benchmarks. For risk officers, it is the evidentiary record that demonstrates the organisation validated model behaviour before promoting a change.

The pattern is high-complexity because it requires asynchronous traffic mirroring infrastructure, a shadow response storage layer, and a statistical comparison pipeline, none of which exist in most organisations by default. The investment is justified for any model with material business impact: customer-facing recommendation, credit decisioning, fraud detection, medical triage support, or automated content moderation.

2. Problem Statement

2.1 Business Problem

Organisations upgrade AI models periodically to improve quality, reduce cost, or address emerging risks. Conventional practice is to test on an offline dataset, then deploy to production. The offline dataset is always stale: it does not represent the current distribution of real user inputs, seasonal patterns, or adversarial inputs. As a result, models that pass offline benchmarks can still fail in production, and the business discovers the failure through customer complaints, revenue impact, or regulatory action.

2.2 Technical Problem

Offline evaluation cannot capture the full complexity of production traffic. Real production requests carry user-specific context, system state, upstream service responses, and time-sensitive signals that are absent from a fixed benchmark dataset. A model that performs identically to its predecessor on a benchmark dataset may perform materially differently on the long tail of real production inputs.

2.3 Symptoms

Model upgrades cause unexpected quality regressions discovered by customer feedback, not internal monitoring.
Post-upgrade error rates spike before detection — mean time to detection is hours, not minutes.
There is no statistical basis for the decision to promote a new model ("it looked good in testing").
Rollbacks were required for more than 30% of model promotions in the past year.
The organisation cannot demonstrate to a regulator that it validated model behaviour before deployment.

2.4 Cost of Inaction

Category	Indicative Impact
Quality Risk	Model regression discovered in production affects all users; rollback takes 5–30 minutes during which users are impacted
Regulatory	EU AI Act Article 9 risk management obligation not met without pre-production validation evidence
Reputational	Public incident caused by model regression damages brand; recovery requires customer communications
Financial	Customer churn from degraded experience; revenue loss during incident; incident investigation cost

3. Context

3.1 When to Apply

Before promoting any model version in a customer-facing, regulated, or high-stakes context.
When offline benchmark datasets may not represent current production traffic distribution.
When the new model represents a MINOR or MAJOR version change (per EAAPL-MDL001 schema).
When rollback risk is high and the cost of a production incident exceeds shadow infrastructure cost.

3.2 When NOT to Apply

PATCH version changes (quantisation, minor optimisation) where behaviour change is expected to be negligible — use regression testing instead.
Models serving internal tooling with no customer or regulatory impact.
Contexts where traffic volume is so low that statistical comparison is meaningless (< 1,000 requests/day — use canary instead).
Stateful write-heavy models where shadow execution risks side effects (see Section 4.4).

3.3 Prerequisites

Prerequisite	Detail
Traffic mirroring capability	Load balancer or service mesh capable of async request duplication
Shadow response store	High-throughput, schema-flexible storage for shadow + production response pairs
Comparison analysis pipeline	Automated pipeline running daily statistical comparison of shadow vs production
Model versioning (EAAPL-MDL001)	Both production and shadow models must be versioned and registered
Promotion criteria definition	Measurable, pre-agreed criteria for shadow-to-production promotion

3.4 Industry Applicability

Industry	Applicability	Primary Driver
Financial Services	Critical	APRA CPS 230 change management; credit/fraud model validation
Healthcare	Critical	Patient safety; clinical decision support validation
E-commerce / Retail	High	Revenue-impacting recommendation engine upgrades
Media / Content	High	Content moderation model upgrades affecting policy enforcement
Government	High	Service delivery quality; citizen-facing AI accountability
Technology Platforms	Medium	API quality guarantees to downstream consumers

4. Architecture Overview

4.1 Traffic Mirroring Architecture

Production traffic is mirrored asynchronously to the shadow model using a request duplication layer placed at the load balancer or service mesh level. The critical design principle is that the mirroring is asynchronous and non-blocking: the production request path is unaffected by any shadow-side processing. If the shadow model is slow or fails, the production response is never delayed or affected.

The request duplicator captures the full request payload — including headers, authentication context (anonymised), timestamp, and all inference inputs — and enqueues a copy to a shadow inference queue. The shadow model consumer reads from this queue and processes requests at its own pace. Because shadow processing is decoupled from production request latency, the shadow model can be run on lower-priority compute — scheduled spot instances, off-peak batch processing — without affecting production SLOs.
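The non-blocking property described above can be illustrated with a minimal sketch. It assumes an in-process `asyncio.Queue` standing in for the shadow inference queue; in practice the duplication would sit in the load balancer or service mesh. The names `mirror`, `handle_request`, and `production_model` are hypothetical and do not come from this pattern.

```python
import asyncio

# All names here (mirror, handle_request, production_model) are hypothetical.


def mirror(shadow_queue: asyncio.Queue, request: dict) -> bool:
    """Enqueue a copy for the shadow consumer without blocking the caller."""
    try:
        shadow_queue.put_nowait(dict(request))  # shallow copy of the request
        return True
    except asyncio.QueueFull:
        return False  # shed the shadow copy; production is unaffected


async def production_model(request: dict) -> str:
    """Stand-in for the real production inference call."""
    return f"prod:{request['input']}"


async def handle_request(shadow_queue: asyncio.Queue, request: dict) -> str:
    mirror(shadow_queue, request)           # never awaited; a full queue is ignored
    return await production_model(request)  # the only work on the user's path


async def demo() -> tuple:
    queue = asyncio.Queue(maxsize=1)  # tiny bound, chosen only to show shedding
    first = await handle_request(queue, {"request_id": "r1", "input": "a"})
    second = await handle_request(queue, {"request_id": "r2", "input": "b"})
    return first, second, queue.qsize()
```

Running `asyncio.run(demo())` serves both production responses even though the second shadow copy is dropped because the queue is full.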

4.2 Shadow Response Storage

Every shadow inference produces a response pair: the shadow model's response and the corresponding production model's response (retrieved from the production response log by matching a request correlation ID). These pairs are stored in a shadow comparison store — a document or columnar database optimised for the comparison analysis pipeline. The store retains: request ID, timestamp, request payload hash (not cleartext for privacy), production response, shadow response, and all computed quality metrics for both responses.

Retention policy for shadow response pairs: 90 days online, then purged (unless subject to regulatory retention). The shadow store must be scoped as a non-production system: real user input data subject to privacy regulations must be anonymised or pseudonymised before storage.
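As a rough illustration of the stored record, the sketch below joins a shadow result to its production response by correlation ID and keeps only a hash of the request payload. The field names, the `ResponsePair` type, and the use of SHA-256 over canonical JSON are assumptions for illustration; the pattern itself only specifies "request payload hash".

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponsePair:
    request_id: str
    timestamp: str
    payload_hash: str  # hash only; the cleartext payload is not stored
    production_response: str
    shadow_response: str


def hash_payload(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (the hash choice is an assumption)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_pair(shadow_record: dict, production_log: dict) -> Optional[ResponsePair]:
    """Join a shadow result to its production response by correlation ID."""
    production = production_log.get(shadow_record["request_id"])
    if production is None:
        return None  # no matching production entry yet; the caller may retry later
    return ResponsePair(
        request_id=shadow_record["request_id"],
        timestamp=production["timestamp"],
        payload_hash=hash_payload(shadow_record["payload"]),
        production_response=production["response"],
        shadow_response=shadow_record["response"],
    )
```

An unkeyed hash of a low-entropy input can be reversed by enumeration, so a keyed hash may be a better fit where payloads are short or predictable.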

4.3 Comparison Analysis Pipeline

A daily analysis pipeline processes accumulated shadow/production pairs and produces a comparison report. The pipeline computes:

(1) Quality metrics for both models: accuracy, BLEU/ROUGE/BERTScore for generation tasks, calibration for classification.
(2) Latency distribution (p50, p95, p99) for shadow vs production.
(3) Error rate comparison.
(4) Cost per inference comparison.
(5) Safety check results: does the shadow model generate any content that the production model would not?
(6) Disagreement rate: the proportion of requests where the two models produce materially different outputs.

The report is published to the model governance dashboard and stored in the Model Register against the shadow version.
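Two of the pipeline's metrics, the latency percentiles and the disagreement rate, can be sketched as follows. This is a minimal illustration and not the pipeline itself. The nearest-rank percentile method and the default exact-match definition of "materially different" are assumptions; a real pipeline would likely use a task-specific `differs` function.

```python
import math
from typing import Callable, Sequence, Tuple


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def disagreement_rate(
    pairs: Sequence[Tuple[str, str]],
    differs: Callable[[str, str], bool] = lambda prod, shadow: prod != shadow,
) -> float:
    """Share of (production, shadow) pairs that the differs function flags."""
    if not pairs:
        return 0.0
    return sum(1 for prod, shadow in pairs if differs(prod, shadow)) / len(pairs)
```

For example, `percentile(latencies_ms, 99)` would be computed once for the production latencies and once for the shadow latencies, and the two values compared in the report.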

4.4 Handling Stateful Operations in Shadow

Shadow models must operate in read-only mode. They must not write to any production database, send notifications, invoke external APIs, or modify any shared state. Shadow inference is computation-only. For models that normally invoke tools or external systems, the shadow request processor must use a stubbed tool layer that records the intended tool calls without executing them. This is enforced by infrastructure — the shadow model's service account has no write permissions on production systems.
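A stubbed tool layer of the kind described above might look like the following sketch. The class name, the `call` method, and the shape of the recorded entries are hypothetical. The essential behaviour is that the intended call is recorded and nothing is executed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubToolLayer:
    """Records intended tool calls from the shadow model without executing them."""

    recorded_calls: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, tool_name: str, **arguments: Any) -> Dict[str, Any]:
        # Record the intent for later comparison and audit.
        self.recorded_calls.append({"tool": tool_name, "arguments": arguments})
        # Return a placeholder result; no external system is contacted.
        return {"status": "stubbed", "tool": tool_name}
```

The stub is a convenience for capturing intent; the enforcement described in this section still rests on the shadow service account having no write permissions.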

4.5 Shadow Duration Guidelines

Shadow duration is determined by model risk tier: Low-risk internal models require a minimum of 1 week with at least 10,000 shadow requests. Medium-risk customer-facing models require a minimum of 2 weeks with at least 50,000 shadow requests. High-risk regulated models (credit, medical, fraud) require a minimum of 4 weeks with at least 100,000 shadow requests and explicit sign-off from the risk function. These are minimums — shadow should continue until promotion criteria are met, regardless of calendar time.
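The minimums above can be encoded as a small lookup so that the check is applied consistently. The tier keys and function name are hypothetical, and weeks are converted to days (7, 14, 28) for the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowMinimum:
    min_days: int
    min_requests: int
    risk_signoff_required: bool


# Values taken from the duration guidelines; the tier names are illustrative keys.
SHADOW_MINIMUMS = {
    "low": ShadowMinimum(min_days=7, min_requests=10_000, risk_signoff_required=False),
    "medium": ShadowMinimum(min_days=14, min_requests=50_000, risk_signoff_required=False),
    "high": ShadowMinimum(min_days=28, min_requests=100_000, risk_signoff_required=True),
}


def minimum_met(tier: str, days_elapsed: int, requests_seen: int,
                risk_signed_off: bool = False) -> bool:
    """True when the tier's minimum duration, volume, and sign-off are all satisfied."""
    minimum = SHADOW_MINIMUMS[tier]
    if days_elapsed < minimum.min_days or requests_seen < minimum.min_requests:
        return False
    return risk_signed_off or not minimum.risk_signoff_required
```

Meeting the minimum is necessary but not sufficient: shadow continues until the promotion criteria are also met.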

4.6 Promotion Criteria

Promotion from shadow to production (via canary release per EAAPL-MDL003) requires all of the following: (1) shadow quality score meets or exceeds production by the margin defined at version registration; (2) shadow p99 latency within 20% of production p99; (3) shadow error rate does not exceed production error rate; (4) shadow safety check passes (zero content safety violations); (5) minimum shadow duration met; (6) comparison report reviewed and approved by model owner and, for high-risk models, AI Governance.
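The six criteria can be expressed as a checklist that reports which ones fail. This is a sketch under stated assumptions: "within 20%" is read as shadow p99 being no more than 1.2 times production p99, and the field names are hypothetical. The output supports the human decision and does not replace it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShadowReport:
    shadow_quality: float
    production_quality: float
    required_margin: float       # margin defined at version registration
    shadow_p99_ms: float
    production_p99_ms: float
    shadow_error_rate: float
    production_error_rate: float
    safety_violations: int
    minimum_duration_met: bool
    owner_approved: bool
    high_risk: bool = False
    governance_approved: bool = False


def failed_criteria(r: ShadowReport) -> List[str]:
    """Return the names of unmet promotion criteria; an empty list means all pass."""
    failures = []
    if r.shadow_quality < r.production_quality + r.required_margin:
        failures.append("quality")
    if r.shadow_p99_ms > 1.2 * r.production_p99_ms:  # assumed reading of "within 20%"
        failures.append("latency")
    if r.shadow_error_rate > r.production_error_rate:
        failures.append("error_rate")
    if r.safety_violations > 0:
        failures.append("safety")
    if not r.minimum_duration_met:
        failures.append("duration")
    if not r.owner_approved or (r.high_risk and not r.governance_approved):
        failures.append("approval")
    return failures
```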

5. Architecture Diagram

flowchart TD
    subgraph Traffic["Traffic Layer"]
        A[User Request]
        B[Load Balancer]
    end
    subgraph Models["Model Serving"]
        C[Production Model]
        D[Shadow Inference Queue]
        E[Shadow Model]
    end
    subgraph Analysis["Comparison and Governance"]
        F[(Shadow Response Store)]
        G[Comparison Pipeline]
        H{Promotion Decision}
    end
    A --> B
    B -->|sync primary| C
    B -->|async mirror| D
    D --> E
    C --> F
    E --> F
    F --> G
    G --> H
    H -->|criteria met| I[Canary Release]
    H -->|criteria not met| J[Extend Shadow Period]
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#d1fae5,stroke:#10b981
    style D fill:#fef9c3,stroke:#eab308
    style E fill:#dbeafe,stroke:#3b82f6
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f3e8ff,stroke:#a855f7
    style I fill:#d1fae5,stroke:#10b981
    style J fill:#fee2e2,stroke:#ef4444

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Request Duplicator	Infrastructure	Asynchronously mirrors production requests to shadow queue; zero production latency impact	Envoy request mirroring (route-level request_mirror_policies), Nginx mirror directive, Istio traffic mirroring	Critical
Shadow Inference Queue	Messaging	Decouples shadow processing from production path; buffers during shadow compute spikes	AWS SQS, Azure Service Bus, GCP Pub/Sub, Kafka	High
Shadow Model Serving	Inference	Runs shadow model version against mirrored requests	Same inference infrastructure as production; lower-priority compute	High
Stub Tool Layer	Safety Guard	Intercepts tool calls from shadow model; records intent without executing	Custom middleware; feature flag that disables external calls	Critical
Shadow Response Store	Data Store	Stores request/response pairs for comparison analysis	DynamoDB, BigQuery, Snowflake, PostgreSQL	High
Comparison Analysis Pipeline	Batch Compute	Runs daily statistical comparison; produces comparison report	Apache Spark, AWS Glue, dbt + SQL, custom Python pipeline	High
Model Governance Dashboard	Observability	Presents comparison results; supports promotion decision workflow	Grafana, custom React dashboard, Looker	Medium

7. Data Flow

7.1 Primary Flow

Step	Actor	Action	Output
1	User	Sends inference request	Request received at load balancer
2	Load Balancer	Routes request to production model; asynchronously mirrors to shadow queue	Production request dispatched; shadow message enqueued
3	Production Model	Processes request; returns response	Production response served to user; logged with request ID
4	Shadow Queue Consumer	Reads mirrored request; invokes shadow model	Shadow inference job initiated
5	Shadow Model	Processes mirrored request via stub tool layer	Shadow response computed; tool calls recorded not executed
6	Shadow Response Writer	Writes shadow response + matching production response to shadow store	Response pair persisted with correlation ID
7	Comparison Pipeline	Daily run: reads all new pairs; computes metrics; generates report	Comparison report published to governance dashboard
8	Model Owner / Governance	Reviews report against promotion criteria	Promotion approved or shadow period extended

7.2 Error Flow

Error Scenario	Detection	Recovery Action
Shadow model inference failure	Error rate monitor on shadow consumer	Log error; skip pair; alert on sustained failure rate > 5%
Shadow queue backpressure	Queue depth monitor	Scale shadow consumer; shed shadow load (production unaffected)
Stub tool layer bypass (shadow writes)	Audit log alert on unexpected write attempt	Halt shadow processing; security investigation; version quarantined
Comparison pipeline failure	Pipeline health monitor	Retry pipeline; alert after 2 consecutive daily failures
Response pair storage at capacity	Storage utilisation alert	Age out pairs beyond retention window; scale storage

8. Security Considerations

8.1 Controls Summary

Domain	Control
Authentication	Shadow model service account isolated from production service account; no shared credentials
Authorisation	Shadow model service account has read-only access to inference inputs; no write access to any production system
Secrets	Shadow model uses same secrets manager as production; keys scoped per model version
Classification	Shadow response store classified at same level as production data; user request payloads anonymised before storage
Encryption	Shadow store encrypted at rest (AES-256) and in transit (TLS 1.3)
Auditability	All shadow inference attempts logged; any tool call attempt (stub or real) logged to audit trail

8.2 OWASP LLM Top 10 Relevance

OWASP LLM Risk	Relevance	Mitigation
LLM01 Prompt Injection	High	Shadow model processes real production inputs including potentially adversarial content; must run in isolated sandbox
LLM02 Insecure Output Handling	Medium	Shadow responses are stored not served, but must still be sanitised before display in comparison dashboard
LLM03 Training Data Poisoning	Low	Shadow model is a pre-trained/fine-tuned candidate; poisoning risk addressed in training pipeline (EAAPL-MDL006)
LLM04 Model Denial of Service	Medium	Shadow queue acts as a buffer; but sustained high volume can exhaust shadow compute budget
LLM05 Supply Chain Vulnerabilities	Medium	Shadow model shares supply chain with production; validated by same provenance check
LLM06 Sensitive Information Disclosure	High	Request payloads contain real user data; pseudonymisation before shadow store is mandatory
LLM07 Insecure Plugin Design	High	Stub tool layer is the primary control; any bypass allows shadow model to take real-world action
LLM08 Excessive Agency	High	Stub tool layer prevents shadow from executing any action; this is the central security control
LLM09 Overreliance	Low	Shadow is internal validation tooling; overreliance not applicable
LLM10 Model Theft	Medium	Shadow response store contains model outputs at scale; store access controls limit bulk export that could support model extraction

9. Governance Considerations

9.1 Responsible AI

Shadow testing must include fairness analysis in the comparison report: do the shadow model's outputs diverge from production in ways that are disproportionate across demographic subgroups? Any fairness regression detected in shadow is a blocking criterion for promotion, regardless of overall quality metrics.
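One simple way to look for disproportionate divergence is to compute the shadow/production disagreement rate per subgroup and inspect the spread. The sketch below is illustrative only. The subgroup label on each record, the exact-match comparison, and the use of a max-minus-min gap are all assumptions; the pattern does not prescribe a specific fairness metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def disagreement_by_group(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """records holds (subgroup, production_output, shadow_output) tuples."""
    totals = defaultdict(int)
    differing = defaultdict(int)
    for group, production, shadow in records:
        totals[group] += 1
        if production != shadow:
            differing[group] += 1
    return {group: differing[group] / totals[group] for group in totals}


def max_gap(rates: Dict[str, float]) -> float:
    """Largest difference in disagreement rate between any two subgroups."""
    return max(rates.values()) - min(rates.values())
```

A large gap does not by itself show that the shadow model is less fair, only that its behaviour changed unevenly; the affected subgroup's outputs would then need closer review.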

9.2 Model Risk Management

Shadow deployment is the pre-production validation stage of the MRM lifecycle. The comparison report constitutes evidence for model validation. For APRA-regulated entities, the comparison report is part of the model governance record and must be retained.

9.3 Human Approval Gates

Promotion from shadow to canary is a human decision. The comparison report informs the decision but does not automate it. The model owner must explicitly approve promotion. For high-risk models, the AI Governance function countersigns. Automated promotion without human review is not permitted.

9.4 Governance Artefacts

Artefact	Owner	Frequency	Location
Shadow Comparison Report	Model Owner	Daily during shadow	Model Register + governance dashboard
Shadow Period Summary	AI Governance	At promotion decision	Model governance record
Stub Tool Layer Audit Log	Security Operations	Continuous	SIEM
Privacy Impact Assessment	Privacy Officer	Per shadow deployment	Privacy register

10. Operational Considerations

10.1 SLOs

SLO	Target	Measurement Method
Shadow queue lag behind production	< 60 seconds	Queue consumer lag metric
Shadow processing error rate	< 1%	Error counter on shadow consumer
Comparison report publication latency	< 2 hours after midnight	Pipeline completion timestamp
Response pair storage availability	99.9%	Storage health check

10.2 Monitoring and Logging

Key metrics to monitor continuously during shadow period: shadow queue depth (alert if > 10,000 unprocessed), shadow consumer error rate (alert if > 1%), shadow model latency p99 (informational — not blocking production), daily comparison report publication (alert if missing), stub tool layer bypass attempts (alert immediately — P1).

10.3 Incident Response

Two incident classes specific to shadow deployment: (1) Shadow production interference — if any shadow operation writes to or calls a production system, halt shadow immediately; security investigation; version quarantined until investigation complete. (2) Shadow queue saturation impacting production — theoretically impossible if mirroring is purely async; if observed, circuit-breaker drops shadow traffic; P1 incident.

10.4 Disaster Recovery

Scenario	RPO	RTO	Recovery Procedure
Shadow store data loss	24 hours	4 hours	Restart shadow period; production unaffected
Shadow consumer failure	N/A	1 hour	Restart consumer; process queued messages; production unaffected
Comparison pipeline failure	N/A	2 hours	Retry pipeline run; extend shadow period if report missing

10.5 Capacity Planning

Shadow infrastructure processes the same volume as production but asynchronously. Size shadow compute at 30–50% of production inference capacity (queue provides elasticity). Shadow response store grows at: (average response size) × (daily request volume) × (retention days). For a service with 100,000 requests/day at 2KB average response size and 90-day retention: ~18 GB. Plan at 5× for safety margins and comparison metadata.
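The storage arithmetic above can be checked with a short calculation. The function names are hypothetical, and decimal units are assumed (1 GB = 1,000,000 KB).

```python
def store_size_gb(avg_response_kb: float, daily_requests: int, retention_days: int) -> float:
    """Raw store size: response size x daily volume x retention, in decimal GB."""
    total_kb = avg_response_kb * daily_requests * retention_days
    return total_kb / 1_000_000  # 1 GB = 1,000,000 KB in decimal units


def planned_size_gb(raw_gb: float, safety_factor: float = 5.0) -> float:
    """Apply the planning multiplier for safety margin and comparison metadata."""
    return raw_gb * safety_factor


raw = store_size_gb(avg_response_kb=2, daily_requests=100_000, retention_days=90)
planned = planned_size_gb(raw)
print(f"raw: {raw:.0f} GB, planned: {planned:.0f} GB")  # raw: 18 GB, planned: 90 GB
```

With the worked figures (2 KB, 100,000 requests/day, 90 days) the raw estimate is 18 GB and the 5× planning figure is 90 GB.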

11. Cost Considerations

11.1 Cost Drivers

Driver	Description	Relative Impact
Shadow inference compute	Running shadow model at production traffic volume	High
Shadow response storage	Storing 90 days of response pairs at production volume	Medium
Comparison pipeline compute	Daily batch analysis of accumulated pairs	Low
Queue infrastructure	Managed queue service at production message volume	Low
Engineering time	Setting up and maintaining shadow infrastructure per model	High

11.2 Scaling Risks

Shadow inference compute scales linearly with production traffic, so a spike that doubles production volume also doubles shadow compute cost. Mitigation: the shadow consumer operates with a configurable maximum throughput, and excess shadow requests are shed (shadow completeness falls, but production is unaffected). Monitor shadow completeness: if it drops below 80%, extend the shadow duration.
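Shadow completeness under a throughput cap can be estimated as below. This is a simplified sketch: it assumes excess requests in each interval are shed outright with no carry-over to the next interval, and the function names are hypothetical.

```python
from typing import Sequence


def shadow_completeness(interval_volumes: Sequence[int], max_per_interval: int) -> float:
    """Fraction of mirrored requests processed when each interval is capped."""
    mirrored = sum(interval_volumes)
    if mirrored == 0:
        return 1.0  # nothing was mirrored, so nothing was shed
    processed = sum(min(volume, max_per_interval) for volume in interval_volumes)
    return processed / mirrored


def should_extend_shadow(completeness: float, threshold: float = 0.8) -> bool:
    """Apply the 80% completeness rule from the scaling guidance."""
    return completeness < threshold
```

For example, volumes of 100, 100, and 400 requests against a cap of 150 per interval give 350 of 600 processed, roughly 58%, which would trigger an extension.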

11.3 Optimisations

Use spot/preemptible instances for shadow inference (shadow is delay-tolerant).
Process shadow requests in micro-batches for GPU efficiency (batch size 8–32 depending on model).
Use columnar compression on shadow response store (response text compresses 5–10×).
Skip shadow for PATCH version changes; run only comparison analysis on a sampled offline subset.

11.4 Indicative Cost Range

Traffic Volume	Monthly Shadow Cost (Inference Only)	Assumptions
Low (< 100K req/day)	$500–$2,000	Spot GPU instances; 4-week shadow; small LLM
Medium (100K–1M req/day)	$2,000–$15,000	Managed GPU cluster; spot pricing; auto-scaling
High (> 1M req/day)	$15,000–$80,000	Dedicated GPU fleet; storage at scale

12. Trade-Off Analysis

12.1 Shadow vs Alternative Validation Approaches

Approach	Quality Signal	Production Impact	Cost	Regulatory Evidence	Best For
Shadow deployment (this pattern)	High — real traffic	None	High	Strong	High-risk, regulated, customer-facing models
Canary release (EAAPL-MDL003)	High — real traffic + outcomes	User-visible risk	Medium	Strong	Medium-risk models with low rollback cost
Offline A/B on held-out set	Medium — static dataset	None	Low	Moderate	Research validation; pre-shadow gate
Manual QA on sampled requests	Low — human review	None	Medium	Weak	Small models, low volume, low risk

12.2 Architectural Tensions

Tension	Description	Resolution
Privacy vs Signal Quality	Using real user data maximises signal; but storage of real user inputs raises privacy risk	Pseudonymise at capture; store only input hash + model outputs; purge promptly after comparison
Shadow Completeness vs Cost	Full shadow coverage is ideal; cost may require sampling	Stratified sampling: ensure all input types represented; priority to long-tail and edge cases
Read-Only Constraint vs Realism	Shadow model cannot replicate stateful model behaviours (e.g., personalisation that writes state)	Shadow tests stateless inference quality only; stateful behaviour validated separately via integration tests

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Shadow model writes to production system	Very Low	Critical	Audit log alert on unexpected write	Halt shadow; quarantine version; security investigation
Comparison report produces false positive	Medium	High	Manual review catches inconsistency	Re-run pipeline with corrected metrics; extend shadow period
Shadow queue consumer memory leak causes host OOM	Low	Medium	Container memory alert	Restart consumer; process queued messages from checkpoint
Privacy breach: real PII in shadow store	Low	Critical	Data classification scan alert	Halt shadow; purge affected data; notify privacy officer
Stub bypass allows shadow notification to user	Very Low	High	User complaint; audit log	Halt shadow; user apology; investigate stub implementation

13.1 Cascading Failure Scenarios

If the shadow queue grows unbounded (consumer failure during high-traffic period), the queue infrastructure may exhaust storage. If queue infrastructure is shared with production messaging systems, this can cascade into production messaging failures. Mitigation: shadow queue is isolated from all production messaging infrastructure; maximum queue depth is bounded; when maximum is reached, new shadow messages are dropped (production unaffected).

14. Regulatory Considerations

Regulation / Framework	Relevant Clause	How This Pattern Addresses It
EU AI Act (2024/1689)	Article 9 (Risk Management System) — pre-deployment testing requirement for high-risk AI	Shadow deployment constitutes mandatory pre-deployment validation; comparison report is evidence
EU AI Act (2024/1689)	Article 10 (Data Governance) — training and validation data quality	Shadow uses real production distribution to validate beyond training data
ISO 42001:2023	Annex A control A.6.2.4 (AI system verification and validation)	Shadow comparison report constitutes verification and validation evidence
NIST AI RMF (2023)	MEASURE 2.3 (performance demonstrated for conditions similar to deployment settings)	Shadow is the primary TEVV mechanism for model upgrades
APRA CPS 230 (2025)	Paragraph 52 (Change management — testing)	Shadow constitutes pre-change testing; comparison report is test evidence
Privacy Act 1988 (Cth)	APP 3 (Collection of solicited personal information) / APP 11 (Security)	Shadow captures real user data — must have privacy notice scope and be secured at classification level of production data

15. Reference Implementations

15.1 AWS

Traffic Mirroring: Envoy proxy deployed on ECS/EKS with request mirroring (Application Load Balancer has no native request mirroring; VPC Traffic Mirroring operates at packet level).
Shadow Queue: Amazon SQS FIFO queue with shadow inference Lambda consumer.
Shadow Inference: SageMaker Endpoint (separate endpoint per shadow version); spot instance backed.
Shadow Store: Amazon DynamoDB (response pairs); S3 for bulk comparison data.
Comparison Pipeline: AWS Glue job (daily); results to S3 + QuickSight dashboard.

15.2 Azure

Traffic Mirroring: Azure API Management with request duplication policy; or the Istio-based service mesh add-on on AKS.
Shadow Queue: Azure Service Bus Premium (isolation from production).
Shadow Inference: Azure Machine Learning managed endpoint (shadow version); spot-backed compute cluster.
Shadow Store: Azure Cosmos DB (response pairs); Azure Blob for comparison data.
Comparison Pipeline: Azure Synapse Analytics (daily pipeline); Power BI dashboard.

15.3 GCP

Traffic Mirroring: Cloud Load Balancing with request mirroring; or Istio on GKE.
Shadow Queue: Cloud Pub/Sub (dedicated topic for shadow).
Shadow Inference: Vertex AI Endpoint (shadow version); preemptible GPU nodes.
Shadow Store: BigQuery (response pairs, columnar for analysis efficiency).
Comparison Pipeline: BigQuery ML + Dataflow; Looker dashboard.

15.4 On-Premises / Hybrid

Traffic Mirroring: Nginx mirroring directive; Envoy proxy sidecar in Kubernetes.
Shadow Queue: Apache Kafka (dedicated topic, separate consumer group).
Shadow Inference: Kubernetes Job on dedicated GPU node pool (lower-priority node affinity).
Shadow Store: PostgreSQL + TimescaleDB; columnar compression for response pairs.
Comparison Pipeline: Apache Spark on-cluster; Grafana dashboard.

16. Related Patterns

Pattern ID	Pattern Name	Relationship Type	Description
EAAPL-MDL001	Model Versioning	Prerequisite	Shadow deployment operates on specific versioned model artefacts
EAAPL-MDL003	Canary Model Release	Next Step	Successful shadow completion gates entry to canary release
EAAPL-MDL004	Model Rollback	Sibling	If shadow reveals production regression, rollback pattern is applied to current prod version
EAAPL-MDL008	Model Access Governance	Dependency	Shadow model access is governed by same access tiers as production

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Rationale
Industry Adoption	4	Shadow/dark launch is established in software; LLM-specific shadow is newer
Tooling Availability	3	Traffic mirroring is mature; LLM shadow comparison pipelines require custom build
Standards Alignment	4	Directly supports EU AI Act Article 9 and ISO 42001 verification and validation controls
Implementation Complexity	4 (high)	Requires async infrastructure, privacy controls, and statistical analysis pipeline
Regulatory Acceptance	4	Shadow evidence is accepted as pre-deployment validation by EU AI Act supervisors

18. Revision History

Version	Date	Author	Summary of Changes
1.0	2026-06-12	Enterprise AI Architecture Practice	Initial publication
