EAAPLEnterprise AI Architecture Pattern Library
EAAPLLibraryPlatform EngineeringEAAPL-PLT001
EAAPL-PLT001Proven
⇄ Compare

Enterprise AI Platform

⚙️ Platform EngineeringAPRA CPS230EU AI Act

[EAAPL-PLT001] Enterprise AI Platform

Category: Platform Engineering Sub-category: Foundation Platform Version: 1.4 Maturity: Mature Tags: platform-engineering, internal-developer-platform, golden-path, shared-services, model-serving, developer-experience Regulatory Relevance: APRA CPS230, CPS234, EU AI Act (Article 9 Risk Management), ISO 42001, NIST AI RMF (GOVERN 1.1)


1. Executive Summary

The Enterprise AI Platform pattern establishes a shared, governed infrastructure layer that enables product teams to consume AI capabilities safely and efficiently without each team solving foundational concerns independently. Rather than allowing every business unit to procure models, build integrations, and manage compliance in isolation—creating exponential risk surface and duplicated cost—this pattern centralises platform concerns while preserving product team autonomy.

The platform delivers measurable outcomes: 60–80% reduction in time-to-first-AI-feature for new teams, consolidated cost visibility with per-team chargeback, a single control plane for policy enforcement (data classification, model access tiers, rate limits), and an audit trail satisfying regulatory obligations across all AI usage. The platform team operates as an internal product team serving engineering consumers, not a gatekeeping function. Adoption is driven through golden paths—opinionated, well-documented routes to common AI use cases—that make the right thing the easy thing. This pattern is the prerequisite upon which all other EAAPL platform patterns depend.


2. Problem Statement

Business Problem

Enterprises face uncoordinated AI adoption: each team independently evaluates models, negotiates vendor contracts, builds bespoke integrations, and manages compliance obligations. This creates duplicated investment, inconsistent risk posture, and no executive visibility into total AI spend or exposure. AI incidents (data leakage, hallucination in customer-facing output, cost overruns) are discovered reactively with no systematic controls.

Technical Problem

Without a shared platform, teams build thin wrappers around foundation model APIs, each implementing authentication, logging, error handling, and cost tracking differently. There is no consistent mechanism for prompt versioning, model failover, semantic caching, or response auditing. Security review is performed ad hoc. Infrastructure drift compounds over time.

Symptoms

  • Multiple AWS/Azure/GCP AI accounts with no consolidated billing or spend alerts
  • Product engineers spending >30% of AI feature development time on infrastructure concerns
  • Security team performing point-in-time reviews rather than continuous enforcement
  • No audit trail mapping AI outputs to the model version and prompt that produced them
  • Data residency violations discovered post-deployment as teams use public endpoints without restriction
  • Duplicate vendor contracts for the same model provider across business units

Cost of Inaction

  • Regulatory non-compliance penalties (APRA operational risk, EU AI Act fines up to 3% global turnover)
  • AI security incidents with no forensic trail, increasing breach disclosure obligations
  • Cost inefficiency of 30–50% above market rate due to absence of volume commitments and caching
  • 6–12 month delays in AI capability delivery as teams rebuild foundational patterns from scratch

3. Context

When to Apply

  • Organisation has ≥3 product teams independently consuming or planning to consume AI services
  • Enterprise has data classification requirements that must be enforced before prompts leave the perimeter
  • AI spend is untracked or exceeds $50K/year across business units without consolidated visibility
  • Regulatory obligations (APRA, EU AI Act, privacy legislation) require audit trails for AI-assisted decisions
  • Platform or infrastructure team exists with mandate to provide shared engineering services

When NOT to Apply

  • Single-product startup with one team: overhead of platform exceeds benefit; use a managed API gateway directly
  • Proof-of-concept or time-boxed experiment: build direct integrations, migrate to platform post-validation
  • Fully air-gapped deployment with no shared infrastructure capability: consider a simplified on-premises variant

Prerequisites

  • Identity provider (IdP) capable of issuing service account credentials (OIDC/OAuth2)
  • Centralised secrets management (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault)
  • Observability stack (metrics, logs, traces) available for platform instrumentation
  • Executive sponsorship and cross-BU agreement on platform adoption mandate (voluntary adoption rarely scales past early adopters)
  • Cloud landing zone or on-premises infrastructure with network segmentation capability

Industry Applicability

Industry Applicability Primary Driver
Financial Services (Banking, Insurance) Very High APRA CPS230/234, data residency, audit trails for AI-assisted decisions
Healthcare Very High Patient data privacy, clinical AI regulatory approval, audit requirements
Government High Data sovereignty, security classification, procurement rules
Retail / E-commerce High Cost at scale, multi-team coordination, personalisation pipelines
Media & Entertainment Medium Cost efficiency, content moderation, creator tools
Technology / SaaS Medium-High Developer productivity, model diversification, cost optimisation

4. Architecture Overview

The Enterprise AI Platform is structured as five horizontal layers stacked atop shared cross-cutting services. Each layer has a clear ownership boundary and a defined interface contract. The deliberate separation of concerns between layers is what allows the platform to evolve (e.g., swapping model providers, adding new compute tiers) without disrupting product teams.

Layer 1 — Infrastructure and Compute provides the physical and virtual compute substrate: GPU/accelerator clusters for self-hosted model serving, cloud provider AI endpoints (Amazon Bedrock, Azure OpenAI, Google Vertex AI), and VPC/network controls enforcing data residency. This layer is owned by the Platform Infrastructure team and changes infrequently. The critical design decision here is whether to use a shared GPU pool, dedicated per-tenant compute, or a hybrid—this choice has profound cost and isolation implications addressed in the Trade-Off Analysis.

Layer 2 — Model Serving and Registry abstracts individual model deployment concerns. It hosts the Model Registry (model metadata, capability cards, approved versions, deprecation notices), the Serving Layer (OpenAI-compatible inference endpoints whether models are self-hosted via vLLM/TorchServe or proxied from cloud providers), and the Model Lifecycle Manager. The OpenAI-compatible API surface is a deliberate choice: it maximises ecosystem compatibility and allows product teams to switch underlying models with zero code change.

Layer 3 — AI API Gateway is the primary integration point for all platform consumers. It enforces authentication (API keys/OIDC JWT), authorisation (RBAC/ABAC on model and capability access), rate limiting per consumer/team, cost allocation tagging, prompt/response logging for audit, semantic caching, and circuit breaking. Every request transits this layer—there are no side-door paths to models. This is the enforcement perimeter for all security and governance controls.

Layer 4 — Developer Services includes the capabilities that accelerate product team velocity: the Prompt Registry (versioned prompts with promotion workflows), the Evaluation Framework (automated benchmarking against golden datasets), the Experimentation Service (A/B routing for model comparison), and the RAG Orchestration Service. These are optional services product teams can adopt; the gateway is mandatory.

Layer 5 — Developer Portal is the human-facing surface: API catalogue, self-service onboarding, per-team dashboards, AI playgrounds, policy transparency, and documentation. This layer drives adoption and reduces platform team support burden. The portal is built as a product—it has a roadmap, user research input, and a feedback loop with consuming teams.

Cross-cutting Shared Services underpin all layers: Identity and Access Management, Secrets Management, Observability (metrics/logs/traces), Policy Engine (OPA or equivalent), Cost Management and Chargeback, and Data Classification Service. These services are not AI-specific—they extend the existing enterprise platform—but they must be explicitly wired into the AI platform's control plane.

The Platform Team vs. Product Team operating model is critical. The platform team owns Layers 1–3 and the shared services. Product teams own their applications, their prompts, and their AI feature logic. Layer 4 services are joint-owned with a platform team as service provider and product teams as co-designers. The golden path concept operationalises this: the platform team publishes opinionated starter templates, SDKs, and runbooks that encode best practice so product teams can onboard a new AI capability in hours rather than weeks.


5. Architecture Diagram

ARCHITECTURE DIAGRAM
flowchart TD subgraph Consumers["Product Teams"] A[Applications + Pipelines] B[Developer Portal] end subgraph Platform["Platform Layers"] C[AI API Gateway] D[Model Registry] E[Developer Services] end subgraph Infra["Infrastructure + Compute"] F[Self-Hosted GPU] G[Cloud AI Endpoints] end A --> C B -.->|onboard| A C --> D C --> E D --> F D --> G C --> H[(Audit + Cost Store)] style A fill:#dbeafe,stroke:#3b82f6 style B fill:#dbeafe,stroke:#3b82f6 style C fill:#f0fdf4,stroke:#22c55e style D fill:#fef9c3,stroke:#eab308 style E fill:#f0fdf4,stroke:#22c55e style F fill:#fef9c3,stroke:#eab308 style G fill:#fef9c3,stroke:#eab308 style H fill:#fef9c3,stroke:#eab308

6. Components

Layer 1 — Infrastructure and Compute

Component Type Responsibility Technology Options Criticality
GPU / Accelerator Cluster Infrastructure Self-hosted model inference compute AWS EC2 P4/P5, Azure NDv4, GCP A3, on-prem NVIDIA DGX High
Cloud AI Endpoints Managed Service Access to frontier models with SLA AWS Bedrock, Azure OpenAI, GCP Vertex AI, Anthropic API Critical
VPC / Network Controls Infrastructure Data residency, private connectivity, egress control AWS VPC + PrivateLink, Azure VNet + Private Endpoint, GCP VPC-SC Critical
Data Residency Enforcer Policy Block requests violating data sovereignty rules Custom middleware, OPA, Cloudflare Zero Trust High

Layer 2 — Model Serving and Registry

Component Type Responsibility Technology Options Criticality
Model Registry Service Catalogue of approved models with metadata, capability cards, risk ratings MLflow, Hugging Face Hub (private), custom DB High
OpenAI-Compatible Inference Service Standardised API surface for self-hosted models vLLM, TGI (Hugging Face), NVIDIA Triton, BentoML High
Cloud Provider Proxy Service Unified endpoint abstracting cloud provider differences LiteLLM, custom proxy, Kong AI Gateway High
Model Lifecycle Manager Service Versioning, deprecation, rollout orchestration Custom, Argo Rollouts, Spinnaker Medium

Layer 3 — AI API Gateway

Component Type Responsibility Technology Options Criticality
AI API Gateway Service Authn, authz, rate limiting, routing, logging, caching Kong AI Gateway, AWS API Gateway + Lambda, Azure APIM, Apigee, LiteLLM Proxy Critical
Rate Limiter Policy Token-based and request-based rate limits per consumer/team Redis + sliding window, Kong rate-limit-advanced Critical
Semantic Cache Service Cache near-identical prompt responses to reduce cost/latency Redis + vector index, GPTCache, Momento High
Audit Logger Service Immutable record of all requests and responses Kinesis → S3, Kafka → object store, OpenTelemetry → SIEM Critical
Circuit Breaker Reliability Prevent cascade failure when model endpoints degrade Resilience4j, custom middleware, Envoy High

Layer 4 — Developer Services

Component Type Responsibility Technology Options Criticality
Prompt Registry Service Version-controlled prompt store with promotion workflow Custom Git-backed store, LangSmith, Promptflow High
Evaluation Framework Service Automated benchmarking of model/prompt combinations Ragas, DeepEval, custom harness Medium
Experimentation Service Service A/B and shadow routing for model comparison Custom feature-flag backed, LaunchDarkly + gateway Medium
RAG Orchestration Service Retrieval-augmented generation pipeline management LangChain, LlamaIndex, custom Medium

Layer 5 — Developer Portal

Component Type Responsibility Technology Options Criticality
API Catalogue Portal Discoverable inventory of all AI capabilities Backstage, Apigee Developer Portal, custom High
Self-Service Onboarding Portal Automated provisioning of API keys, rate limits, team namespaces Backstage scaffolder, custom workflow High
Usage Dashboards Portal Per-team cost, request volume, error rate visibility Grafana, Superset, PowerBI Medium
AI Playground Portal Interactive testing environment without production blast radius Custom, Promptflow Studio Medium

7. Data Flow

Primary Flow — Product Team AI Request

Step Actor Action Output
1 Product Team Application Issue HTTPS POST to AI API Gateway with JWT/API key and prompt payload Authenticated request at gateway ingress
2 AI API Gateway — AuthN Validate JWT against IdP or validate API key hash Authenticated identity + team namespace
3 AI API Gateway — AuthZ Check RBAC/ABAC: does this team/identity have access to the requested model? Authorised or 403 rejection
4 AI API Gateway — Classification Data Classification Service inspects prompt for PII, sensitive data, classification level Classification label attached to request context
5 AI API Gateway — Policy OPA evaluates: is this classification allowed for this model endpoint per policy? Policy allow/deny decision
6 AI API Gateway — Rate Limit Check token bucket / sliding window for this consumer Allow or 429 rate limit response
7 AI API Gateway — Semantic Cache Hash prompt embedding; check vector cache for near-match Cache hit (return cached response) or cache miss (continue)
8 AI API Gateway — Cost Tag Attach cost allocation tag (team, project, environment) to request Tagged request context
9 AI API Gateway — Audit Pre-Log Write request record (prompt hash, metadata, timestamp) to audit log Immutable pre-request audit record
10 Model Router Select optimal model endpoint based on routing rules (capability, cost, latency) Upstream target selected
11 Model Serving Layer Forward request to cloud provider API or self-hosted inference endpoint Raw model response
12 AI API Gateway — Response Return response to caller; emit token usage to cost management Response to product team + cost event
13 AI API Gateway — Audit Post-Log Write response record (response hash, token counts, latency) to audit log Immutable post-response audit record

Error Flow

Error Condition Detection Point Action Consumer Experience
Model endpoint unavailable Circuit breaker (Layer 3) Open circuit; route to fallback model or return 503 with Retry-After Graceful degradation or explicit error
Policy denial (data classification) Policy engine (Layer 3) Reject request; log policy violation event 403 with policy violation code
Rate limit exceeded Rate limiter (Layer 3) Reject with 429; include Retry-After header Explicit rate limit response
Prompt injection detected Guardrails layer Reject or sanitise; raise security alert 400 Bad Request or sanitised response
Model returns error (5xx) Gateway upstream handler Retry with exponential backoff; failover if retries exhausted Transparent retry then degraded fallback

8. Security Considerations

Authentication and Authorisation

  • All consumers authenticate via short-lived OIDC JWT tokens or rotatable API keys stored in Secrets Manager; long-lived static credentials are prohibited
  • RBAC model: model-viewer, model-invoker, prompt-editor, platform-admin; ABAC extends this with data classification attributes
  • Service-to-service communication within the platform uses mTLS with certificates managed by the service mesh (Istio/Linkerd)

Secrets Management

  • All model provider API keys (OpenAI, Anthropic, AWS Bedrock IAM roles) are stored in HashiCorp Vault or cloud-native secrets manager; zero hardcoded credentials
  • Secrets rotation is automated; gateway refreshes credentials on a schedule without downtime
  • Audit log of every secret access event

Data Classification and Encryption

  • All prompts and responses classified at ingress by the Data Classification Service; classification label persists through the audit trail
  • Data at rest: AES-256 encryption for audit log store, vector cache, and model registry
  • Data in transit: TLS 1.3 minimum for all internal and external communication
  • PII in prompts: masked or tokenised before sending to third-party cloud endpoints if data residency policy requires

Auditability

  • Cryptographic hash of every prompt and response stored in the audit log; enables non-repudiation
  • Audit log is append-only and stored in a separate security account with no delete permissions for platform operators
  • Audit events emitted to SIEM (Splunk/Sentinel/Chronicle) in real time

OWASP LLM Top 10 Controls

OWASP LLM Risk Control Implemented in Platform
LLM01 Prompt Injection Input guardrails at gateway layer; prompt injection classifier as policy check
LLM02 Insecure Output Handling Response sanitisation middleware; output schema validation for structured outputs
LLM03 Training Data Poisoning Model Registry approvals gate; only approved model versions from trusted registries
LLM04 Model Denial of Service Rate limiting per consumer; token budget enforcement; circuit breaker
LLM05 Supply Chain Vulnerabilities Model provenance tracking in Registry; SBoM for self-hosted models; vendor attestation
LLM06 Sensitive Information Disclosure Data classification at ingress; PII masking before third-party routing; audit logging
LLM07 Insecure Plugin Design API scoping for AI-initiated actions; OAuth2 scopes on all downstream APIs called by agents
LLM08 Excessive Agency Human-in-the-loop gates for agentic actions; action whitelist in policy engine
LLM09 Overreliance Confidence thresholds; output labelling as AI-generated; mandatory human review for critical decisions
LLM10 Model Theft Self-hosted model weights encrypted at rest; access logs for model artifact downloads; network egress controls

9. Governance Considerations

Responsible AI Framework

  • Every model onboarded to the registry must have a completed Model Risk Card covering intended use, limitations, bias evaluation results, and regulatory classification
  • High-risk AI use cases (as defined by EU AI Act Annex III or organisational risk policy) require additional approval and enhanced monitoring
  • Data used for model fine-tuning must go through the Data Ethics Review process

Model Risk Management

  • Models are classified by risk tier: Low (content summarisation), Medium (customer-facing recommendations), High (automated decisions affecting individuals)
  • High-tier models require a signed-off Model Risk Assessment before production promotion
  • Ongoing model monitoring for performance drift, bias drift, and output quality degradation

Human Approval Gates

  • Changes to platform-wide policies (rate limits, model access tiers, data classification rules) require approval from the AI Platform Governance Board
  • High-risk model promotions to production require Platform Owner + Chief Risk Officer sign-off
  • Agentic use cases that can initiate real-world actions (send emails, execute transactions) require explicit human-in-the-loop gate design

Policy and Traceability

Governance Artefact Owner Cadence Storage Location
Model Risk Card Model Owner + Risk Team Per model version Model Registry
Data Classification Policy Data Governance Team Annual review Policy Engine configuration
API Usage Policy Platform Team Quarterly review Developer Portal
Audit Log Retention Policy Legal / Compliance Annual review Platform Runbook
AI Incident Register CISO + Platform Team Per incident GRC system
Platform Governance Board Minutes Platform Owner Monthly Confluence / SharePoint
Cost Allocation Report FinOps / Platform Team Monthly Finance system

10. Operational Considerations

Monitoring

Signal Source Alert Threshold Owner
Gateway error rate API Gateway metrics >1% 5xx over 5 min Platform Team
Model endpoint latency P99 Tracing >5s for interactive, >30s for batch Platform Team
Circuit breaker state Circuit breaker events Any circuit opening Platform Team + Model Owner
Cost anomaly Cost management service >20% day-over-day spend increase FinOps + Platform Team
Audit log ingestion lag Log pipeline metrics >60s lag Platform Team + Security
Cache hit rate Semantic cache metrics <20% hit rate sustained 1h (signals cache misconfiguration) Platform Team

SLOs

SLO Target Measurement Window
Gateway availability 99.9% Rolling 30 days
Interactive request P95 latency (excluding model inference) <100ms Rolling 7 days
Audit log completeness 100% of requests logged Rolling 24 hours
Policy enforcement correctness Zero bypass incidents Rolling 90 days
Self-service onboarding success rate >95% of new team onboards complete without platform team intervention Monthly

Logging

  • Structured JSON logs emitted by all platform components; correlated by x-request-id and x-team-id headers
  • Log levels: INFO for all gateway transactions, WARN for policy near-misses, ERROR for circuit openings and auth failures
  • Security-sensitive events (policy violations, auth failures) shipped to SIEM within 60 seconds
  • Log retention: 90 days hot (searchable), 7 years cold (compliance archive)

Incident Response

Incident Type Detection Response RTO
Complete gateway outage Synthetic probes + error rate alert Failover to secondary region; page platform on-call 5 minutes
Model provider outage Circuit breaker + health check Switch to fallback model; notify consuming teams 10 minutes
Security breach (prompt data leak) SIEM alert Isolate affected namespace; revoke credentials; notify CISO 15 minutes
Cost runaway Cost anomaly alert Rate limit enforcement tightened; notify FinOps + team lead 30 minutes

Disaster Recovery

Component RPO RTO Strategy
AI API Gateway 0 (stateless) 2 min Multi-AZ active-active; DNS failover
Audit Log Store <1 min 15 min Cross-region replication; immutable S3 buckets
Model Registry 5 min 30 min Database replication; Git-backed as secondary
Semantic Cache 1 hour 5 min Cache is soft state; rebuild from model calls; acceptable cold-start
Prompt Registry 0 10 min Git-backed; replicated; restore from tag

11. Cost Considerations

Cost Drivers

Driver Description Typical % of Total
Model inference (cloud APIs) Token charges for GPT-4, Claude, Gemini calls 60–75%
GPU compute (self-hosted) On-demand or reserved GPU instances for self-hosted models 10–20%
Semantic cache Vector store hosting + Redis cache tier 3–8%
Observability infrastructure Log storage, metrics, tracing at platform scale 5–10%
Developer portal hosting Always-on service, relatively low cost 1–3%
Platform team labour Engineering + operations headcount Excluded (CapEx/OpEx accounting)

Scaling Risks

  • Token cost scales super-linearly with context window abuse: no-context-limit requests from one team can dominate platform spend
  • Uncontrolled model tier usage: teams defaulting to most expensive model for every use case without routing intelligence
  • Cache cold-start: new deployment or cache eviction causes temporary cost spike as cache warms

Optimisations

  • Semantic caching: 20–40% token reduction on repetitive workloads (FAQ, summarisation)
  • Model tier routing: route simple tasks to cheaper models (GPT-4o-mini, Claude Haiku); reserve frontier for complex reasoning
  • Prompt compression: strip whitespace, compress system prompts via shared library; 10–15% token reduction
  • Batch API for non-interactive: use provider batch APIs at 50% discount for overnight processing
  • Reserved capacity: negotiate reserved throughput with cloud AI providers for predictable workloads

Indicative Cost Range

Scale Monthly AI Platform Infra Cost Notes
Small (1–5 teams, <1M tokens/day) $3,000–$12,000 Mostly cloud API costs; minimal self-hosted
Medium (5–20 teams, 1–10M tokens/day) $15,000–$80,000 Mix of cloud API + some self-hosted; semantic cache delivers ROI
Large (20+ teams, >10M tokens/day) $80,000–$400,000+ Self-hosted frontier models become cost-competitive; FinOps team warranted

12. Trade-Off Analysis

Compute Architecture Options

Option Description Pros Cons Best For
Cloud-Only (Managed APIs) All inference via cloud provider managed APIs (Bedrock, Azure OpenAI, Vertex) Zero infrastructure ops; rapid access to frontier models; SLA-backed Data residency constraints; vendor lock-in; highest per-token cost at scale Organisations <$50K/month AI spend; strict no-GPU-ops mandate
Hybrid (Cloud + Self-Hosted) Cloud APIs for frontier models; self-hosted open-weight models for high-volume/lower-complexity Cost optimisation; data residency for sensitive workloads; model diversity GPU ops expertise required; model update operational burden Most enterprises at medium-large scale
Self-Hosted First Maximise self-hosted; cloud only for capabilities not replicable Maximum data control; no per-token cost; customisable High infrastructure investment; frontier model gap; GPU scarcity; ops complexity Air-gapped environments; sovereign AI requirements

Tenant Isolation Options

Option Description Pros Cons Best For
Shared Pool All tenants share gateway + inference endpoints; namespace isolation in software Lowest cost; highest utilisation Noisy neighbour risk; complex policy enforcement Internal enterprise teams with trust relationships
Dedicated Namespace Separate gateway instances per tenant; shared compute Balance of isolation and cost More infrastructure complexity External-facing B2B platforms
Dedicated Compute Separate inference endpoints per tenant Strongest isolation; predictable performance Highest cost; most ops overhead Regulated industries with data-separation requirements

Architectural Tensions

Tension Option A Option B Resolution
Developer autonomy vs. governance control Teams choose any model freely Platform mandates approved model list Approved model list with fast-track review process for new models
Cost optimisation vs. performance Route to cheapest model always Route to best model always Routing rules based on use-case classification; teams declare use case
Openness of audit logs vs. privacy Full prompt/response logging No logging of content Log metadata and hashes; content only on explicit high-risk classification
Platform team velocity vs. consumer customisation Platform publishes fixed golden paths Teams fully self-serve Golden paths as starting templates; teams can fork within policy guardrails

13. Failure Modes

Failure Likelihood Impact Detection Recovery
AI API Gateway complete outage Low Critical — all AI features unavailable Synthetic probes, zero traffic alert Multi-AZ failover; circuit breaker routes to fallback
Cloud model provider outage (e.g., OpenAI 5xx) Medium High — affects all consumers of that provider Circuit breaker opens; error rate spike Failover to alternate provider or self-hosted model
Semantic cache poisoning (incorrect cached response served) Low High — incorrect responses served silently Response quality monitoring; user feedback Cache flush; cache validation before reintroduction
Token budget exhaustion for a team High Medium — team's AI features degrade gracefully Cost management alert; 429 from gateway Increase quota with approval; implement back-pressure in consuming app
Data classification false negative (sensitive data reaches wrong model) Low Critical — data residency or privacy breach Retrospective audit log scan; SIEM alert Incident response; vendor notification if required; root cause fix to classifier
Prompt registry unavailable Medium Medium — teams cannot load latest prompts Health check failure; latency spike Fall back to last-known-good prompt version cached in gateway
Model Registry corruption Low High — wrong model versions deployed Registry integrity check on startup Restore from Git-backed backup; re-validate model versions

Cascading Failure Scenarios

  • Semantic cache failure → cold-start cost spike: Cache failure causes all requests to hit model directly; combined with a traffic spike this can exhaust token budgets across multiple teams simultaneously and trigger cloud provider rate limits. Mitigation: circuit breaker on cache with graceful bypass; pre-emptive capacity buffer in token budgets.
  • Policy engine outage → open or closed failure: If OPA becomes unavailable, the gateway must fail open (allow all, risk policy bypass) or fail closed (deny all, block all AI features). This is a critical design choice; most enterprises should fail closed with a break-glass procedure.
  • Identity provider outage → complete gateway authentication failure: If the IdP issuing JWTs is unavailable, all JWT-authenticated requests fail. Mitigation: API key fallback path for critical production consumers; IdP HA configuration.

14. Regulatory Considerations

APRA CPS 230 (Operational Risk)

  • The platform must be classified as a Critical or Important Business Service if AI features are material to regulated activities; this triggers BCP/DR obligations including RTO/RPO targets above
  • Third-party model providers (OpenAI, Anthropic) must be assessed under CPS 230 third-party risk management obligations; contracts must include sub-contracting visibility, audit rights, and incident notification requirements
  • Operational incidents affecting AI services must be reportable to APRA if material

APRA CPS 234 (Information Security)

  • The audit log is an information asset requiring classification, protection, and retention per CPS 234
  • All platform components handling sensitive data must be within the CPS 234 information security capability boundary
  • Penetration testing of the AI API Gateway is required at least annually and after significant changes

Privacy Act 1988 (Australia) / GDPR (EU)

  • Personal information in prompts and responses must be handled in accordance with the Privacy Act; prompt logging of PII-containing interactions requires a Privacy Impact Assessment
  • Data minimisation principle applies: prompts should not contain more PII than necessary for the AI task
  • Data residency controls must enforce storage of Australian personal information within Australia if required by APP 8 considerations

EU AI Act

  • Article 9 requires risk management systems for high-risk AI applications; the Model Risk Card and platform governance artefacts satisfy this requirement
  • Article 13 transparency obligations require AI-generated content to be identifiable as such in consumer-facing applications
  • Article 17 quality management system requirements are met by the prompt version control, evaluation framework, and change governance processes

ISO 42001 (AI Management System)

  • The platform governance artefacts (Model Risk Cards, audit logs, governance board minutes) constitute the AI management system records required by ISO 42001 Clause 7
  • Continual improvement processes (evaluation framework, post-incident review) satisfy Clause 10

NIST AI RMF

  • GOVERN 1.1: AI risk tolerance defined via model risk tiers and data classification policies
  • MAP 2.1: AI risk context mapped through Model Risk Cards and use case classification
  • MEASURE 2.3: Metrics for AI risk tracked through observability stack and governance dashboards
  • MANAGE 3.1: Response plans for AI incidents documented in platform runbook

15. Reference Implementations

AWS

Component AWS Service
AI API Gateway Amazon API Gateway + AWS Lambda authoriser, or Kong on EKS
Model Serving (cloud) Amazon Bedrock (Claude, Llama, Titan)
Model Serving (self-hosted) Amazon SageMaker Endpoints or EKS + vLLM on P4/P5
Semantic Cache Amazon ElastiCache (Redis) + Amazon OpenSearch for vector index
Audit Log Amazon Kinesis Data Streams → S3 (Glacier for cold) → Athena for query
Policy Engine AWS Lambda + OPA sidecar, or AWS Verified Permissions
Secrets AWS Secrets Manager
Observability Amazon CloudWatch + AWS X-Ray + OpenTelemetry
Developer Portal AWS Service Catalog + Backstage on ECS
Cost Management AWS Cost Explorer + Cost Allocation Tags + AWS Budgets

Azure

Component Azure Service
AI API Gateway Azure API Management (APIM) with AI policies
Model Serving (cloud) Azure OpenAI Service
Model Serving (self-hosted) AKS + vLLM on NC-series
Semantic Cache Azure Cache for Redis + Azure AI Search
Audit Log Azure Event Hubs → Azure Data Lake Gen2
Policy Engine Azure Policy + OPA on AKS
Secrets Azure Key Vault
Observability Azure Monitor + Application Insights
Developer Portal Azure API Management built-in developer portal
Cost Management Azure Cost Management + Tags

GCP

Component GCP Service
AI API Gateway Cloud Endpoints / Apigee
Model Serving (cloud) Vertex AI (Gemini, Claude via Model Garden)
Model Serving (self-hosted) GKE + vLLM on A3
Semantic Cache Memorystore (Redis) + Vertex AI Vector Search
Audit Log Cloud Pub/Sub → BigQuery
Policy Engine Binary Authorization + OPA on GKE
Secrets Secret Manager
Observability Cloud Monitoring + Cloud Trace + OpenTelemetry
Developer Portal Apigee Developer Portal
Cost Management Cloud Billing + Labels + Budget Alerts

On-Premises

Component Technology
AI API Gateway Kong Enterprise or NGINX + custom Lua/Python middleware
Model Serving vLLM or TGI on bare-metal GPU servers (NVIDIA A100/H100)
Semantic Cache Redis Enterprise + Qdrant or Weaviate
Audit Log Apache Kafka → MinIO (S3-compatible)
Policy Engine OPA (open source)
Secrets HashiCorp Vault
Observability Prometheus + Grafana + Tempo + Loki
Developer Portal Backstage (CNCF)
Cost Management Custom chargeback reporting from Kafka cost events

Pattern ID Name Relationship
EAAPL-PLT002 AI API Gateway Child pattern — PLT001 Layer 3 is implemented by PLT002
EAAPL-PLT003 Model Routing Child pattern — model routing is a capability within PLT001 Layer 3
EAAPL-PLT004 LLM Cost Control Specialisation — cost control mechanisms are instantiated within PLT001
EAAPL-PLT005 Prompt Version Control Child pattern — Prompt Registry is Layer 4 of PLT001
EAAPL-PLT006 LLM Caching Layer Child pattern — Semantic Cache is a component of PLT001 Layer 3
EAAPL-PLT007 Multi-Tenant AI Platform Extension — PLT007 elaborates tenant isolation within PLT001
EAAPL-PLT008 AI Experiment Tracking Child pattern — Evaluation Framework is Layer 4 of PLT001
EAAPL-PLT010 AI Developer Portal Child pattern — Developer Portal is Layer 5 of PLT001
EAAPL-INT001 Enterprise AI Service Bus Complementary — event bus integrates with PLT001 for async AI workflows
EAAPL-GOV001 AI Governance Framework Dependency — PLT001 is the enforcement vehicle for governance policies

17. Maturity Assessment

Overall Maturity: Mature This pattern is in production at multiple large enterprises across financial services, healthcare, and technology verticals. Reference implementations are available for all major cloud providers. Tooling ecosystem (Kong, LiteLLM, Backstage, vLLM) is stable and production-proven.

Scoring Matrix

Dimension Score (1–5) Rationale
Pattern Completeness 5 All 18 sections documented; no gaps
Implementation Evidence 5 Production deployments at Fortune 500 scale documented
Tooling Ecosystem Stability 4 Core tools stable; AI-specific gateway features still evolving rapidly
Regulatory Alignment 5 Explicitly mapped to APRA, EU AI Act, ISO 42001, NIST AI RMF
Operational Complexity Medium Requires dedicated platform team; not suitable for single-team orgs
Cost Efficiency at Scale High Proven 30–50% cost reduction vs. unmanaged direct API access
Time to First Value Medium 6–12 weeks to MVP platform; full capability 6–12 months

18. Revision History

Version Date Author Changes
1.0 2024-01-15 EAAPL Working Group Initial pattern publication
1.1 2024-04-20 EAAPL Working Group Added semantic caching component; expanded cost model
1.2 2024-08-10 EAAPL Working Group EU AI Act Article 9/13/17 alignment; updated OWASP LLM Top 10 to 2024 edition
1.3 2025-01-08 EAAPL Working Group Added agentic use case governance; updated reference implementations for Bedrock/Vertex
1.4 2025-06-12 EAAPL Working Group Multi-tenant isolation options expanded; DR table updated; cost ranges recalibrated
← Back to LibraryMore Platform Engineering