[EAAPL-PLT001] Enterprise AI Platform

Category: Platform Engineering
Sub-category: Foundation Platform
Version: 1.4
Maturity: Mature
Tags: platform-engineering, internal-developer-platform, golden-path, shared-services, model-serving, developer-experience
Regulatory Relevance: APRA CPS230, CPS234, EU AI Act (Article 9 Risk Management), ISO 42001, NIST AI RMF (GOVERN 1.1)

1. Executive Summary

The Enterprise AI Platform pattern establishes a shared, governed infrastructure layer that enables product teams to consume AI capabilities safely and efficiently without each team solving foundational concerns independently. Rather than allowing every business unit to procure models, build integrations, and manage compliance in isolation—creating exponential risk surface and duplicated cost—this pattern centralises platform concerns while preserving product team autonomy.

The platform delivers measurable outcomes: 60–80% reduction in time-to-first-AI-feature for new teams, consolidated cost visibility with per-team chargeback, a single control plane for policy enforcement (data classification, model access tiers, rate limits), and an audit trail satisfying regulatory obligations across all AI usage. The platform team operates as an internal product team serving engineering consumers, not a gatekeeping function. Adoption is driven through golden paths—opinionated, well-documented routes to common AI use cases—that make the right thing the easy thing. This pattern is the prerequisite upon which all other EAAPL platform patterns depend.

2. Problem Statement

Business Problem

Enterprises face uncoordinated AI adoption: each team independently evaluates models, negotiates vendor contracts, builds bespoke integrations, and manages compliance obligations. This creates duplicated investment, inconsistent risk posture, and no executive visibility into total AI spend or exposure. AI incidents (data leakage, hallucination in customer-facing output, cost overruns) are discovered reactively with no systematic controls.

Technical Problem

Without a shared platform, teams build thin wrappers around foundation model APIs, each implementing authentication, logging, error handling, and cost tracking differently. There is no consistent mechanism for prompt versioning, model failover, semantic caching, or response auditing. Security review is performed ad hoc. Infrastructure drift compounds over time.

Symptoms

Multiple AWS/Azure/GCP AI accounts with no consolidated billing or spend alerts
Product engineers spending >30% of AI feature development time on infrastructure concerns
Security team performing point-in-time reviews rather than continuous enforcement
No audit trail mapping AI outputs to the model version and prompt that produced them
Data residency violations discovered post-deployment as teams use public endpoints without restriction
Duplicate vendor contracts for the same model provider across business units

Cost of Inaction

Regulatory non-compliance penalties (APRA operational risk enforcement; EU AI Act fines of up to 3% of global annual turnover for most obligations and up to 7% for prohibited practices)
AI security incidents with no forensic trail, increasing breach disclosure obligations
Cost inefficiency of 30–50% above market rate due to absence of volume commitments and caching
6–12 month delays in AI capability delivery as teams rebuild foundational patterns from scratch

3. Context

When to Apply

Organisation has ≥3 product teams independently consuming or planning to consume AI services
Enterprise has data classification requirements that must be enforced before prompts leave the perimeter
AI spend is untracked or exceeds $50K/year across business units without consolidated visibility
Regulatory obligations (APRA, EU AI Act, privacy legislation) require audit trails for AI-assisted decisions
Platform or infrastructure team exists with mandate to provide shared engineering services

When NOT to Apply

Single-product startup with one team: overhead of platform exceeds benefit; use a managed API gateway directly
Proof-of-concept or time-boxed experiment: build direct integrations, migrate to platform post-validation
Fully air-gapped deployment with no shared infrastructure capability: consider a simplified on-premises variant

Prerequisites

Identity provider (IdP) capable of issuing service account credentials (OIDC/OAuth2)
Centralised secrets management (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault)
Observability stack (metrics, logs, traces) available for platform instrumentation
Executive sponsorship and cross-BU agreement on platform adoption mandate (voluntary adoption rarely scales past early adopters)
Cloud landing zone or on-premises infrastructure with network segmentation capability

Industry Applicability

Industry	Applicability	Primary Driver
Financial Services (Banking, Insurance)	Very High	APRA CPS230/234, data residency, audit trails for AI-assisted decisions
Healthcare	Very High	Patient data privacy, clinical AI regulatory approval, audit requirements
Government	High	Data sovereignty, security classification, procurement rules
Retail / E-commerce	High	Cost at scale, multi-team coordination, personalisation pipelines
Media & Entertainment	Medium	Cost efficiency, content moderation, creator tools
Technology / SaaS	Medium-High	Developer productivity, model diversification, cost optimisation

4. Architecture Overview

The Enterprise AI Platform is structured as five horizontal layers stacked atop shared cross-cutting services. Each layer has a clear ownership boundary and a defined interface contract. The deliberate separation of concerns between layers is what allows the platform to evolve (e.g., swapping model providers, adding new compute tiers) without disrupting product teams.

Layer 1 — Infrastructure and Compute provides the physical and virtual compute substrate: GPU/accelerator clusters for self-hosted model serving, cloud provider AI endpoints (Amazon Bedrock, Azure OpenAI, Google Vertex AI), and VPC/network controls enforcing data residency. This layer is owned by the Platform Infrastructure team and changes infrequently. The critical design decision here is whether to use a shared GPU pool, dedicated per-tenant compute, or a hybrid—this choice has profound cost and isolation implications addressed in the Trade-Off Analysis.

Layer 2 — Model Serving and Registry abstracts individual model deployment concerns. It hosts the Model Registry (model metadata, capability cards, approved versions, deprecation notices), the Serving Layer (OpenAI-compatible inference endpoints whether models are self-hosted via vLLM/TorchServe or proxied from cloud providers), and the Model Lifecycle Manager. The OpenAI-compatible API surface is a deliberate choice: it maximises ecosystem compatibility and allows product teams to switch underlying models with zero code change.

Layer 3 — AI API Gateway is the primary integration point for all platform consumers. It enforces authentication (API keys/OIDC JWT), authorisation (RBAC/ABAC on model and capability access), rate limiting per consumer/team, cost allocation tagging, prompt/response logging for audit, semantic caching, and circuit breaking. Every request transits this layer—there are no side-door paths to models. This is the enforcement perimeter for all security and governance controls.

Layer 4 — Developer Services includes the capabilities that accelerate product team velocity: the Prompt Registry (versioned prompts with promotion workflows), the Evaluation Framework (automated benchmarking against golden datasets), the Experimentation Service (A/B routing for model comparison), and the RAG Orchestration Service. These are optional services product teams can adopt; the gateway is mandatory.

Layer 5 — Developer Portal is the human-facing surface: API catalogue, self-service onboarding, per-team dashboards, AI playgrounds, policy transparency, and documentation. This layer drives adoption and reduces platform team support burden. The portal is built as a product—it has a roadmap, user research input, and a feedback loop with consuming teams.

Cross-cutting Shared Services underpin all layers: Identity and Access Management, Secrets Management, Observability (metrics/logs/traces), Policy Engine (OPA or equivalent), Cost Management and Chargeback, and Data Classification Service. These services are not AI-specific—they extend the existing enterprise platform—but they must be explicitly wired into the AI platform's control plane.

The operating model between the platform team and product teams is critical. The platform team owns Layers 1–3 and the shared services. Product teams own their applications, their prompts, and their AI feature logic. Layer 4 services are jointly owned, with the platform team as service provider and product teams as co-designers. The golden path concept operationalises this: the platform team publishes opinionated starter templates, SDKs, and runbooks that encode best practice so product teams can onboard a new AI capability in hours rather than weeks.

5. Architecture Diagram

flowchart TD
    subgraph Consumers["Product Teams"]
        A[Applications + Pipelines]
        B[Developer Portal]
    end
    subgraph Platform["Platform Layers"]
        C[AI API Gateway]
        D[Model Registry]
        E[Developer Services]
    end
    subgraph Infra["Infrastructure + Compute"]
        F[Self-Hosted GPU]
        G[Cloud AI Endpoints]
    end
    A --> C
    B -.->|onboard| A
    C --> D
    C --> E
    D --> F
    D --> G
    C --> H[(Audit + Cost Store)]
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#fef9c3,stroke:#eab308
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308
    style H fill:#fef9c3,stroke:#eab308

6. Components

Layer 1 — Infrastructure and Compute

Component	Type	Responsibility	Technology Options	Criticality
GPU / Accelerator Cluster	Infrastructure	Self-hosted model inference compute	AWS EC2 P4/P5, Azure NDv4, GCP A3, on-prem NVIDIA DGX	High
Cloud AI Endpoints	Managed Service	Access to frontier models with SLA	AWS Bedrock, Azure OpenAI, GCP Vertex AI, Anthropic API	Critical
VPC / Network Controls	Infrastructure	Data residency, private connectivity, egress control	AWS VPC + PrivateLink, Azure VNet + Private Endpoint, GCP VPC-SC	Critical
Data Residency Enforcer	Policy	Block requests violating data sovereignty rules	Custom middleware, OPA, Cloudflare Zero Trust	High

Layer 2 — Model Serving and Registry

Component	Type	Responsibility	Technology Options	Criticality
Model Registry	Service	Catalogue of approved models with metadata, capability cards, risk ratings	MLflow, Hugging Face Hub (private), custom DB	High
OpenAI-Compatible Inference	Service	Standardised API surface for self-hosted models	vLLM, TGI (Hugging Face), NVIDIA Triton, BentoML	High
Cloud Provider Proxy	Service	Unified endpoint abstracting cloud provider differences	LiteLLM, custom proxy, Kong AI Gateway	High
Model Lifecycle Manager	Service	Versioning, deprecation, rollout orchestration	Custom, Argo Rollouts, Spinnaker	Medium

Layer 3 — AI API Gateway

Component	Type	Responsibility	Technology Options	Criticality
AI API Gateway	Service	Authn, authz, rate limiting, routing, logging, caching	Kong AI Gateway, AWS API Gateway + Lambda, Azure APIM, Apigee, LiteLLM Proxy	Critical
Rate Limiter	Policy	Token-based and request-based rate limits per consumer/team	Redis + sliding window, Kong rate-limit-advanced	Critical
Semantic Cache	Service	Cache near-identical prompt responses to reduce cost/latency	Redis + vector index, GPTCache, Momento	High
Audit Logger	Service	Immutable record of all requests and responses	Kinesis → S3, Kafka → object store, OpenTelemetry → SIEM	Critical
Circuit Breaker	Reliability	Prevent cascade failure when model endpoints degrade	Resilience4j, custom middleware, Envoy	High
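
The token-based limit in the Rate Limiter row can be pictured as a token bucket whose budget is measured in LLM tokens rather than requests. The following is a minimal in-memory sketch, not a definitive implementation: the class name `TokenBucket`, its parameters, and the caller-supplied clock value are assumptions for illustration, and a production gateway would more likely hold this state in a shared store such as Redis, as the table suggests.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-team bucket measured in LLM tokens, not requests."""
    capacity: float          # maximum tokens the bucket can hold
    refill_per_sec: float    # tokens added back per second
    level: float = field(init=False)
    last_ts: float = field(default=0.0)

    def __post_init__(self) -> None:
        self.level = self.capacity  # start with a full budget

    def allow(self, cost: float, now: float) -> bool:
        """Deduct `cost` tokens and return True if the budget at time `now` covers it."""
        elapsed = max(0.0, now - self.last_ts)
        self.level = min(self.capacity, self.level + elapsed * self.refill_per_sec)
        self.last_ts = now
        if cost <= self.level:
            self.level -= cost
            return True
        return False  # the gateway would answer 429 with a Retry-After header


bucket = TokenBucket(capacity=1000, refill_per_sec=10)
first = bucket.allow(800, now=0.0)    # fits in the full bucket
second = bucket.allow(300, now=1.0)   # only 200 left plus 10 refilled
third = bucket.allow(300, now=10.0)   # 210 plus 90 refilled covers the request
```

Passing the clock value in explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock instead.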

Layer 4 — Developer Services

Component	Type	Responsibility	Technology Options	Criticality
Prompt Registry	Service	Version-controlled prompt store with promotion workflow	Custom Git-backed store, LangSmith, Promptflow	High
Evaluation Framework	Service	Automated benchmarking of model/prompt combinations	Ragas, DeepEval, custom harness	Medium
Experimentation Service	Service	A/B and shadow routing for model comparison	Custom feature-flag backed, LaunchDarkly + gateway	Medium
RAG Orchestration	Service	Retrieval-augmented generation pipeline management	LangChain, LlamaIndex, custom	Medium

Layer 5 — Developer Portal

Component	Type	Responsibility	Technology Options	Criticality
API Catalogue	Portal	Discoverable inventory of all AI capabilities	Backstage, Apigee Developer Portal, custom	High
Self-Service Onboarding	Portal	Automated provisioning of API keys, rate limits, team namespaces	Backstage scaffolder, custom workflow	High
Usage Dashboards	Portal	Per-team cost, request volume, error rate visibility	Grafana, Superset, PowerBI	Medium
AI Playground	Portal	Interactive testing environment without production blast radius	Custom, Promptflow Studio	Medium

7. Data Flow

Primary Flow — Product Team AI Request

Step	Actor	Action	Output
1	Product Team Application	Issue HTTPS POST to AI API Gateway with JWT/API key and prompt payload	Authenticated request at gateway ingress
2	AI API Gateway — AuthN	Validate JWT against IdP or validate API key hash	Authenticated identity + team namespace
3	AI API Gateway — AuthZ	Check RBAC/ABAC: does this team/identity have access to the requested model?	Authorised or 403 rejection
4	AI API Gateway — Classification	Data Classification Service inspects prompt for PII, sensitive data, classification level	Classification label attached to request context
5	AI API Gateway — Policy	OPA evaluates: is this classification allowed for this model endpoint per policy?	Policy allow/deny decision
6	AI API Gateway — Rate Limit	Check token bucket / sliding window for this consumer	Allow or 429 rate limit response
7	AI API Gateway — Semantic Cache	Compute prompt embedding; query vector cache for a near-match above the similarity threshold	Cache hit (return cached response) or cache miss (continue)
8	AI API Gateway — Cost Tag	Attach cost allocation tag (team, project, environment) to request	Tagged request context
9	AI API Gateway — Audit Pre-Log	Write request record (prompt hash, metadata, timestamp) to audit log	Immutable pre-request audit record
10	Model Router	Select optimal model endpoint based on routing rules (capability, cost, latency)	Upstream target selected
11	Model Serving Layer	Forward request to cloud provider API or self-hosted inference endpoint	Raw model response
12	AI API Gateway — Response	Return response to caller; emit token usage to cost management	Response to product team + cost event
13	AI API Gateway — Audit Post-Log	Write response record (response hash, token counts, latency) to audit log	Immutable post-response audit record
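
The primary flow above is essentially an ordered chain of checks in which any stage can short-circuit with an error response before the model is called. The sketch below illustrates that shape for three of the stages only. The stage functions, the `run_pipeline` helper, the request dictionary fields, and the grants table are all hypothetical names chosen for the example.

```python
from typing import Callable, Optional

Request = dict
Stage = Callable[[Request], Optional[dict]]  # a returned dict short-circuits the flow


def authn(req: Request) -> Optional[dict]:
    # Stands in for JWT / API key validation (step 2).
    return None if "team" in req else {"status": 401}


def authz(req: Request) -> Optional[dict]:
    # Stands in for the RBAC/ABAC check (step 3); grants are made up.
    grants = {"team-a": {"small-model"}}
    if req.get("model") not in grants.get(req["team"], set()):
        return {"status": 403}
    return None


def rate_limit(req: Request) -> Optional[dict]:
    # Stands in for the token bucket / sliding window check (step 6).
    return {"status": 429} if req.get("over_quota") else None


def run_pipeline(req: Request, stages: list[Stage],
                 upstream: Callable[[Request], dict]) -> dict:
    """Run stages in order; the first non-None result is returned to the caller."""
    for stage in stages:
        early = stage(req)
        if early is not None:
            return early
    return upstream(req)


STAGES = [authn, authz, rate_limit]


def upstream(req: Request) -> dict:
    return {"status": 200, "model": req["model"]}


result_ok = run_pipeline({"team": "team-a", "model": "small-model"}, STAGES, upstream)
result_denied = run_pipeline({"team": "team-a", "model": "frontier-model"}, STAGES, upstream)
```

Keeping each control as an independent stage makes the ordering explicit and lets new checks (classification, policy, caching) be added without touching the others.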

Error Flow

Error Condition	Detection Point	Action	Consumer Experience
Model endpoint unavailable	Circuit breaker (Layer 3)	Open circuit; route to fallback model or return 503 with Retry-After	Graceful degradation or explicit error
Policy denial (data classification)	Policy engine (Layer 3)	Reject request; log policy violation event	403 with policy violation code
Rate limit exceeded	Rate limiter (Layer 3)	Reject with 429; include Retry-After header	Explicit rate limit response
Prompt injection detected	Guardrails layer	Reject or sanitise; raise security alert	400 Bad Request or sanitised response
Model returns error (5xx)	Gateway upstream handler	Retry with exponential backoff; failover if retries exhausted	Transparent retry then degraded fallback
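
The last row of the error flow (retry with exponential backoff, then failover) can be sketched as follows. This is an illustration under assumptions: `UpstreamError`, `call_with_failover`, the base delay of 0.5 seconds, and the doubling factor are hypothetical choices, and the endpoints are plain callables standing in for model provider clients.

```python
from typing import Callable, Sequence


class UpstreamError(Exception):
    """Raised by an endpoint callable to signal a 5xx-style failure."""


def call_with_failover(
    endpoints: Sequence[Callable[[], str]],
    retries: int = 2,
    base: float = 0.5,
    sleep: Callable[[float], None] = lambda seconds: None,  # pass time.sleep in real use
) -> str:
    """Try each endpoint in order, retrying with exponential backoff before failing over."""
    for call in endpoints:                       # primary first, then fallbacks
        for attempt in range(retries + 1):
            try:
                return call()
            except UpstreamError:
                if attempt < retries:
                    sleep(base * 2 ** attempt)   # 0.5s, 1s, 2s, ...
    raise UpstreamError("all endpoints exhausted")  # gateway would return 503


def primary() -> str:
    raise UpstreamError("simulated 5xx")


def fallback() -> str:
    return "fallback-ok"


slept: list[float] = []
result = call_with_failover([primary, fallback], retries=2, sleep=slept.append)
```

In this run the primary is attempted three times with two backoff pauses before the fallback answers.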

8. Security Considerations

Authentication and Authorisation

All consumers authenticate via short-lived OIDC JWT tokens or rotatable API keys stored in Secrets Manager; long-lived static credentials are prohibited
RBAC model: model-viewer, model-invoker, prompt-editor, platform-admin; ABAC extends this with data classification attributes
Service-to-service communication within the platform uses mTLS with certificates managed by the service mesh (Istio/Linkerd)
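
The combination of RBAC roles with ABAC classification attributes can be illustrated with a small decision function. The four role names come from the list above; the permission strings, the classification levels, and the `is_allowed` helper are assumptions made for the example rather than part of the pattern.

```python
# Hypothetical permission sets for the four roles named in the pattern.
ROLE_PERMISSIONS = {
    "model-viewer": {"model:read"},
    "model-invoker": {"model:read", "model:invoke"},
    "prompt-editor": {"model:read", "prompt:write"},
    "platform-admin": {"model:read", "model:invoke", "prompt:write", "policy:write"},
}

# Hypothetical classification ladder, lowest sensitivity first.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2}


def is_allowed(roles: set[str], action: str, data_class: str, max_class: str) -> bool:
    """RBAC: some role grants the action. ABAC: data class is within the caller's ceiling."""
    has_permission = any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
    within_ceiling = CLASSIFICATION_RANK[data_class] <= CLASSIFICATION_RANK[max_class]
    return has_permission and within_ceiling


invoke_internal = is_allowed({"model-invoker"}, "model:invoke", "internal", "internal")
invoke_confidential = is_allowed({"model-invoker"}, "model:invoke", "confidential", "internal")
viewer_invoke = is_allowed({"model-viewer"}, "model:invoke", "public", "internal")
```

Both conditions must hold, so a caller with the right role is still refused when the prompt's classification exceeds what that caller may send.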

Secrets Management

All model provider credentials (OpenAI and Anthropic API keys, AWS IAM role configuration for Bedrock) are stored in HashiCorp Vault or a cloud-native secrets manager; zero hardcoded credentials
Secrets rotation is automated; gateway refreshes credentials on a schedule without downtime
Audit log of every secret access event

Data Classification and Encryption

All prompts and responses classified at ingress by the Data Classification Service; classification label persists through the audit trail
Data at rest: AES-256 encryption for audit log store, vector cache, and model registry
Data in transit: TLS 1.3 minimum for all internal and external communication
PII in prompts: masked or tokenised before sending to third-party cloud endpoints if data residency policy requires
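
Masking or tokenising PII before a prompt leaves for a third-party endpoint could look roughly like the sketch below. It is deliberately simplistic: the two regular expressions (email addresses and 9 to 16 digit numbers), the placeholder format, and the `mask_pii` function are assumptions for illustration, and a real Data Classification Service would use far more robust detection.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{9,16}\b")  # crude stand-in for account or card numbers


def mask_pii(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace matches with placeholders; return the masked prompt and the token mapping."""
    mapping: dict[str, str] = {}

    def replacer(kind: str):
        def _replace(match: re.Match) -> str:
            token = f"<{kind}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)  # kept inside the perimeter for re-insertion
            return token
        return _replace

    masked = EMAIL.sub(replacer("EMAIL"), prompt)
    masked = LONG_DIGITS.sub(replacer("NUMBER"), masked)
    return masked, mapping


masked_prompt, token_map = mask_pii("Email jane.doe@example.com about account 123456789012")
```

The mapping never leaves the platform, so the original values can be restored in the response if the use case requires it.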

Auditability

A cryptographic hash of every prompt and response is stored in the audit log, so later tampering with stored content is detectable; non-repudiation would additionally require signed records
Audit log is append-only and stored in a separate security account with no delete permissions for platform operators
Audit events emitted to SIEM (Splunk/Sentinel/Chronicle) in real time
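
One way to picture the hashed, append-only audit record is shown below. The SHA-256 hashing of prompt and response follows the text above; linking each record to the previous record's hash is an added assumption, included only to show how tamper evidence can be checked. The field names and the `audit_record` and `verify_chain` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in the chain


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(prev_hash: str, request_id: str, team: str,
                 prompt: str, response: str) -> dict:
    """Build a record holding content hashes only, linked to the previous record."""
    body = {
        "request_id": request_id,
        "team": team,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
        "prev_hash": prev_hash,
    }
    body["record_hash"] = sha256_hex(json.dumps(body, sort_keys=True))
    return body


def verify_chain(records: list[dict]) -> bool:
    """Recompute every record hash and check each link back to the genesis value."""
    prev = GENESIS
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        if rec["prev_hash"] != prev:
            return False
        if rec["record_hash"] != sha256_hex(json.dumps(body, sort_keys=True)):
            return False
        prev = rec["record_hash"]
    return True


r1 = audit_record(GENESIS, "req-1", "team-a", "hello", "hi there")
r2 = audit_record(r1["record_hash"], "req-2", "team-a", "bye", "goodbye")
chain_ok = verify_chain([r1, r2])
chain_bad = verify_chain([dict(r1, team="team-b"), r2])  # altered first record
```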

OWASP LLM Top 10 Controls (2023 list numbering)

OWASP LLM Risk	Control Implemented in Platform
LLM01 Prompt Injection	Input guardrails at gateway layer; prompt injection classifier as policy check
LLM02 Insecure Output Handling	Response sanitisation middleware; output schema validation for structured outputs
LLM03 Training Data Poisoning	Model Registry approvals gate; only approved model versions from trusted registries
LLM04 Model Denial of Service	Rate limiting per consumer; token budget enforcement; circuit breaker
LLM05 Supply Chain Vulnerabilities	Model provenance tracking in Registry; SBoM for self-hosted models; vendor attestation
LLM06 Sensitive Information Disclosure	Data classification at ingress; PII masking before third-party routing; audit logging
LLM07 Insecure Plugin Design	API scoping for AI-initiated actions; OAuth2 scopes on all downstream APIs called by agents
LLM08 Excessive Agency	Human-in-the-loop gates for agentic actions; action whitelist in policy engine
LLM09 Overreliance	Confidence thresholds; output labelling as AI-generated; mandatory human review for critical decisions
LLM10 Model Theft	Self-hosted model weights encrypted at rest; access logs for model artifact downloads; network egress controls

9. Governance Considerations

Responsible AI Framework

Every model onboarded to the registry must have a completed Model Risk Card covering intended use, limitations, bias evaluation results, and regulatory classification
High-risk AI use cases (as defined by EU AI Act Annex III or organisational risk policy) require additional approval and enhanced monitoring
Data used for model fine-tuning must go through the Data Ethics Review process

Model Risk Management

Models are classified by risk tier: Low (content summarisation), Medium (customer-facing recommendations), High (automated decisions affecting individuals)
High-tier models require a signed-off Model Risk Assessment before production promotion
Ongoing model monitoring for performance drift, bias drift, and output quality degradation

Human Approval Gates

Changes to platform-wide policies (rate limits, model access tiers, data classification rules) require approval from the AI Platform Governance Board
High-risk model promotions to production require Platform Owner + Chief Risk Officer sign-off
Agentic use cases that can initiate real-world actions (send emails, execute transactions) require explicit human-in-the-loop gate design

Policy and Traceability

Governance Artefact	Owner	Cadence	Storage Location
Model Risk Card	Model Owner + Risk Team	Per model version	Model Registry
Data Classification Policy	Data Governance Team	Annual review	Policy Engine configuration
API Usage Policy	Platform Team	Quarterly review	Developer Portal
Audit Log Retention Policy	Legal / Compliance	Annual review	Platform Runbook
AI Incident Register	CISO + Platform Team	Per incident	GRC system
Platform Governance Board Minutes	Platform Owner	Monthly	Confluence / SharePoint
Cost Allocation Report	FinOps / Platform Team	Monthly	Finance system

10. Operational Considerations

Monitoring

Signal	Source	Alert Threshold	Owner
Gateway error rate	API Gateway metrics	>1% 5xx over 5 min	Platform Team
Model endpoint latency P99	Tracing	>5s for interactive, >30s for batch	Platform Team
Circuit breaker state	Circuit breaker events	Any circuit opening	Platform Team + Model Owner
Cost anomaly	Cost management service	>20% day-over-day spend increase	FinOps + Platform Team
Audit log ingestion lag	Log pipeline metrics	>60s lag	Platform Team + Security
Cache hit rate	Semantic cache metrics	<20% hit rate sustained 1h (signals cache misconfiguration)	Platform Team
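
The cost anomaly threshold in the table is a simple day-over-day ratio. As a small worked sketch, with hypothetical function names and the 20% threshold taken from the table:

```python
def day_over_day_change(previous: float, current: float) -> float:
    """Fractional change in spend from one day to the next."""
    if previous <= 0:
        raise ValueError("previous day's spend must be positive")
    return (current - previous) / previous


def is_cost_anomaly(previous: float, current: float, threshold: float = 0.20) -> bool:
    """True when spend rose by more than the threshold (strictly greater, as in '>20%')."""
    return day_over_day_change(previous, current) > threshold


spike = is_cost_anomaly(previous=1000.0, current=1250.0)      # +25%
borderline = is_cost_anomaly(previous=1000.0, current=1200.0)  # exactly +20%
```

A spend rise from 1,000 to 1,250 would alert, while a rise to exactly 1,200 sits on the threshold and would not.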

SLOs

SLO	Target	Measurement Window
Gateway availability	99.9%	Rolling 30 days
Interactive request P95 latency (excluding model inference)	<100ms	Rolling 7 days
Audit log completeness	100% of requests logged	Rolling 24 hours
Policy enforcement correctness	Zero bypass incidents	Rolling 90 days
Self-service onboarding success rate	>95% of new team onboards complete without platform team intervention	Monthly
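
An availability target translates into an error budget, which is often easier to reason about operationally. The sketch below works through the arithmetic for the gateway availability SLO; the function names are hypothetical and the calculation assumes the budget is measured in wall-clock minutes of full unavailability.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a given SLO over a rolling window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Budget left after subtracting downtime already incurred in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


gateway_budget = error_budget_minutes(slo=0.999, window_days=30)        # about 43.2 minutes
after_incident = budget_remaining(0.999, 30, downtime_minutes=30.0)     # about 13.2 minutes
```

At 99.9% over a rolling 30 days the gateway has roughly 43 minutes of budget, so a single 30-minute outage consumes most of it.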

Logging

Structured JSON logs emitted by all platform components; correlated by x-request-id and x-team-id headers
Log levels: INFO for all gateway transactions, WARN for policy near-misses, ERROR for circuit openings and auth failures
Security-sensitive events (policy violations, auth failures) shipped to SIEM within 60 seconds
Log retention: 90 days hot (searchable), 7 years cold (compliance archive)
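
Structured JSON logs correlated by `x-request-id` and `x-team-id` can be produced with the standard `logging` module, as sketched below. The `JsonFormatter` class, the logger name, and the `request_id` / `team_id` attribute names passed through `extra` are assumptions for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object carrying the correlation headers."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "x-request-id": getattr(record, "request_id", None),
            "x-team-id": getattr(record, "team_id", None),
        }
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("ai_gateway_example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("gateway transaction", extra={"request_id": "req-123", "team_id": "team-a"})
line = stream.getvalue().strip()
```

Each emitted line is a single JSON object, which keeps the logs directly queryable by request or team in the observability stack.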

Incident Response

Incident Type	Detection	Response	RTO
Complete gateway outage	Synthetic probes + error rate alert	Failover to secondary region; page platform on-call	5 minutes
Model provider outage	Circuit breaker + health check	Switch to fallback model; notify consuming teams	10 minutes
Security breach (prompt data leak)	SIEM alert	Isolate affected namespace; revoke credentials; notify CISO	15 minutes
Cost runaway	Cost anomaly alert	Rate limit enforcement tightened; notify FinOps + team lead	30 minutes

Disaster Recovery

Component	RPO	RTO	Strategy
AI API Gateway	0 (stateless)	2 min	Multi-AZ active-active; DNS failover
Audit Log Store	<1 min	15 min	Cross-region replication; immutable S3 buckets
Model Registry	5 min	30 min	Database replication; Git-backed as secondary
Semantic Cache	1 hour	5 min	Cache is soft state; rebuild from model calls; acceptable cold-start
Prompt Registry	0	10 min	Git-backed; replicated; restore from tag

11. Cost Considerations

Cost Drivers

Driver	Description	Typical % of Total
Model inference (cloud APIs)	Token charges for GPT-4, Claude, Gemini calls	60–75%
GPU compute (self-hosted)	On-demand or reserved GPU instances for self-hosted models	10–20%
Semantic cache	Vector store hosting + Redis cache tier	3–8%
Observability infrastructure	Log storage, metrics, tracing at platform scale	5–10%
Developer portal hosting	Always-on service, relatively low cost	1–3%
Platform team labour	Engineering + operations headcount	Excluded (CapEx/OpEx accounting)

Scaling Risks

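Unbounded context: token cost grows with every token of context sent, and multi-turn sessions that resend the full conversation history accumulate cost faster than linearly in the number of turns; requests without a context limit from a single team can dominate platform spend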
Uncontrolled model tier usage: teams defaulting to most expensive model for every use case without routing intelligence
Cache cold-start: new deployment or cache eviction causes temporary cost spike as cache warms

Optimisations

Semantic caching: 20–40% token reduction on repetitive workloads (FAQ, summarisation)
Model tier routing: route simple tasks to cheaper models (GPT-4o-mini, Claude Haiku); reserve frontier for complex reasoning
Prompt compression: strip whitespace, compress system prompts via shared library; 10–15% token reduction
Batch API for non-interactive: use provider batch APIs at 50% discount for overnight processing
Reserved capacity: negotiate reserved throughput with cloud AI providers for predictable workloads
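
The semantic caching optimisation rests on comparing prompt embeddings rather than exact strings. The following in-memory sketch shows the lookup logic only. The `SemanticCache` class, the linear scan, and the 0.95 cosine similarity threshold are assumptions; embeddings are passed in as plain vectors and would in practice come from an embedding model and be stored in a vector index.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache returning a stored response for sufficiently similar prompts."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def put(self, embedding: list[float], response: str) -> None:
        self.entries.append((embedding, response))

    def get(self, embedding: list[float]) -> Optional[str]:
        """Return the closest cached response if it clears the similarity threshold."""
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # cache miss: the request continues to the model


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05])   # almost the same direction as the stored vector
miss = cache.get([0.0, 1.0])         # orthogonal, so unrelated
```

The threshold is the main tuning knob: set too low it risks serving an answer to a different question, set too high it rarely hits.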

Indicative Cost Range

Scale	Monthly AI Platform Infra Cost	Notes
Small (1–5 teams, <1M tokens/day)	$3,000–$12,000	Mostly cloud API costs; minimal self-hosted
Medium (5–20 teams, 1–10M tokens/day)	$15,000–$80,000	Mix of cloud API + some self-hosted; semantic cache delivers ROI
Large (20+ teams, >10M tokens/day)	$80,000–$400,000+	Self-hosted open-weight models become cost-competitive; FinOps team warranted

12. Trade-Off Analysis

Compute Architecture Options

Option	Description	Pros	Cons	Best For
Cloud-Only (Managed APIs)	All inference via cloud provider managed APIs (Bedrock, Azure OpenAI, Vertex)	Zero infrastructure ops; rapid access to frontier models; SLA-backed	Data residency constraints; vendor lock-in; highest per-token cost at scale	Organisations <$50K/month AI spend; strict no-GPU-ops mandate
Hybrid (Cloud + Self-Hosted)	Cloud APIs for frontier models; self-hosted open-weight models for high-volume/lower-complexity	Cost optimisation; data residency for sensitive workloads; model diversity	GPU ops expertise required; model update operational burden	Most enterprises at medium-large scale
Self-Hosted First	Maximise self-hosted; cloud only for capabilities not replicable	Maximum data control; no per-token cost; customisable	High infrastructure investment; frontier model gap; GPU scarcity; ops complexity	Air-gapped environments; sovereign AI requirements

Tenant Isolation Options

Option	Description	Pros	Cons	Best For
Shared Pool	All tenants share gateway + inference endpoints; namespace isolation in software	Lowest cost; highest utilisation	Noisy neighbour risk; complex policy enforcement	Internal enterprise teams with trust relationships
Dedicated Namespace	Separate gateway instances per tenant; shared compute	Balance of isolation and cost	More infrastructure complexity	External-facing B2B platforms
Dedicated Compute	Separate inference endpoints per tenant	Strongest isolation; predictable performance	Highest cost; most ops overhead	Regulated industries with data-separation requirements

Architectural Tensions

Tension	Option A	Option B	Resolution
Developer autonomy vs. governance control	Teams choose any model freely	Platform mandates approved model list	Approved model list with fast-track review process for new models
Cost optimisation vs. performance	Route to cheapest model always	Route to best model always	Routing rules based on use-case classification; teams declare use case
Openness of audit logs vs. privacy	Full prompt/response logging	No logging of content	Log metadata and hashes; content only on explicit high-risk classification
Platform team velocity vs. consumer customisation	Platform publishes fixed golden paths	Teams fully self-serve	Golden paths as starting templates; teams can fork within policy guardrails
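
The resolution for the cost versus performance tension (routing rules keyed on a declared use case) can be sketched as a lookup constrained by the approved model list. The use-case labels, the model identifiers `small-model` and `frontier-model`, and the `select_model` function are hypothetical.

```python
# Hypothetical routing table: declared use case -> model tier.
ROUTES = {
    "faq": "small-model",
    "summarisation": "small-model",
    "complex-reasoning": "frontier-model",
}


def select_model(use_case: str, approved: set[str], default: str = "small-model") -> str:
    """Pick the routed model for a use case, refusing anything off the approved list."""
    model = ROUTES.get(use_case, default)  # undeclared use cases get the cheaper tier
    if model not in approved:
        raise PermissionError(f"{model} is not approved for this team")
    return model


approved_models = {"small-model", "frontier-model"}
faq_model = select_model("faq", approved_models)
reasoning_model = select_model("complex-reasoning", approved_models)
unknown_model = select_model("unlisted-use-case", approved_models)
```

Defaulting unknown use cases to the cheaper tier is one possible design choice; a platform could equally reject undeclared use cases outright.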

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
AI API Gateway complete outage	Low	Critical — all AI features unavailable	Synthetic probes, zero traffic alert	Multi-AZ or secondary-region failover via DNS; page platform on-call
Cloud model provider outage (e.g., OpenAI 5xx)	Medium	High — affects all consumers of that provider	Circuit breaker opens; error rate spike	Failover to alternate provider or self-hosted model
Semantic cache poisoning (incorrect cached response served)	Low	High — incorrect responses served silently	Response quality monitoring; user feedback	Cache flush; cache validation before reintroduction
Token budget exhaustion for a team	High	Medium — team's AI features degrade gracefully	Cost management alert; 429 from gateway	Increase quota with approval; implement back-pressure in consuming app
Data classification false negative (sensitive data reaches wrong model)	Low	Critical — data residency or privacy breach	Retrospective audit log scan; SIEM alert	Incident response; vendor notification if required; root cause fix to classifier
Prompt registry unavailable	Medium	Medium — teams cannot load latest prompts	Health check failure; latency spike	Fall back to last-known-good prompt version cached in gateway
Model Registry corruption	Low	High — wrong model versions deployed	Registry integrity check on startup	Restore from Git-backed backup; re-validate model versions

Cascading Failure Scenarios

Semantic cache failure → cold-start cost spike: Cache failure causes all requests to hit model directly; combined with a traffic spike this can exhaust token budgets across multiple teams simultaneously and trigger cloud provider rate limits. Mitigation: circuit breaker on cache with graceful bypass; pre-emptive capacity buffer in token budgets.
Policy engine outage → open or closed failure: If OPA becomes unavailable, the gateway must fail open (allow all, risk policy bypass) or fail closed (deny all, block all AI features). This is a critical design choice; most enterprises should fail closed with a break-glass procedure.
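Identity provider outage → gateway authentication failure: If the IdP issuing JWTs is unavailable, consumers cannot obtain new tokens, and JWT-authenticated requests fail once existing short-lived tokens expire or if the gateway cannot retrieve the IdP's signing keys. Mitigation: cache the IdP's signing keys at the gateway; API key fallback path for critical production consumers; IdP HA configuration.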
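
The fail-closed choice for a policy engine outage, with a break-glass override, can be expressed in a few lines. This is a sketch under assumptions: `PolicyEngineUnavailable`, the `decide` function, and the `break_glass_active` flag are hypothetical, and `evaluate` stands in for a call to OPA or an equivalent engine.

```python
from typing import Callable


class PolicyEngineUnavailable(Exception):
    """Raised when the policy engine cannot be reached or times out."""


def decide(evaluate: Callable[[dict], bool], request: dict,
           break_glass_active: bool = False) -> bool:
    """Return the policy decision; on engine outage deny unless break-glass is active."""
    try:
        return bool(evaluate(request))
    except PolicyEngineUnavailable:
        # Fail closed by default. Break-glass must be switched on explicitly and audited.
        return break_glass_active


def healthy_engine(request: dict) -> bool:
    return request.get("classification") == "public"


def broken_engine(request: dict) -> bool:
    raise PolicyEngineUnavailable("policy engine timeout")


normal_allow = decide(healthy_engine, {"classification": "public"})
outage_default = decide(broken_engine, {"classification": "public"})
outage_break_glass = decide(broken_engine, {"classification": "public"},
                            break_glass_active=True)
```

Only an engine outage consults the break-glass flag; an explicit deny from a healthy engine is never overridden.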

14. Regulatory Considerations

APRA CPS 230 (Operational Risk)

The platform must be treated as supporting a critical operation (the term CPS 230 uses) if AI features are material to regulated activities; this triggers business continuity obligations, including tolerance levels for disruption, which the RTO/RPO targets above should be set to meet
Third-party model providers (OpenAI, Anthropic) must be assessed under CPS 230 third-party risk management obligations; contracts must include sub-contracting visibility, audit rights, and incident notification requirements
Material operational incidents affecting AI services must be reported to APRA

APRA CPS 234 (Information Security)

The audit log is an information asset requiring classification, protection, and retention per CPS 234
All platform components handling sensitive data must be within the CPS 234 information security capability boundary
Penetration testing of the AI API Gateway is required at least annually and after significant changes

Privacy Act 1988 (Australia) / GDPR (EU)

Personal information in prompts and responses must be handled in accordance with the Privacy Act; prompt logging of PII-containing interactions requires a Privacy Impact Assessment
Data minimisation principle applies: prompts should not contain more PII than necessary for the AI task
Data residency controls must enforce storage of Australian personal information within Australia if required by APP 8 considerations

EU AI Act

Article 9 requires a risk management system for high-risk AI systems; the Model Risk Card and platform governance artefacts provide evidence towards this requirement
Article 50 transparency obligations require AI-generated content to be identifiable as such in consumer-facing applications
Article 17 quality management system requirements are supported by the prompt version control, evaluation framework, and change governance processes

ISO 42001 (AI Management System)

The platform governance artefacts (Model Risk Cards, audit logs, governance board minutes) constitute the AI management system records required by ISO 42001 Clause 7
Continual improvement processes (evaluation framework, post-incident review) satisfy Clause 10

NIST AI RMF

GOVERN 1.1: AI risk tolerance defined via model risk tiers and data classification policies
MAP 2.1: AI risk context mapped through Model Risk Cards and use case classification
MEASURE 2.3: Metrics for AI risk tracked through observability stack and governance dashboards
MANAGE 3.1: Response plans for AI incidents documented in platform runbook

15. Reference Implementations

AWS

Component	AWS Service
AI API Gateway	Amazon API Gateway + AWS Lambda authoriser, or Kong on EKS
Model Serving (cloud)	Amazon Bedrock (Claude, Llama, Titan)
Model Serving (self-hosted)	Amazon SageMaker Endpoints or EKS + vLLM on P4/P5
Semantic Cache	Amazon ElastiCache (Redis) + Amazon OpenSearch for vector index
Audit Log	Amazon Kinesis Data Streams → S3 (Glacier for cold) → Athena for query
Policy Engine	AWS Lambda + OPA sidecar, or AWS Verified Permissions
Secrets	AWS Secrets Manager
Observability	Amazon CloudWatch + AWS X-Ray + OpenTelemetry
Developer Portal	AWS Service Catalog + Backstage on ECS
Cost Management	AWS Cost Explorer + Cost Allocation Tags + AWS Budgets

Azure

Component	Azure Service
AI API Gateway	Azure API Management (APIM) with AI policies
Model Serving (cloud)	Azure OpenAI Service
Model Serving (self-hosted)	AKS + vLLM on NC-series
Semantic Cache	Azure Cache for Redis + Azure AI Search
Audit Log	Azure Event Hubs → Azure Data Lake Gen2
Policy Engine	Azure Policy + OPA on AKS
Secrets	Azure Key Vault
Observability	Azure Monitor + Application Insights
Developer Portal	Azure API Management built-in developer portal
Cost Management	Azure Cost Management + Tags

GCP

Component	GCP Service
AI API Gateway	Cloud Endpoints / Apigee
Model Serving (cloud)	Vertex AI (Gemini, Claude via Model Garden)
Model Serving (self-hosted)	GKE + vLLM on A3
Semantic Cache	Memorystore (Redis) + Vertex AI Vector Search
Audit Log	Cloud Pub/Sub → BigQuery
Policy Engine	Binary Authorization + OPA on GKE
Secrets	Secret Manager
Observability	Cloud Monitoring + Cloud Trace + OpenTelemetry
Developer Portal	Apigee Developer Portal
Cost Management	Cloud Billing + Labels + Budget Alerts

On-Premises

Component	Technology
AI API Gateway	Kong Enterprise or NGINX + custom Lua/Python middleware
Model Serving	vLLM or TGI on bare-metal GPU servers (NVIDIA A100/H100)
Semantic Cache	Redis Enterprise + Qdrant or Weaviate
Audit Log	Apache Kafka → MinIO (S3-compatible)
Policy Engine	OPA (open source)
Secrets	HashiCorp Vault
Observability	Prometheus + Grafana + Tempo + Loki
Developer Portal	Backstage (CNCF)
Cost Management	Custom chargeback reporting from Kafka cost events
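The custom chargeback reporting in the last row could start as a plain aggregation over cost events. This sketch assumes each event is a JSON string with hypothetical `team` and `cost` fields; the real event schema is not defined in this pattern, and the Kafka consumer itself is omitted.

```python
import json
from collections import defaultdict
from decimal import Decimal
from typing import Iterable


def chargeback_by_team(raw_events: Iterable[str]) -> dict[str, Decimal]:
    """Sum cost per team from JSON-encoded cost events."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for raw in raw_events:
        event = json.loads(raw)
        # Convert via str so float inputs do not carry binary rounding error
        # into the Decimal total.
        totals[event["team"]] += Decimal(str(event["cost"]))
    return dict(totals)
```

Using `Decimal` rather than `float` keeps the totals exact, which matters once the figures are used for internal billing.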

16. Related Patterns

Pattern ID	Name	Relationship
EAAPL-PLT002	AI API Gateway	Child pattern — PLT001 Layer 3 is implemented by PLT002
EAAPL-PLT003	Model Routing	Child pattern — model routing is a capability within PLT001 Layer 3
EAAPL-PLT004	LLM Cost Control	Specialisation — cost control mechanisms are instantiated within PLT001
EAAPL-PLT005	Prompt Version Control	Child pattern — Prompt Registry is Layer 4 of PLT001
EAAPL-PLT006	LLM Caching Layer	Child pattern — Semantic Cache is a component of PLT001 Layer 3
EAAPL-PLT007	Multi-Tenant AI Platform	Extension — PLT007 elaborates tenant isolation within PLT001
EAAPL-PLT008	AI Experiment Tracking	Child pattern — Evaluation Framework is Layer 4 of PLT001
EAAPL-PLT010	AI Developer Portal	Child pattern — Developer Portal is Layer 5 of PLT001
EAAPL-INT001	Enterprise AI Service Bus	Complementary — event bus integrates with PLT001 for async AI workflows
EAAPL-GOV001	AI Governance Framework	Dependency — PLT001 is the enforcement vehicle for governance policies

17. Maturity Assessment

Overall Maturity: Mature

This pattern is in production at multiple large enterprises across financial services, healthcare, and technology verticals. Reference implementations are available for all major cloud providers. Tooling ecosystem (Kong, LiteLLM, Backstage, vLLM) is stable and production-proven.

Scoring Matrix

Dimension	Score (1–5) or Rating	Rationale
Pattern Completeness	5	All 18 sections documented; no gaps
Implementation Evidence	5	Production deployments at Fortune 500 scale documented
Tooling Ecosystem Stability	4	Core tools stable; AI-specific gateway features still evolving rapidly
Regulatory Alignment	5	Explicitly mapped to APRA, EU AI Act, ISO 42001, NIST AI RMF
Operational Complexity	Medium	Requires dedicated platform team; not suitable for single-team orgs
Cost Efficiency at Scale	High	Proven 30–50% cost reduction vs. unmanaged direct API access
Time to First Value	Medium	6–12 weeks to MVP platform; full capability 6–12 months

18. Revision History

Version	Date	Author	Changes
1.0	2024-01-15	EAAPL Working Group	Initial pattern publication
1.1	2024-04-20	EAAPL Working Group	Added semantic caching component; expanded cost model
1.2	2024-08-10	EAAPL Working Group	EU AI Act Article 9/13/17 alignment; updated OWASP LLM Top 10 to 2024 edition
1.3	2025-01-08	EAAPL Working Group	Added agentic use case governance; updated reference implementations for Bedrock/Vertex
1.4	2025-06-12	EAAPL Working Group	Multi-tenant isolation options expanded; DR table updated; cost ranges recalibrated
