[EAAPL-CMP001] APRA CPS230 AI Compliance
Category: Compliance / APRA Prudential Standards
Sub-category: Operational Resilience for AI Systems
Version: 2.0
Maturity: Mature
Tags: APRA, CPS230, operational-resilience, business-continuity, third-party-AI, scenario-testing, critical-operations
Regulatory Relevance: APRA CPS230 (effective 1 July 2025), CPS231, CPS234, CPG 230
1. Executive Summary
APRA Prudential Standard CPS230 Operational Risk Management (effective 1 July 2025) imposes comprehensive operational resilience obligations on Australian authorised deposit-taking institutions (ADIs), general and life insurers, and registrable superannuation entity (RSE) licensees. AI systems that support critical operations are directly in scope: they must be identified, risk-assessed, and maintained within defined tolerance levels for disruption.
This pattern provides the architecture for meeting CPS230 obligations specifically for AI systems, an area where the standard creates obligations that generic operational risk management frameworks do not address. AI differs from conventional technology in four ways that matter here: models can fail silently, producing wrong outputs while availability SLOs are still met; vendor dependencies may be concentrated in a single foundation model provider; some complex ML-driven decisions have no workable manual fallback during an outage; and an AI incident may require regulatory notification even when operational SLOs are technically met.
For CIOs, CTOs, and CROs at APRA-regulated entities, this pattern provides a defensible compliance architecture that satisfies CPS230 §19–§46 obligations for AI systems. It is a mandatory pattern for any APRA-regulated entity operating AI systems that support critical operations, and a recommended pattern for AI systems supporting important (non-critical) operations.
2. Problem Statement
Business Problem
APRA-regulated entities are deploying AI systems that support critical operations (credit decisioning, claims processing, fraud detection, superannuation administration) without applying the operational resilience disciplines required for traditional IT systems. CPS230 explicitly requires critical operation identification and resilience management, yet most existing BCM frameworks do not recognise AI system failures as operational disruptions.
Technical Problem
Traditional BCP/DR frameworks are designed for binary system availability (up/down). AI systems can fail in ways that preserve technical availability while delivering materially degraded outputs (silent performance degradation, bias emergence, hallucination). Standard monitoring cannot detect these AI-specific failure modes. CPS230 scenario testing requirements cannot be met with standard disaster recovery exercises.
Symptoms
- AI systems supporting credit or claims decisions not identified as critical operation supporting technology
- Business Continuity Plans lack manual fallback procedures for AI-supported decisions
- Third-party AI provider (e.g., AWS Bedrock, Azure OpenAI) not assessed as a material service provider under CPS230 §28
- No tolerance for disruption defined for AI-dependent processes
- Scenario testing does not include AI system failure scenarios
Cost of Inaction
- Regulatory: APRA enforcement action under CPS230; public letter to board; direction to remediate; financial penalty in extreme cases
- Operational: AI failure cascading to critical operation disruption without pre-planned response
- Financial: Disruption to credit, claims, or fund administration affecting customer outcomes and generating AFCA complaints
3. Context
When to Apply
- Any APRA-regulated entity (ADI, general insurer, life insurer, RSE licensee) with AI systems supporting operations
- AI systems that directly or indirectly support operations designated as Critical under CPS230 §17
- AI systems forming part of material service provider relationships under CPS230 §28
- Before 1 July 2025 for entities with existing AI systems; immediately for new deployments
When NOT to Apply
- Non-APRA-regulated entities (but CPS230 alignment is good practice for any regulated financial service)
- AI systems used exclusively for internal productivity with no customer-facing impact and no critical operation dependency
Prerequisites
- Critical operation identification completed (CPS230 §17 obligation)
- AI Model Register (GOV001) operational — provides AI system inventory for critical operation mapping
- Third-party risk management framework capable of assessing AI vendors
- Business Continuity Management framework that can be extended for AI-specific scenarios
Industry Applicability
| Entity Type | Effective Date | Critical AI Use Cases | Key CPS230 Sections |
|---|---|---|---|
| ADIs (banks) | 1 Jul 2025 | Credit decisioning, fraud detection, KYC | §17, §19, §28, §43 |
| General insurers | 1 Jul 2025 | Claims assessment, pricing, fraud | §17, §19, §28, §43 |
| Life insurers | 1 Jul 2025 | Underwriting, claims, customer service AI | §17, §19, §28 |
| RSE licensees | 1 Jul 2025 | Member administration, advice, investment | §17, §19, §28 |
| APRA-regulated fintech | 1 Jul 2025 (if ADI licence) | Core banking AI, lending AI | All above |
4. Architecture Overview
The CPS230 AI Compliance architecture addresses four specific obligations in the standard: critical operation identification and mapping (§17), operational risk assessment (§19), third-party arrangement management (§28), and incident notification (§43). A fifth architectural concern—tolerance for disruption—underpins all four.
Critical Operation AI Dependency Mapping. CPS230 §17 requires APRA-regulated entities to identify their critical operations and the resources that support them. For AI systems, this requires a systematic dependency mapping: which AI models support which business processes, which business processes qualify as critical operations, and therefore which AI models are critical operation supporting systems. The AI Model Register (GOV001) provides the inventory; this pattern adds the critical operation tagging field and the dependency mapping tool.
The mapping reveals a common finding in APRA-regulated entities: more operations are AI-dependent than risk teams realise. Fraud detection AI is obviously critical; but customer service chatbots that route complaints, document classification systems that process insurance claims, and model-driven pricing systems are also part of critical operation chains.
Tolerance for Disruption (TFD) for AI Systems. CPS230 §19 requires entities to define a tolerance for disruption: the maximum period each critical operation can be disrupted before material customer or financial impact occurs. For AI-supported operations, TFD has two components: (1) an availability TFD (how long can the AI system be completely unavailable before the critical operation is impaired?) and (2) a quality TFD (how long can the AI system produce degraded-quality outputs before the critical operation is impaired?). The quality TFD is specific to AI and is absent from standard BCP frameworks. A credit model whose accuracy has degraded to 60% may still be "available", yet it causes material harm within a few thousand decisions.
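The two-component tolerance can be illustrated with a minimal sketch. The `Tolerance` class, the `breached_components` function, and the 15-minute and 30-minute values are hypothetical and chosen only for illustration; how "degraded" is measured is left to the entity's own quality metrics.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tolerance:
    """Two-part tolerance for disruption for one AI-supported operation."""
    availability: timedelta  # max time the AI system may be fully unavailable
    quality: timedelta       # max time it may serve degraded-quality outputs


def breached_components(tfd: Tolerance,
                        unavailable_for: timedelta,
                        degraded_for: timedelta) -> list[str]:
    """Return the TFD components that the current disruption has exceeded."""
    breaches = []
    if unavailable_for > tfd.availability:
        breaches.append("availability")
    if degraded_for > tfd.quality:
        breaches.append("quality")
    return breaches


# Hypothetical tolerances for a real-time credit model.
credit_tfd = Tolerance(availability=timedelta(minutes=15),
                       quality=timedelta(minutes=30))

# The endpoint is up (zero downtime) but has served degraded outputs for 45 minutes.
silent_failure = breached_components(credit_tfd,
                                     unavailable_for=timedelta(0),
                                     degraded_for=timedelta(minutes=45))
print(silent_failure)  # ['quality']
```

An availability-only check would report nothing for this disruption, which is why the quality component is tracked as a separate clock.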
Manual Fallback Architecture. CPS230 requires business continuity plans for critical operations. For AI-supported critical operations, the BCP must specify: what is the manual fallback process when the AI is unavailable? Who can execute it? What decision tools (scoring tables, decision trees, expert system) replace the AI? What volume can the manual process handle (throughput capacity)? For many AI systems, the honest answer is "we cannot process this volume manually"—which means the TFD for that operation is effectively zero for the AI component, requiring extremely high availability targets and vendor redundancy.
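The throughput question can be made concrete with a rough capacity calculation. This is a sketch only: the function name, its parameters, and the example volumes are assumptions for illustration, and a real assessment would also account for decision complexity, shift patterns, and error rates.

```python
import math


def manual_fallback_gap(ai_decisions_per_hour: int,
                        staff_available: int,
                        decisions_per_staff_hour: float) -> dict:
    """Compare manual processing capacity with the AI system's normal volume."""
    if ai_decisions_per_hour <= 0 or decisions_per_staff_hour <= 0:
        raise ValueError("volumes and per-person throughput must be positive")
    manual_capacity = staff_available * decisions_per_staff_hour
    shortfall = max(0.0, ai_decisions_per_hour - manual_capacity)
    return {
        "manual_capacity_per_hour": manual_capacity,
        # Fraction of normal volume that the manual process can absorb.
        "coverage": min(1.0, manual_capacity / ai_decisions_per_hour),
        "shortfall_per_hour": shortfall,
        # Extra trained staff needed to process the full volume manually.
        "additional_staff_needed": math.ceil(shortfall / decisions_per_staff_hour),
    }


# Hypothetical figures: 1,200 automated decisions per hour, 10 trained
# assessors, each able to complete 6 manual decisions per hour.
gap = manual_fallback_gap(1200, staff_available=10, decisions_per_staff_hour=6)
print(gap["coverage"])                 # 0.05
print(gap["additional_staff_needed"])  # 190
```

With these illustrative numbers the manual process covers about 5% of normal volume, the situation the paragraph above describes as a TFD that is effectively zero for the AI component.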
Third-Party AI Vendor Management (§28). Many APRA-regulated entities rely on cloud AI APIs (AWS Bedrock, Azure OpenAI, Google Vertex AI, Anthropic) as material service providers. CPS230 §28 requires: written agreement, risk assessment, exit strategy, access to performance data, right to audit, concentration risk management. For AI vendors specifically, §28 assessment requires: model version change notification requirements, output quality SLAs (not just availability SLAs), data processing agreement, geographic restrictions on training data, and the vendor's own BCP for the AI service. Concentration risk is particularly relevant: if the enterprise uses a single foundation model provider for multiple critical operations, failure of that provider creates correlated AI risk across operations.
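Concentration exposure can be estimated from the critical operation dependency map. The sketch below is illustrative: the operation and vendor names are placeholders, and the 50% threshold is an assumption, not a figure taken from CPS230.

```python
from collections import defaultdict

# Hypothetical mapping of critical operations to the foundation model
# vendor each one depends on.
critical_ops_vendor = {
    "credit_decisioning": "vendor_a",
    "fraud_detection": "vendor_a",
    "claims_assessment": "vendor_a",
    "member_administration": "vendor_b",
}


def vendor_concentration(op_to_vendor: dict[str, str]) -> dict[str, dict]:
    """Group critical operations by vendor and compute each vendor's share."""
    ops_by_vendor = defaultdict(list)
    for operation, vendor in op_to_vendor.items():
        ops_by_vendor[vendor].append(operation)
    total = len(op_to_vendor)
    return {
        vendor: {"operations": sorted(ops), "share": len(ops) / total}
        for vendor, ops in ops_by_vendor.items()
    }


def concentrated_vendors(op_to_vendor: dict[str, str],
                         max_share: float = 0.5) -> list[str]:
    """Vendors whose share of critical operations exceeds max_share (assumed limit)."""
    report = vendor_concentration(op_to_vendor)
    return sorted(v for v, row in report.items() if row["share"] > max_share)


print(vendor_concentration(critical_ops_vendor)["vendor_a"]["share"])  # 0.75
print(concentrated_vendors(critical_ops_vendor))                       # ['vendor_a']
```

A vendor flagged this way is a candidate for the correlated-outage scenario and for an exit strategy that covers all of its dependent operations at once.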
AI Scenario Testing. CPS230 §22 requires scenario analysis for operational risks, including severe but plausible disruption scenarios. For AI systems, four scenario types must be tested: (1) AI system complete unavailability (standard DR scenario); (2) AI system silent performance degradation (no alerts, model quietly failing); (3) AI vendor outage affecting multiple AI systems simultaneously (concentration risk scenario); (4) Adversarial attack on AI system causing systematic wrong decisions. Each scenario requires a documented test plan, execution record, and findings report.
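One way to track the four scenario types and their three required artefacts is sketched below. The `Scenario` and `ScenarioTest` names and fields are hypothetical; in practice this record would live in the GRC or test management tool.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    UNAVAILABILITY = "complete unavailability"
    SILENT_DEGRADATION = "silent performance degradation"
    VENDOR_OUTAGE = "vendor outage across multiple AI systems"
    ADVERSARIAL_ATTACK = "adversarial attack"


@dataclass
class ScenarioTest:
    scenario: Scenario
    has_test_plan: bool = False
    has_execution_record: bool = False
    has_findings_report: bool = False

    def is_complete(self) -> bool:
        """A test counts only when all three artefacts exist."""
        return (self.has_test_plan
                and self.has_execution_record
                and self.has_findings_report)


def outstanding_scenarios(tests: list[ScenarioTest]) -> list[Scenario]:
    """Scenario types with no fully documented test, in definition order."""
    completed = {t.scenario for t in tests if t.is_complete()}
    return [s for s in Scenario if s not in completed]


# Hypothetical status for one AI system: one scenario fully documented,
# one executed but still missing its findings report.
tests = [
    ScenarioTest(Scenario.UNAVAILABILITY, True, True, True),
    ScenarioTest(Scenario.SILENT_DEGRADATION, True, True, False),
]
for scenario in outstanding_scenarios(tests):
    print(scenario.value)
```

Treating a test as incomplete until the findings report exists keeps the tracker aligned with the evidence requirement rather than with test execution alone.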
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Critical Operation AI Dependency Mapper | Analysis Tool | Maps AI systems to business processes; identifies critical operation dependencies | Custom CMDB extension, ServiceNow, Archer | Critical |
| TFD Definition Tool | Governance Process | Structured process for defining availability and quality TFD per AI system | Workshop facilitation template + governance tool | Critical |
| AI Quality Monitor | Monitoring | Monitors AI output quality metrics against quality TFD thresholds | GOV006 bias pipeline + custom quality metrics | Critical |
| Manual Fallback Process Library | BCP Documentation | Documents manual fallback procedures for each AI-dependent critical operation | Confluence, SharePoint, BCP tool | High |
| AI Vendor Risk Assessment Template | Governance Process | AI-specific criteria for CPS230 §28 third-party assessment | Template in GRC system | Critical |
| AI Vendor Register | Data Store | Inventory of AI vendors with assessment status, contract details, concentration exposure | ServiceNow, Archer, or GOV001 extension | High |
| Scenario Test Planning Tool | Governance Process | Structures four AI scenario types; tracks test execution and findings | GRC system + test management tool | High |
| APRA Notification Workflow | Compliance Process | Manages 72-hour APRA notification window; drafts, approves, and submits notifications | ServiceNow workflow + document management | Critical |
7. Data Flow
Critical Operation AI Mapping Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | AI Governance + Business | Extract AI system inventory from GOV001 | AI system list with MRID and use case |
| 2 | Business Operations | Map AI systems to business processes | AI-to-process dependency map |
| 3 | Risk + Operations | Identify business processes qualifying as critical operations (CPS230 §17 criteria) | Critical operation list |
| 4 | Risk | Identify AI systems supporting critical operations | Critical operation AI dependency map |
| 5 | Risk | Define TFD (availability + quality) for each AI-dependent critical operation | Tolerance for disruption table per operation |
| 6 | Engineering | Configure monitoring to detect TFD breaches | Monitoring thresholds aligned to TFD |
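Steps 1 to 4 of the flow amount to joining three data sets. The sketch below shows that join under simple assumptions: the record layout, the MRID values, and the process names are invented for illustration and do not reflect the actual GOV001 schema.

```python
# Step 1: hypothetical inventory rows as they might be exported from GOV001.
ai_systems = [
    {"mrid": "MR-001", "use_case": "credit scoring"},
    {"mrid": "MR-002", "use_case": "claims document classification"},
    {"mrid": "MR-003", "use_case": "meeting summarisation"},
]

# Step 2: AI-to-process dependency map supplied by business operations.
system_to_processes = {
    "MR-001": ["loan origination"],
    "MR-002": ["claims triage", "claims archiving"],
    "MR-003": ["internal meetings"],
}

# Step 3: processes that risk and operations have designated as critical.
critical_operations = {"loan origination", "claims triage"}


def critical_ai_dependencies(systems, process_map, critical_ops):
    """Step 4: for each AI system, list the critical operations it supports."""
    result = {}
    for system in systems:
        supported = set(process_map.get(system["mrid"], []))
        critical = sorted(supported & critical_ops)
        if critical:
            result[system["mrid"]] = critical
    return result


dependency_map = critical_ai_dependencies(
    ai_systems, system_to_processes, critical_operations
)
print(dependency_map)
# {'MR-001': ['loan origination'], 'MR-002': ['claims triage']}
```

Each key in the resulting map is an AI system that needs an availability and quality TFD in step 5 and monitoring thresholds in step 6.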
8. Security Considerations
AI Security as CPS230 Operational Risk
CPS230 §19 requires assessment of operational risks including cyber and technology risks. AI-specific security risks (adversarial attacks, model theft, prompt injection) must be included in the operational risk assessment for AI-dependent critical operations.
Third-Party AI Vendor Security
§28 requires assessment of service providers' security arrangements. AI vendor security assessment must include: data centre security certifications, penetration testing of AI APIs, model weight security (preventing extraction), and security incident notification obligations.
OWASP LLM Mapping for CPS230
| OWASP LLM Risk | CPS230 Operational Risk Category | Required Control |
|---|---|---|
| LLM03 Training Data Poisoning | Technology risk — model integrity | Vendor training data provenance assessment |
| LLM05 Supply Chain | Third-party risk | §28 vendor assessment with supply chain scope |
| LLM08 Excessive Agency | Operational risk — autonomous AI | TFD-scoped human oversight requirement |
9. Governance Considerations
Board Obligations (CPS230 §7–§9)
The Board must approve the entity's risk management strategy, including operational risk. The Board must also receive quarterly reporting on AI system resilience status, vendor concentration risk, and TFD compliance for AI-dependent critical operations.
Governance Artefacts
| Artefact | Owner | Frequency | CPS230 Reference |
|---|---|---|---|
| Critical Operation AI Dependency Map | CRO | Annual + material change | §17, §19 |
| TFD Compliance Report | CRO | Quarterly | §19 |
| AI Vendor Risk Assessment Reports | Procurement + Risk | Annual per vendor | §28 |
| AI Scenario Test Reports | CRO | Annual | §22 |
| Material Incident Notification Log | CISO + CRO | Per event | §43 |
| Board Operational Risk Report (AI section) | CRO | Quarterly | §7 |
10. Operational Considerations
SLOs Aligned to TFD
| AI System Category | Availability TFD | Quality TFD | Availability SLO | Quality SLO |
|---|---|---|---|---|
| Critical — real-time credit | 15 minutes | 30 minutes | 99.99% | Quality monitor: daily |
| Critical — fraud detection | 5 minutes | 1 hour | 99.999% | Quality monitor: hourly |
| Important — claims processing | 4 hours | 24 hours | 99.9% | Quality monitor: daily |
| Standard — customer service AI | 24 hours | 72 hours | 99.5% | Quality monitor: weekly |
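An availability SLO and an availability TFD constrain different things: the SLO caps cumulative downtime over a period, while the TFD caps a single disruption. The arithmetic below converts each SLO in the table into an annual downtime budget, assuming a 365-day measurement period. The function name and the choice of period are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(slo_percent: float, period_days: int = 365) -> float:
    """Cumulative downtime allowed by an availability SLO over the period."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    return (100 - slo_percent) / 100 * period_days * MINUTES_PER_DAY


# (category, availability TFD in minutes, availability SLO %) from the table above.
rows = [
    ("real-time credit", 15, 99.99),
    ("fraud detection", 5, 99.999),
    ("claims processing", 4 * 60, 99.9),
    ("customer service AI", 24 * 60, 99.5),
]

for name, tfd_minutes, slo in rows:
    budget = downtime_budget_minutes(slo)
    # True when one outage could stay inside the annual SLO yet exceed the TFD.
    single_outage_can_exceed_tfd = budget > tfd_minutes
    print(f"{name}: {budget:.1f} min/year allowed by SLO, "
          f"TFD {tfd_minutes} min, "
          f"single outage could exceed TFD: {single_outage_can_exceed_tfd}")
```

Under a 365-day period, every row's annual downtime budget is larger than its availability TFD (for example, roughly 52.6 minutes against a 15-minute TFD for real-time credit). Meeting the SLO over a year therefore does not by itself show that each outage stayed within tolerance, so the duration of individual disruptions needs to be monitored separately.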
Disaster Recovery
| Scenario | RTO Target | Recovery Method |
|---|---|---|
| Primary AI endpoint failure | Per TFD above | Failover to secondary endpoint / region |
| AI vendor outage | Per TFD + manual fallback threshold | Manual fallback activation; vendor SLA claim |
| Silent quality degradation | Per quality TFD | Automatic rollback to prior model version |
11. Cost Considerations
Indicative Compliance Implementation Cost
| Activity | One-Time Cost | Ongoing Annual Cost |
|---|---|---|
| Critical operation AI dependency mapping | AUD $50,000–$100,000 | AUD $20,000 (annual review) |
| TFD definition and BCP update | AUD $80,000–$150,000 | AUD $30,000 (annual) |
| AI vendor risk assessments (per vendor) | AUD $15,000–$30,000 | AUD $10,000 (annual refresh) |
| Scenario testing programme | AUD $40,000–$80,000 | AUD $40,000 (annual) |
| APRA notification capability | AUD $20,000 | AUD $10,000 (ongoing) |
| Total (assuming one AI vendor) | AUD $205,000–$380,000 | ~AUD $110,000/yr |
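Because the vendor assessment row is priced per vendor, the programme total depends on how many AI vendors are in scope. The sketch below recomputes the totals from the line items in the table. The dictionary keys and function name are illustrative, and the figures remain indicative.

```python
# (one-time low, one-time high, ongoing annual) in AUD, from the table above.
FIXED_ITEMS = {
    "dependency_mapping": (50_000, 100_000, 20_000),
    "tfd_and_bcp_update": (80_000, 150_000, 30_000),
    "scenario_testing": (40_000, 80_000, 40_000),
    "apra_notification": (20_000, 20_000, 10_000),
}
PER_VENDOR_ASSESSMENT = (15_000, 30_000, 10_000)


def programme_cost(vendor_count: int) -> dict:
    """Sum the fixed line items plus the per-vendor assessment cost."""
    if vendor_count < 0:
        raise ValueError("vendor_count cannot be negative")
    low, high, annual = (
        sum(item[i] for item in FIXED_ITEMS.values())
        + vendor_count * PER_VENDOR_ASSESSMENT[i]
        for i in range(3)
    )
    return {"one_time_low": low, "one_time_high": high, "annual": annual}


for vendors in (1, 3):
    print(vendors, programme_cost(vendors))
```

For an entity with three AI vendors, for instance, the same line items give a one-time range of AUD $235,000 to $440,000 and roughly AUD $130,000 per year.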
12. Trade-Off Analysis
Option Comparison
| Option | Description | Pros | Cons | Recommended For |
|---|---|---|---|---|
| A: Full CPS230 AI compliance architecture (this pattern) | Comprehensive implementation of §17, §19, §22, §28, §43 obligations | Full regulatory compliance; defensible under examination | Significant implementation cost and effort | All APRA-regulated entities |
| B: Minimum viable compliance | Critical operation mapping + §43 notification only | Lower cost; faster to implement | Residual regulatory risk for §19, §22, §28 | Temporary position during transition; not sustainable |
| C: Standard ITSM extension | Apply existing BCP and vendor management to AI with minor updates | Low incremental cost | Standard ITSM misses AI-specific failure modes; quality TFD not addressed | Not acceptable for AI supporting critical operations |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| AI system supporting critical operation not identified in dependency map | High (initially) | Critical — unmanaged risk | Discovery via incident or examination | Comprehensive mapping exercise; GOV001 integration |
| Quality TFD breach not detected (no quality monitoring) | Medium | High — harm accruing before detection | Quality monitor gap | Implement quality monitoring per quality TFD |
| APRA notification window missed (72h) | Low | Critical — enforcement risk | Notification workflow SLA monitor | Voluntary disclosure with explanation; legal counsel |
| AI vendor fails without exit strategy | Low | Critical — critical operation disrupted | Vendor health monitoring; contractual notification rights | Pre-positioned fallback vendor; tested exit procedure |
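The notification workflow SLA monitor in the table above could be as simple as a deadline check. The sketch assumes the 72-hour clock starts when the entity becomes aware of the incident. That starting point, the 12-hour warning margin, and the function names are assumptions for illustration and should be confirmed against the standard and the entity's own incident policy.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest submission time, counted from when the entity became aware."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def notification_status(aware_at: datetime,
                        now: datetime,
                        warn_before: timedelta = timedelta(hours=12)) -> str:
    """Classify an open notification as on_track, at_risk, or overdue."""
    remaining = notification_deadline(aware_at) - now
    if remaining < timedelta(0):
        return "overdue"
    if remaining <= warn_before:
        return "at_risk"
    return "on_track"


# Hypothetical incident: the entity became aware at 09:00 UTC on 1 July 2025.
aware = datetime(2025, 7, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2025-07-04T09:00:00+00:00
print(notification_status(aware, aware + timedelta(hours=65)))  # at_risk
```

Requiring timezone-aware timestamps avoids ambiguity when incident and submission times are recorded in different time zones.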
14. Regulatory Considerations
CPS230 Specific Obligations
| Section | Obligation | Architecture Implementation |
|---|---|---|
| §17 | Identify critical operations | Critical operation AI dependency map |
| §19 | Operational risk management | TFD definition; quality monitoring; BCP |
| §22 | Scenario analysis | Four AI scenario tests annually |
| §23 | Business continuity plan | Manual fallback procedures documented |
| §28 | Third-party arrangements | AI vendor risk assessment; written agreements |
| §43 | Material incident notification | 72-hour APRA notification workflow |
| §44 | Significant changes | AI deployments supporting critical operations assessed as significant changes |
| §46 | Post-incident review | AI incident PIR per GOV008 |
CPG 230 (Guidance)
APRA's Prudential Practice Guide CPG 230 provides additional guidance on CPS230 implementation. Key AI implications: entities are expected to be able to demonstrate operational resilience of critical systems including AI; scenario testing should be "severe but plausible"; TFD should be set based on customer and financial impact, not technical convenience.
15. Reference Implementations
All reference implementations involve organisational and governance work more than technology. The architecture reflects standard enterprise tools configured for CPS230 AI compliance.
| Component | Technology |
|---|---|
| Critical Operation AI Mapping | ServiceNow CSDM or Archer with custom AI dependency fields |
| TFD and BCP Documentation | Fusion Framework, ServiceNow BCM, or Confluence with template |
| AI Vendor Risk Assessment | Prevalent, ProcessUnity, or ServiceNow VRM with AI-specific questionnaire |
| Scenario Testing Management | ServiceNow GRC or Archer with scenario testing module |
| APRA Notification | ServiceNow workflow + document management |
16. Related Patterns
| Pattern | Relationship | Dependency Direction |
|---|---|---|
| EAAPL-GOV001 AI Model Register | Input — AI inventory for critical operation mapping | GOV001 → CMP001 |
| EAAPL-GOV008 AI Incident Management | Implements — §43 notification via incident process | GOV008 → CMP001 |
| EAAPL-CMP002 APRA CPS234 | Sibling — companion prudential standard | CMP001 ↔ CMP002 |
| EAAPL-GOV007 AI Audit Trail | Evidence source — retention for §32 record-keeping | GOV007 → CMP001 |
17. Maturity Assessment
Overall Maturity: Mature (Level 4)
| Dimension | Score (1–5) | Evidence |
|---|---|---|
| Regulatory mapping completeness | 5 | All key CPS230 sections mapped to architecture |
| Critical operation AI mapping | 4 | Methodology defined; completeness depends on GOV001 maturity |
| Quality TFD innovation | 4 | Novel concept well-defined; limited industry reference implementations exist |
| Third-party AI vendor management | 4 | §28 requirements mapped; AI-specific assessment criteria defined |
| Scenario testing programme | 3 | Four scenario types defined; execution playbooks still developing |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-10-01 | EAAPL Working Group | Initial publication aligned to CPS230 exposure draft |
| 2.0 | 2025-07-01 | EAAPL Working Group | Updated to CPS230 final standard (effective 1 July 2025); quality TFD concept introduced |