AI Risk Assessment Framework

⚖️ AI Governance · APRA CPS230 · EU AI Act · 🏭 Field-tested in AU · 2 signals · Q2 2026

[EAAPL-GOV002] AI Risk Assessment Framework

Category: Governance / Risk Management
Sub-category: Pre-deployment Risk Assessment
Version: 2.0
Maturity: Mature
Tags: risk-assessment, EU-AI-Act, NIST-AI-RMF, model-risk, fairness, safety, pre-deployment
Regulatory Relevance: EU AI Act Articles 6–10, NIST AI RMF MAP Function, ISO/IEC 42001 §6.1.2, APRA CPS230 §19–§22


1. Executive Summary

The AI Risk Assessment Framework provides a systematic, repeatable methodology for evaluating AI system risk before any deployment into production environments. It addresses the fundamental challenge facing every regulated enterprise: how do you consistently determine whether an AI system is safe, fair, compliant, and fit-for-purpose—before customers and regulators are exposed to failures?

This pattern operationalises five risk dimensions—accuracy, fairness, security, compliance, and operational resilience—into a structured assessment workflow that produces a defensible, documented risk rating. The rating gates progression through the approval workflow (GOV003) and determines the intensity of post-deployment monitoring controls.

The framework directly implements the NIST AI RMF MAP function, satisfies EU AI Act Article 9 risk management system requirements for high-risk AI, and provides the risk classification evidence required by APRA CPS230. It is the second mandatory pattern in the EAAPL governance hierarchy, positioned immediately downstream of the AI Model Register (GOV001).

For the CIO/CTO audience, this pattern converts the abstract question "is our AI safe?" into a structured process with clear accountability, documented evidence, and regulatory defensibility. Implementing it closes the "we didn't formally assess risk" gap: every deployment decision is backed by a documented pre-deployment assessment that can be produced at regulatory examination.


2. Problem Statement

Business Problem

AI systems are deployed based on technical performance metrics (accuracy, F1 score) without systematic assessment of business risk dimensions: fairness to customer segments, regulatory compliance, reputational exposure, and operational failure modes. When failures occur, there is no evidence that risk was considered before deployment.

Technical Problem

Risk dimensions for AI are heterogeneous and cannot be captured by a single metric. Accuracy metrics do not reveal discriminatory outcomes. Security assessments miss AI-specific threats (adversarial inputs, model inversion). Operational assessments miss AI-specific failure modes (distribution shift, hallucination). No single tool covers all dimensions; the framework must orchestrate multiple specialised assessment tools.

Symptoms

  • AI systems deployed with only technical performance sign-off (accuracy, latency)
  • Post-deployment discovery of fairness issues affecting customer segments
  • Regulatory findings citing absence of pre-deployment risk documentation
  • Security teams assessing AI systems using generic application security frameworks (missing AI-specific threat vectors)
  • Inconsistent risk ratings across business units for functionally similar AI systems
  • No traceability between deployment decision and risk evidence

Cost of Inaction

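  • Regulatory: EU AI Act Article 9 non-compliance for high-risk systems: fines of up to €15M or 3% of worldwide annual turnover, whichever is higher (Article 99(4)); the €35M or 7% tier applies to prohibited practices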
  • Legal: Discriminatory AI outcomes triggering class actions (financial services credit decisions, insurance pricing)
  • Operational: Distribution shift causing silent model degradation undetected for months
  • Reputational: Public disclosure of unfair AI outcomes; media scrutiny of automated decision-making

3. Context

When to Apply

  • Before any AI system is promoted to a production or production-equivalent environment
  • When an existing AI system undergoes material change (new training data, model architecture change, new use case, new user population)
  • When a third-party AI system is onboarded for use in enterprise processes
  • At periodic re-assessment intervals (annually for Low risk; 6-monthly for Medium; quarterly for High/Critical)
  • When an AI incident triggers re-assessment per GOV008
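The periodic re-assessment cadence listed above can be expressed as a small date calculation. This is a minimal sketch, not part of the pattern itself: the tier names follow this document, but `REASSESSMENT_MONTHS`, `add_months`, and `next_reassessment` are hypothetical names, and clamping to the last day of a shorter month is an assumption rather than a rule the framework states.

```python
import calendar
from datetime import date

# Interval in months per risk tier, taken from the cadence listed above.
REASSESSMENT_MONTHS = {"Low": 12, "Medium": 6, "High": 3, "Critical": 3}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_reassessment(last_assessed: date, risk_tier: str) -> date:
    """Return the date by which the next periodic re-assessment is due."""
    if risk_tier not in REASSESSMENT_MONTHS:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    return add_months(last_assessed, REASSESSMENT_MONTHS[risk_tier])
```

A register job could call `next_reassessment` whenever a risk tier is written back to GOV001 and raise an expiry alert once the returned date has passed.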

When NOT to Apply

  • Pure rules-based systems with no statistical learning component (standard risk assessment applies)
  • Internal-only development and test environments with no customer data and no production traffic
  • Trivial AI features with zero decision-making authority (e.g., autocomplete suggestions with human confirmation)

Prerequisites

  • Model registered in AI Model Register (GOV001) — MRID required
  • Model owner and business sponsor identified
  • Training data lineage documented
  • Initial use case description available
  • Access to model for technical testing (or vendor-provided technical documentation for third-party models)

Industry Applicability

| Industry | Applicability | Primary Driver | Specific Risk Dimensions |
| --- | --- | --- | --- |
| Banking (AU) | Critical | APRA CPS230, responsible lending obligations | Accuracy, Fairness (credit), Compliance, Operational |
| Insurance (AU) | Critical | APRA CPS230, unfair discrimination | Fairness (pricing), Compliance, Operational |
| Healthcare | Critical | TGA, clinical risk, patient safety | Accuracy (clinical), Safety, Operational |
| Government | High | APS AI Ethics, administrative law | Fairness, Compliance, Explainability |
| Retail | Medium | Consumer law, privacy | Fairness, Privacy, Compliance |
| HR / Recruitment | High | Anti-discrimination law | Fairness (demographic), Compliance |

4. Architecture Overview

The AI Risk Assessment Framework is architected as an assessment pipeline—a structured sequence of risk dimension evaluations that produces a composite risk rating. The pipeline is intentionally modular: each dimension has its own assessment module with defined inputs, methods, thresholds, and outputs. This modularity enables dimension-specific tooling, parallel execution where dimensions are independent, and incremental framework enhancement without rebuilding from scratch.

Assessment Philosophy: Evidence-Based, Not Opinion-Based. Each risk dimension requires quantitative evidence—test results, metric outputs, statistical analyses—not just attestations from model owners. This is the critical difference between a governance checkbox process and a genuinely risk-reducing framework. The assessment workflow enforces evidence upload before any dimension can be marked complete.

Five-Dimension Risk Model. The framework assesses five orthogonal risk dimensions:

Accuracy Risk measures whether the model performs at the level required for its intended use case. This is not simply "what is the accuracy percentage"—it requires understanding acceptable error rates in context. A 95% accurate fraud detection model that misclassifies $50M in transactions annually has a very different risk profile than a 95% accurate movie recommendation system. Accuracy assessment includes baseline performance, performance under distribution shift, confidence calibration, and failure mode analysis.
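Confidence calibration, one of the accuracy checks named above, is commonly summarised with expected calibration error (ECE). The following is a minimal sketch of that metric using equal-width confidence bins; the function name, the ten-bin default, and the input shapes are assumptions for illustration, and the framework does not prescribe this particular metric.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average of |mean confidence - accuracy| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Clamp so a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 90% confidence but is right 75% of the time in that bin contributes a gap of 0.15; what gap is acceptable depends on the use case, as the paragraph above notes.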

Fairness Risk assesses whether model outputs disadvantage identifiable groups in ways that violate anti-discrimination obligations or organisational values. The framework tests for demographic parity, equalised odds, and individual fairness across protected attributes (race, gender, age, disability status). The threshold for acceptable disparity is use-case dependent: credit decisioning and insurance pricing face stricter fairness obligations than personalised recommendations. The Model Bias Detection pattern (GOV006) provides the continuous post-deployment counterpart to this pre-deployment assessment.
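To make the demographic parity test concrete, the sketch below computes per-group selection rates and the largest gap between them in plain Python. The function names and the toy data are hypothetical; in practice an assessor would more likely use an established fairness library, and the acceptable gap is use-case dependent as described above.

```python
from collections import defaultdict
from typing import Dict, Sequence


def selection_rates(y_pred: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Share of positive predictions (1) within each group."""
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    if not y_pred:
        raise ValueError("at least one prediction is required")
    positives: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        counts[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / counts[g] for g in counts}


def demographic_parity_difference(y_pred: Sequence[int], groups: Sequence[str]) -> float:
    """Gap between the highest and lowest group selection rates (0 = parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy example: group "a" is approved 3 times out of 4, group "b" once out of 4.
preds = [1, 1, 0, 1, 0, 0, 0, 1]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, grps)
```

Equalised odds follows the same shape but compares true-positive and false-positive rates per group, so it additionally needs the ground-truth labels.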

Security Risk evaluates AI-specific threats that standard application security assessment misses: adversarial input attacks (crafted inputs that cause misclassification), model inversion attacks (extracting training data from model outputs), membership inference attacks (determining if specific data was in the training set), prompt injection for generative AI, and supply chain risks from third-party model weights. The assessment uses both static analysis and adversarial testing.

Compliance Risk determines whether the model's operation complies with applicable law and regulation: privacy obligations (data minimisation, purpose limitation), sector-specific requirements (responsible lending, insurance pricing regulations), automated decision-making rights (Privacy Act, GDPR Article 22), and geographic restrictions. Compliance risk is the only dimension where a single finding can be an absolute blocker regardless of performance on other dimensions.

Operational Risk assesses resilience: What happens when the model fails? How quickly does performance degrade under distribution shift? What is the fallback when the model is unavailable? Does the model have appropriate monitoring coverage? Operational risk is particularly important for AI systems embedded in critical business processes where model unavailability would constitute an operational disruption.

Composite Risk Rating. The five dimension scores are aggregated using a worst-case-dominated algorithm: the composite rating equals the highest individual dimension rating. This is architecturally deliberate—a model that is perfectly accurate and fair but has Critical security risk should be rated Critical overall. Business units cannot average-down a critical dimension finding.
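The worst-case-dominated rule reduces to taking the maximum over the five dimension ratings. The sketch below assumes each dimension has already been mapped to one of the four tiers used in this document (Low, Medium, High, Critical); the `Rating` enum, `DIMENSIONS` tuple, and `composite_rating` function are illustrative names, and the mapping from raw 1–5 dimension scores to tiers is not shown because the pattern does not specify it here.

```python
from enum import IntEnum
from typing import Mapping


class Rating(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


DIMENSIONS = ("accuracy", "fairness", "security", "compliance", "operational")


def composite_rating(ratings: Mapping[str, Rating]) -> Rating:
    """Composite rating equals the highest individual dimension rating."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        # Refuse to rate when a dimension has no result, so a gap cannot
        # silently lower the composite.
        raise ValueError(f"missing dimension ratings: {missing}")
    return max(ratings[d] for d in DIMENSIONS)


example = {
    "accuracy": Rating.LOW,
    "fairness": Rating.LOW,
    "security": Rating.CRITICAL,
    "compliance": Rating.MEDIUM,
    "operational": Rating.LOW,
}
overall = composite_rating(example)  # Rating.CRITICAL
```

Using `max` rather than a mean is what prevents a single Critical finding from being averaged down by strong results elsewhere.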

Use-Case Context Adjustment. Raw dimension scores are adjusted for use-case context: a credit decision AI requires tighter fairness thresholds than an internal document classifier. Context vectors (customer-facing, automated decision, financial consequence, vulnerable population involvement) amplify or reduce threshold sensitivity. This prevents the framework from applying uniform thresholds to fundamentally different risk contexts.
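One way to picture context adjustment is as a set of multipliers applied to a base threshold. The sketch below is purely illustrative: the four flags come from the context vectors named above, but the `TIGHTENING` factors, the class and function names, and the choice to model only tightening (not loosening) are assumptions and do not come from the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCaseContext:
    customer_facing: bool = False
    automated_decision: bool = False
    financial_consequence: bool = False
    vulnerable_population: bool = False


# Hypothetical multipliers: each active flag shrinks the tolerated disparity.
TIGHTENING = {
    "customer_facing": 0.75,
    "automated_decision": 0.75,
    "financial_consequence": 0.5,
    "vulnerable_population": 0.5,
}


def adjusted_threshold(base_threshold: float, ctx: UseCaseContext) -> float:
    """Tighten a maximum-acceptable-disparity threshold for riskier contexts."""
    threshold = base_threshold
    for flag, factor in TIGHTENING.items():
        if getattr(ctx, flag):
            threshold *= factor
    return threshold


internal_classifier = UseCaseContext()
credit_decision = UseCaseContext(
    customer_facing=True,
    automated_decision=True,
    financial_consequence=True,
)
```

With these placeholder factors, `adjusted_threshold(0.2, internal_classifier)` leaves the threshold unchanged, while the credit decision context yields a markedly stricter one.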

Mandatory vs. Conditional Assessments. Not every dimension requires the same assessment depth for every model. The framework uses a risk-screening questionnaire to determine which assessment modules apply at full depth vs. abbreviated form. A low-risk internal productivity tool may complete Accuracy and Operational assessments at abbreviated depth, while a customer-facing credit model requires full depth across all five dimensions plus external expert review.


5. Architecture Diagram

flowchart TD
    subgraph Intake["Intake and Scoping"]
        A[Model Owner Request]
        B{Screening Questionnaire}
    end
    subgraph Assessment["Five-Dimension Assessment"]
        C[Accuracy and Fairness]
        D[Security and Compliance]
        E[Operational Resilience]
    end
    subgraph Output["Rating and Action"]
        F[Composite Risk Rating]
        G[GOV001 Update + GOV003 Trigger]
        H[Expert Review Required]
    end
    A --> B
    B -->|full assessment| C
    B -->|full assessment| D
    B -->|all tracks| E
    C -->|score| F
    D -->|score| F
    E -->|score| F
    F -->|critical or high| H
    F -->|any tier| G
    H --> G
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f3e8ff,stroke:#a855f7
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#f3e8ff,stroke:#a855f7
    style G fill:#d1fae5,stroke:#10b981
    style H fill:#fee2e2,stroke:#ef4444

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Screening Questionnaire Engine | Assessment Service | Administers 15-question scoping questionnaire; determines assessment depth per dimension | Typeform, custom React form, ServiceNow survey | High |
| Accuracy Assessment Module | Assessment Engine | Benchmarks model performance on holdout data; tests distribution shift robustness; validates confidence calibration | MLflow, Evidently AI, custom pytest harness | Critical |
| Fairness Assessment Module | Assessment Engine | Measures demographic parity, equalised odds, individual fairness across protected attributes | Fairlearn, IBM AI Fairness 360, Aequitas | Critical |
| Security Assessment Module | Assessment Engine | Tests adversarial robustness, prompt injection resilience, model inversion resistance | Adversarial Robustness Toolbox (ART), Garak, custom red-team scripts | Critical |
| Compliance Assessment Module | Assessment Engine | Maps model behaviour to regulatory requirements; checks data lineage for privacy compliance; validates geographic restrictions | Custom rules engine + legal checklist, OneTrust AI module | Critical |
| Operational Resilience Module | Assessment Engine | Assesses failure modes, fallback mechanisms, monitoring coverage, DR capability | Custom checklist + integration with infrastructure tooling | High |
| Risk Aggregation Engine | Business Logic | Applies worst-case-dominated composite scoring; performs context adjustment | Custom Python scoring engine | Critical |
| Assessment Portal | User Interface | Guided workflow for assessors; evidence upload; dimension-by-dimension progress tracking | React SPA with file upload, ServiceNow custom app | High |
| Evidence Repository | Data Storage | Stores all assessment evidence artefacts with immutable references | S3 + DynamoDB index, Azure Blob + Cosmos DB | Critical |
| Expert Review Workflow | Process Integration | Routes high/critical assessments to external expert review queue | ServiceNow workflow, Jira advanced workflow | High |
| Report Generator | Document Service | Produces standardised risk assessment report for GOV003 consumption | Pandoc + LaTeX, custom PDF generator | High |

7. Data Flow

Primary Assessment Flow

| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | Model Owner | Submits assessment request with MRID and use-case context | Assessment request ticket, unique assessment ID |
| 2 | Screening Engine | Presents 15-question scoping questionnaire; determines applicable modules | Assessment scope document; module activation list |
| 3 | Assessment Platform | Provisions assessment workspace; notifies assigned assessors | Workspace URL; assessor notification |
| 4 | Accuracy Assessor | Uploads holdout test set; runs accuracy, calibration, and drift tests | Accuracy dimension score (1–5) + evidence bundle |
| 5 | Fairness Assessor | Runs fairness metrics against protected attributes; documents thresholds | Fairness dimension score + disparity measurements |
| 6 | Security Assessor | Executes adversarial test suite; documents findings and severity | Security dimension score + vulnerability findings |
| 7 | Compliance Assessor | Reviews regulatory mapping; documents compliance gaps | Compliance dimension score + gap register |
| 8 | Operations Assessor | Completes operational resilience checklist; reviews monitoring coverage | Operational dimension score + monitoring recommendations |
| 9 | Risk Aggregation Engine | Computes composite score; applies context adjustment; determines final rating | Final risk rating + rating rationale document |
| 10 | Expert Review (if required) | External expert reviews high/critical findings; may modify ratings | Expert sign-off or escalation |
| 11 | Report Generator | Produces standardised assessment report | PDF assessment report with evidence links |
| 12 | Framework | Updates model register with risk tier; triggers GOV003 approval workflow | MRID updated; approval workflow initiated |

Error Flow

| Condition | Detection | Response | Recovery |
| --- | --- | --- | --- |
| Holdout test data unavailable | Accuracy module | Assessment blocked; model owner notified | Model owner must supply test data before assessment proceeds |
| Fairness assessment finds critical disparity | Fairness module | Assessment continues but dimension scored Critical; escalation alert sent | Model owner must remediate or accept Critical rating |
| Assessment exceeds SLA (10 business days) | SLA monitor | Escalation to AI Governance; assessment status reported to risk committee | Priority queue assignment; additional assessor resource |
| Evidence upload fails integrity check | Evidence repository | Evidence rejected; assessor notified | Re-upload with corrected file; integrity check logged |
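The pattern does not say how the evidence integrity check works. A common approach, assumed here, is to compare a SHA-256 digest declared by the assessor at upload time against the digest of the bytes actually received. `sha256_hex` and `verify_evidence` are hypothetical helper names.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the evidence bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_evidence(data: bytes, declared_sha256: str) -> bool:
    """True when the received bytes match the digest declared at upload."""
    return sha256_hex(data) == declared_sha256.strip().lower()


# Illustrative payloads only; real evidence would be read from the upload.
report = b"fairness-report-v1"
declared = sha256_hex(report)
accepted = verify_evidence(report, declared)
rejected = verify_evidence(b"fairness-report-v1-modified", declared)
```

Storing the verified digest alongside the artefact also gives the evidence repository a stable, immutable reference for later audit.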

8. Security Considerations

Authentication & Authorisation

  • Assessors must have Assessment Contributor role (granted per-assessment, time-limited)
  • Evidence bundles accessible only to assessors on active engagement + Compliance Reader role
  • External experts access via time-limited, scoped portal credentials; no enterprise SSO

Secrets Management

  • Model API keys used for adversarial testing stored in Vault with per-assessment lease
  • Test dataset S3 paths generated as pre-signed URLs with 7-day expiry

Data Classification

  • Test datasets carrying personal information: CONFIDENTIAL; processed in isolated assessment environment
  • Assessment reports: RESTRICTED until model approved/rejected; then CONFIDENTIAL
  • Fairness test results containing demographic analysis: RESTRICTED (demographic inference risk)

Encryption

  • All evidence at rest: AES-256
  • Evidence in transit: TLS 1.3; assessor portal enforces HTTPS only
  • Sensitive training data used for assessment: encrypted assessment sandbox with no internet egress

OWASP LLM Top 10 Mapping

| OWASP LLM Risk | Assessment Coverage | Test Method |
| --- | --- | --- |
| LLM01 Prompt Injection | Security module mandatory test | Automated Garak probe suite + manual red team |
| LLM03 Training Data Poisoning | Accuracy module drift tests; Security module supply chain review | Data provenance audit + anomaly detection on training set |
| LLM05 Supply Chain Vulnerabilities | Security module: model weight provenance check | Model card review; reproducibility verification; dependency scan |
| LLM06 Sensitive Information Disclosure | Security module: model inversion and membership inference tests | ART membership inference attack; output monitoring for PII patterns |
| LLM07 Insecure Plugin Design | Security module: integration point review | Integration threat model; API security scan |

9. Governance Considerations

Responsible AI Alignment

The framework operationalises five responsible AI principles through its five dimensions: performance (accuracy), fairness (bias testing), security (adversarial testing), regulatory compliance, and accountability with human oversight (operational resilience). No principle is assessed by attestation alone.

Model Risk Alignment

The framework produces the primary input to model risk classification. Risk rating feeds directly into:

  • Model validation requirements (independent validation required for High/Critical)
  • Human oversight intensity (automated decisions require human review for Critical-rated models)
  • Monitoring frequency and alert thresholds

Human Approval Workflow Integration

Assessment output directly triggers GOV003 approval workflow. Risk rating determines approval authority level:

  • Low: AI Governance lead sign-off
  • Medium: AI Governance + Risk Officer
  • High: AI Governance + Risk Officer + CISO
  • Critical: Full approval board including CRO and General Counsel
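The approval authority levels above amount to a lookup from risk rating to required sign-offs. The sketch below encodes that lookup and reports which sign-offs are still outstanding. It is illustrative only: the names are hypothetical, and for Critical the list holds just the two board members this document names, since full board membership is not specified.

```python
from typing import Dict, Iterable, List

REQUIRED_APPROVERS: Dict[str, List[str]] = {
    "Low": ["AI Governance lead"],
    "Medium": ["AI Governance", "Risk Officer"],
    "High": ["AI Governance", "Risk Officer", "CISO"],
    # Assumption: only the named board members are listed here.
    "Critical": ["CRO", "General Counsel"],
}


def required_approvers(rating: str) -> List[str]:
    """Sign-offs needed before GOV003 can approve a model at this rating."""
    if rating not in REQUIRED_APPROVERS:
        raise ValueError(f"unknown risk rating: {rating!r}")
    return list(REQUIRED_APPROVERS[rating])


def missing_approvers(rating: str, signed_off: Iterable[str]) -> List[str]:
    """Required approvers who have not yet signed off, in listed order."""
    done = set(signed_off)
    return [role for role in required_approvers(rating) if role not in done]
```

An approval gate could block promotion whenever `missing_approvers` returns a non-empty list.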

Policy Integration

Compliance module references current versions of: AI Use Policy, Data Classification Policy, Geographic Restriction Register, Prohibited Use Case Register. Policy updates trigger re-assessment queue for affected deployed models.

Governance Artefacts

| Artefact | Owner | Frequency | Regulatory Linkage |
| --- | --- | --- | --- |
| Assessment Report (per model) | AI Governance | Per assessment event | EU AI Act Article 9; NIST MAP function |
| Fairness Disparity Registry | AI Governance | Quarterly rollup | Anti-discrimination legislation |
| Assessment SLA Report | AI Governance | Monthly | APRA CPS230 §22 governance reporting |
| Critical/High Finding Register | Risk Committee | Monthly | ISO 42001 §6.1.2 |

10. Operational Considerations

Monitoring

| Metric | Target | Alert Threshold | Owner |
| --- | --- | --- | --- |
| Assessment SLA compliance | >90% completed within 10 business days | <80% in any month | AI Governance |
| Assessments with missing evidence | 0 | Any evidence gap at submission | Quality Manager |
| Critical findings not remediated within SLA | 0 | Any Critical finding > 5 business days open | Risk Officer |
| Re-assessment overdue rate | 0% for High/Critical models | Any overdue at interval | AI Governance |
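The SLA compliance metric depends on counting business days between submission and completion. The sketch below is one possible calculation, assuming a Monday-to-Friday working week with no public holiday calendar; the function names and the `(submitted, completed)` pair format are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

SLA_BUSINESS_DAYS = 10  # assessment SLA stated in this pattern


def business_days_between(start: date, end: date) -> int:
    """Count weekdays after `start` up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def sla_compliance(assessments: Iterable[Tuple[date, date]]) -> float:
    """Fraction of (submitted, completed) pairs finished within the SLA."""
    pairs = list(assessments)
    if not pairs:
        raise ValueError("at least one completed assessment is required")
    within = sum(
        1 for submitted, completed in pairs
        if business_days_between(submitted, completed) <= SLA_BUSINESS_DAYS
    )
    return within / len(pairs)


# Two assessments submitted on Monday 2 June 2025.
month = [
    (date(2025, 6, 2), date(2025, 6, 16)),  # 10 business days: within SLA
    (date(2025, 6, 2), date(2025, 6, 17)),  # 11 business days: breach
]
rate = sla_compliance(month)
```

Comparing the monthly `rate` against the 90% target and the 80% alert threshold gives the two signals described in the monitoring table.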

SLOs

| SLO | Target | Measurement Window |
| --- | --- | --- |
| Assessment portal availability | 99.5% | 30-day rolling |
| Evidence upload success rate | >99% | 7-day rolling |
| Assessment completion within SLA | >90% | Monthly |

Disaster Recovery

| Scenario | RTO | RPO | Recovery Method |
| --- | --- | --- | --- |
| Assessment portal unavailable | 4 hours | 30 minutes | Standby deployment; in-progress assessments resume from last checkpoint |
| Evidence repository unavailable | 1 hour | 5 minutes | Multi-region replication; read from replica |

11. Cost Considerations

Cost Drivers

| Driver | Cost Type | Notes |
| --- | --- | --- |
| Assessor time (internal FTE) | Dominant; variable | 2–5 days per full assessment; 0.5 days abbreviated |
| Fairness/bias tooling licences | Fixed SaaS or open-source | IBM AI Fairness 360 (open source); commercial options $20K–$80K/yr |
| Adversarial testing infrastructure | Variable compute | GPU-required for LLM red-teaming; $50–$500 per assessment |
| External expert review | Variable professional fees | AUD $5,000–$25,000 per High- or Critical-tier review |
| Assessment portal infrastructure | Fixed | Low; lightweight web app + storage |

Indicative Cost Range

| Assessment Tier | Internal Cost (FTE days) | External Cost | Total per Assessment |
| --- | --- | --- | --- |
| Abbreviated (Low risk) | 0.5 days × AUD $1,500 = $750 | Nil | ~AUD $750 |
| Full (Medium) | 3 days × AUD $1,500 = $4,500 | Nil | ~AUD $4,500 |
| Full (High) | 5 days × AUD $1,500 = $7,500 | Expert: AUD $10K | ~AUD $17,500 |
| Full (Critical) | 5 days × AUD $1,500 = $7,500 | Expert: AUD $25K | ~AUD $32,500 |
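The per-assessment totals above are FTE days multiplied by the day rate, plus any external expert fee. The sketch below reproduces that arithmetic and extends it to a simple portfolio estimate. The figures come from the table; the names `TIER_INPUTS`, `assessment_cost`, and `portfolio_cost` are hypothetical, and summing per-assessment costs across a portfolio is an illustrative assumption.

```python
from typing import Dict, Mapping, Tuple

DAY_RATE_AUD = 1_500  # internal FTE day rate used in the table above

# tier -> (FTE days, external expert fee in AUD), as listed in the table
TIER_INPUTS: Dict[str, Tuple[float, float]] = {
    "low": (0.5, 0),
    "medium": (3, 0),
    "high": (5, 10_000),
    "critical": (5, 25_000),
}


def assessment_cost(tier: str, day_rate: float = DAY_RATE_AUD) -> float:
    """Indicative cost of one assessment: internal time plus external fee."""
    if tier not in TIER_INPUTS:
        raise ValueError(f"unknown tier: {tier!r}")
    fte_days, external_fee = TIER_INPUTS[tier]
    return fte_days * day_rate + external_fee


def portfolio_cost(assessments_per_tier: Mapping[str, int]) -> float:
    """Total indicative cost for a given number of assessments per tier."""
    return sum(
        count * assessment_cost(tier)
        for tier, count in assessments_per_tier.items()
    )


# Example: ten abbreviated assessments and two High-tier assessments.
estimate = portfolio_cost({"low": 10, "high": 2})
```

Because periodic re-assessment is more frequent for higher tiers, an annual budget would also need to multiply each count by the re-assessments expected per year.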

12. Trade-Off Analysis

Option Comparison

| Option | Description | Pros | Cons | Recommended For |
| --- | --- | --- | --- | --- |
| A: Structured pipeline (this pattern) | Five-dimension modular assessment with evidence gates | Comprehensive; defensible; regulatory-grade documentation | High effort per assessment; requires trained assessors | Regulated industries; customer-facing AI |
| B: Lightweight checklist | Single questionnaire producing binary pass/fail | Fast; low cost; low effort | Superficial; regulatory examiners will challenge depth | Internal tools; non-consequential AI |
| C: Third-party AI audit firm | Outsource all assessment to specialist firm | Deep expertise; independence | Expensive ($50K–$200K per assessment); slow turnaround; knowledge transfer lost | IPO/M&A due diligence; first-ever enterprise assessment |
| D: Automated assessment only | Run automated tools; no human review | Scalable; fast | Misses contextual risk; no legal/compliance judgement; automation cannot assess some dimensions | High-volume, low-risk model refresh cycles |

Architectural Tensions

| Tension | This Pattern's Stance | Mitigation |
| --- | --- | --- |
| Thoroughness vs. Speed | Full assessment required for High/Critical | Abbreviated track for Low/Medium; parallel module execution |
| Objectivity vs. Practicality | Quantitative evidence required | Accepted proxy metrics for dimensions where ground truth unavailable |
| Internal vs. External Assessment | Internal by default; external expert review for High/Critical | Expert review escalation path built into workflow |
| Point-in-time vs. Continuous | Pre-deployment assessment | GOV006 (bias detection) and GOV007 (audit trail) provide continuous counterpart |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Assessor lack of expertise for AI-specific tests | Medium | High: dimension under-scored | Quality review of assessment reports; spot-check re-testing | Mandatory assessor certification; second-opinion process for High/Critical |
| Benchmark dataset not representative of production distribution | Medium | High: accuracy risk under-estimated | Post-deployment monitoring vs. assessment predictions | Production shadow mode testing before full deployment |
| Fairness threshold set too loosely | Medium | Critical: discriminatory model deployed | Post-deployment demographic impact monitoring; regulatory audit | Retrospective re-assessment; model rollback; regulatory notification |
| Critical security finding ignored under business pressure | Low | Critical: exploitable model in production | Independent security review; approval gate enforcement | Mandatory CISO escalation; deployment blocked pending remediation |
| Re-assessment interval missed | High | Medium: stale risk rating | Automated expiry alerts from GOV001 | Calendar-driven re-assessment queue; ownership accountability |

14. Regulatory Considerations

APRA CPS230

  • §19: Risk management requires identification, assessment, and mitigation of operational risks. AI systems must be assessed before deployment as part of the operational risk framework.
  • §22: Scenario analysis requirements: accuracy failure, fairness failure, and security compromise scenarios must be documented and tested.

EU AI Act

  • Article 6 & Annex III: High-risk AI classification determines mandatory conformity assessment. Framework outputs the classification evidence.
  • Article 9: High-risk AI systems must have a risk management system with continuous iterative process for identification, analysis, and evaluation of risks. This framework implements Article 9 requirements.
  • Article 10: Training, validation, and testing data requirements. Accuracy module includes data quality assessment satisfying Article 10(2).

NIST AI RMF

  • MAP 1.1–1.6: Context establishment and risk identification. Framework screening questionnaire and use-case context adjustment implement MAP function.
  • MAP 2.1–2.3: Risk categorisation. Five-dimension scoring produces NIST-aligned risk categorisation.
  • MEASURE 1.1–2.6: Metrics and measurement approaches for all five dimensions are defined and implemented.

ISO/IEC 42001:2023

  • §6.1.2: AI risk identification and assessment. Framework is the primary implementation of this clause.
  • §8.4: Operational planning and control for AI risk assessment processes.

Privacy Act 1988 / APPs

  • APP 3: Collection of personal information. Compliance module checks that model input data collection meets purpose limitation and minimisation requirements.
  • APP 11: Security. Security module assessment satisfies APP 11 technical security obligations for AI systems processing personal information.

15. Reference Implementations

AWS

| Component | AWS Service |
| --- | --- |
| Assessment Portal | Amplify (React) + API Gateway |
| Assessment Orchestration | Step Functions |
| Fairness Testing | SageMaker Clarify |
| Accuracy Testing | SageMaker Model Monitor |
| Evidence Storage | S3 + Macie (PII scanning) |
| Adversarial Testing | Custom ECS Fargate job (ART library) |

Azure

| Component | Azure Service |
| --- | --- |
| Fairness Testing | Azure Responsible AI Dashboard |
| Accuracy Testing | Azure ML Model Monitor |
| Evidence Storage | Azure Blob Storage |
| Assessment Workflow | Azure Logic Apps + Power Automate |

GCP

| Component | GCP Service |
| --- | --- |
| Fairness Testing | Vertex AI Model Evaluation + What-If Tool |
| Accuracy Testing | Vertex AI Model Monitoring |
| Assessment Workflow | Cloud Workflows |

Open Source Stack (On-Premises)

| Component | Technology |
| --- | --- |
| Fairness Testing | IBM AI Fairness 360, Fairlearn |
| Adversarial Testing | Adversarial Robustness Toolbox (ART), Garak |
| Accuracy + Drift | Evidently AI, NannyML |
| Assessment Workflow | Apache Airflow or Prefect |
| Evidence Storage | MinIO (S3-compatible) |

16. Related Patterns

| Pattern | Relationship | Dependency Direction |
| --- | --- | --- |
| EAAPL-GOV001 AI Model Register | Upstream trigger: MRID required; outputs update register risk tier | GOV001 → GOV002 → GOV001 |
| EAAPL-GOV003 AI Approval Workflow | Downstream consumer: assessment report gates approval | GOV002 → GOV003 |
| EAAPL-GOV005 Responsible AI Framework | Alignment: framework dimensions implement responsible AI principles | Bidirectional |
| EAAPL-GOV006 Model Bias Detection | Continuous counterpart: post-deployment fairness monitoring | GOV002 establishes fairness baselines for GOV006 |
| EAAPL-GOV007 AI Audit Trail | Complementary: assessment evidence stored in audit trail | GOV002 → GOV007 |
| EAAPL-CMP003 EU AI Act Compliance | Satisfies: Article 9 risk management system | GOV002 → CMP003 |

17. Maturity Assessment

Overall Maturity: Mature (Level 4)

| Dimension | Score (1–5) | Evidence |
| --- | --- | --- |
| Dimension coverage | 5 | All five risk dimensions fully specified with quantitative methods |
| Regulatory mapping | 5 | Explicit clause-level mapping to EU AI Act Art 9, NIST AI RMF MAP, ISO 42001 §6.1.2 |
| Tooling integration | 4 | Reference implementations for AWS/Azure/GCP/OSS; gap is unified tooling for all five dimensions |
| Assessor capability framework | 3 | Assessment methods defined; formal assessor training programme not yet specified |
| Continuous assessment | 3 | Periodic re-assessment defined; gap is trigger-based automated re-assessment from production signals |

18. Revision History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2024-02-01 | EAAPL Working Group | Initial publication |
| 1.1 | 2024-08-15 | EAAPL Working Group | Added EU AI Act Article 9 mapping; adversarial testing module |
| 2.0 | 2025-06-01 | EAAPL Working Group | Full rewrite: five-dimension model; context adjustment; NIST AI RMF MAP alignment |