EAAPL: Enterprise AI Architecture Pattern Library

AI Governance · APRA CPS230 · APRA CPS234 · Field-tested in AU

[EAAPL-GOV007] AI Audit Trail

Category: Governance / Compliance & Traceability
Sub-category: Immutable Decision Logging
Version: 2.0
Maturity: Mature
Tags: audit-trail, immutable-log, decision-traceability, WORM, tamper-evident, regulatory-retention
Regulatory Relevance: APRA CPS230 §32, APRA CPS234 §19, EU AI Act Article 12, Privacy Act APP 11, ISO/IEC 42001 §9.1, NIST AI RMF MANAGE 4.1


1. Executive Summary

The AI Audit Trail pattern establishes an immutable, tamper-evident log of every AI decision made by enterprise AI systems. It captures the full decision context: sanitised input, model version, prompt version (for generative AI), output, confidence score, policy decisions applied, human overrides, and data sources consumed. Retention is aligned to regulatory requirements—7 years for APRA-regulated entities, which exceeds standard application logging.

This pattern is foundational to enterprise AI governance. Without an audit trail, every governance claim is unverifiable: you cannot prove a policy was applied, you cannot reconstruct what a model said to a customer, you cannot demonstrate human oversight occurred, and you cannot respond to regulatory requests for decision evidence. The EU AI Act Article 12 makes logging mandatory for high-risk AI systems; APRA CPS234 requires evidence of information security controls.

Beyond compliance, the audit trail enables capabilities that governance policy alone cannot provide: post-incident forensics to understand exactly what an AI system did; statistical analysis of decision patterns to detect emerging bias; accountability assignment when AI decisions are disputed; and model performance benchmarking against actual outcomes.

The pattern's defining architectural commitment is immutability. Write-once, read-many (WORM) storage with cryptographic integrity verification ensures that audit records cannot be modified, deleted, or backdated. This is not a logging best practice—it is a regulatory requirement for any organisation claiming its AI audit trail as compliance evidence.


2. Problem Statement

Business Problem

When AI decisions are disputed—by customers, regulators, or in legal proceedings—organisations cannot reconstruct what the AI system actually said, what data it used, and what policies governed it. Standard application logs are insufficient: they are not retained long enough, they are not structured for decision reconstruction, and they are not tamper-proof.

Technical Problem

AI decisions have richer context than standard API calls: model version, prompt version, retrieved context (for RAG systems), token-level confidence, policy decisions from GOV004, human override events from GOV005, and ground-truth outcome feedback. Standard logging infrastructure cannot capture this structured, multi-source context and retain it in WORM format for 7 years.

Symptoms

  • Inability to reconstruct what an AI system told a specific customer on a specific date
  • No evidence that policy guardrails were applied to a flagged AI decision
  • Human override events not captured, preventing audit of human oversight effectiveness
  • AI decision logs retained for 90 days (standard app log retention) vs. 7-year regulatory requirement
  • Logs mutable by administrators, failing tamper-evidence requirements
  • No correlation between AI decision and downstream outcome (outcome feedback not captured)

Cost of Inaction

  • Regulatory: APRA CPS234 §19 non-compliance (information security evidence); EU AI Act Article 12 (logging obligation)
  • Legal: Inability to defend AI decisions in legal proceedings due to absence of evidence
  • Operational: Post-incident forensics impossible; root cause analysis speculative
  • Governance: Human oversight claimed but not evidenced; responsible AI controls unverifiable

3. Context

When to Apply

  • All AI systems making decisions affecting individuals in regulated industries
  • Any AI system subject to EU AI Act high-risk classification (Annex III)
  • AI systems in APRA-regulated entities (all, per CPS234 information security obligations)
  • Any AI system where post-incident forensics capability is required
  • Generative AI systems producing customer-facing outputs (financial advice, healthcare recommendations)

When NOT to Apply

  • Internal AI systems with no customer or regulatory exposure, where sub-30-day retention is acceptable
  • Ultra-high-volume, low-consequence AI (e.g., real-time recommendation clickthrough) where full decision logging is cost-prohibitive — use sampled logging with defined sampling strategy
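One way to make a "defined sampling strategy" auditable is to derive the sampling decision deterministically from the decision ID, so that anyone can later re-check whether a given decision should have been logged. The following is a minimal sketch under that assumption; the function name `should_log` and the use of SHA-256 bucketing are illustrative choices, not part of the pattern.

```python
import hashlib


def should_log(decision_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a decision falls inside the sample.

    Assumption: decision IDs are unique strings (for example UUIDs).
    The same ID always yields the same answer, so the sample is reproducible.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare it
    # against the threshold that corresponds to the requested rate.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)
```

Because the decision depends only on the ID and the documented rate, an auditor could recompute which decisions were in scope without trusting the logging service's runtime state.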

Prerequisites

  • Information classification scheme for AI inputs and outputs
  • WORM-capable storage infrastructure
  • Identity context available at inference time (user ID, session ID)
  • Model register (GOV001) operational — MRID required in each log entry

Industry Applicability

| Industry | Retention Requirement | Key Log Fields | Primary Driver |
| --- | --- | --- | --- |
| Banking (AU) | 7 years | Decision, confidence, policy applied, human override | APRA CPS230 §32 |
| Insurance (AU) | 7 years | Underwriting decision, rating factors, model version | APRA record-keeping |
| Healthcare | 7 years (clinical) | Clinical recommendation, clinical model version, clinician override | Health Records Act |
| Financial Services (EU) | 10 years (MiFID II) | Investment recommendation, AI model version, disclaimer applied | MiFID II Article 25; EU AI Act Article 12 |
| Government | 7–30 years (varies) | Administrative decision, AI contribution, human decision | Archives Act; administrative law |

4. Architecture Overview

The AI Audit Trail is architected around two non-negotiable properties: completeness and immutability. Completeness means every decision is captured with sufficient context to reconstruct the decision scenario. Immutability means no record can be modified or deleted once written, including by system administrators.

Decision Record Schema. The audit record schema is the most critical design decision in this pattern. It must be rich enough to satisfy regulatory reconstruction requirements, yet not so voluminous that storage costs become prohibitive at scale. The schema is stratified into four payload tiers:

Mandatory Tier (always captured): Decision ID (UUID), MRID (model version, architecture hash), timestamp (UTC, nanosecond precision), actor identity (user ID, session ID, application context), input fingerprint (cryptographic hash of sanitised input — not the raw input, which may contain PII), output summary (sanitised, PII-free summary of the model output), decision type (classification/recommendation/generation), confidence score, latency, policy decision reference (GOV004), and regulatory context flags.

Decision Tier (for consequential decisions): Full decision rationale (for explainable models), counterfactual summary, data sources referenced (document IDs for RAG systems, feature values for tabular models), fairness context (demographic group if known), and human oversight indicator.

Override Tier (conditional — only when human override occurs): Override actor identity, override timestamp, override rationale, original AI decision preserved, override decision, and escalation reference.

Outcome Tier (populated retrospectively): Ground truth outcome, outcome timestamp, outcome source, accuracy flag. This tier enables model performance analysis against real-world outcomes and is critical for GOV006 bias detection (equalised odds requires ground truth).
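The four tiers above could be modelled as typed records. The sketch below is one possible shape, not the pattern's normative schema: all field names are hypothetical, the Decision Tier is reduced to a `data_sources` field, and the conditional tiers are represented as optional attributes.

```python
from __future__ import annotations

import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

DECISION_TYPES = {"classification", "recommendation", "generation"}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class OverrideTier:
    """Present only when a human override occurred."""
    actor_id: str
    rationale: str
    original_decision: str
    override_decision: str
    timestamp_utc: str = field(default_factory=_utc_now)


@dataclass(frozen=True)
class OutcomeTier:
    """Populated retrospectively once ground truth is known."""
    ground_truth: str
    source: str
    accurate: bool
    timestamp_utc: str = field(default_factory=_utc_now)


@dataclass(frozen=True)
class DecisionRecord:
    # Mandatory tier (hypothetical field names).
    mrid: str
    actor_id: str
    session_id: str
    input_fingerprint: str   # hash of the sanitised input, never the raw input
    output_summary: str      # sanitised, PII-free
    decision_type: str
    confidence: float
    latency_ms: float
    policy_decision_ref: str
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp_utc: str = field(default_factory=_utc_now)
    # Decision tier (abbreviated) and conditional tiers.
    data_sources: tuple[str, ...] = ()
    override: Optional[OverrideTier] = None
    outcome: Optional[OutcomeTier] = None

    def __post_init__(self) -> None:
        if self.decision_type not in DECISION_TYPES:
            raise ValueError(f"unknown decision_type: {self.decision_type}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
```

Calling `asdict(record)` yields a plain dictionary suitable for serialisation. Since records are immutable once written, a retrospective outcome would presumably be stored as a separate record linked by `decision_id` rather than as an update; that linkage is an assumption of this sketch.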

Immutability Architecture. The immutability guarantee is implemented at multiple layers to prevent single points of bypass. First, the audit log writer is the only component with write access to the log store—application code cannot write directly. Second, the log store is configured as WORM (Write Once Read Many) with Compliance mode (not Governance mode), meaning even the storage administrator cannot delete or modify records during the retention period. Third, each record is written with a cryptographic hash of its content, enabling integrity verification without trusting the storage layer alone. Fourth, a Merkle-tree-based tamper-evidence chain links records so that any retrospective modification is detectable from the chain.
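The third and fourth layers (per-record hash plus a chain linking records) can be illustrated with a simple linear hash chain. This is a deliberately simplified stand-in for the Merkle-tree-based chain the pattern describes; the entry layout and the all-zero genesis value are assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record content."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _link(prev_hash: str, rec_hash: str) -> str:
    return hashlib.sha256((prev_hash + rec_hash).encode("ascii")).hexdigest()


def append(chain: list[dict], record: dict) -> dict:
    """Append a record, binding it to the previous entry's chain hash."""
    prev = chain[-1]["chain_hash"] if chain else GENESIS
    rec_hash = record_hash(record)
    entry = {
        "record": record,
        "record_hash": rec_hash,
        "prev_hash": prev,
        "chain_hash": _link(prev, rec_hash),
    }
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> int | None:
    """Return the index of the first inconsistent entry, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        rec_hash = record_hash(entry["record"])
        if (
            entry["record_hash"] != rec_hash
            or entry["prev_hash"] != prev
            or entry["chain_hash"] != _link(prev, rec_hash)
        ):
            return i
        prev = entry["chain_hash"]
    return None
```

Editing any stored record changes its recomputed hash, so `verify` reports the position of the first entry that no longer matches. An attacker who can rewrite every later entry could still forge a consistent chain, which is why the latest chain hash would need to be anchored somewhere the storage administrator cannot reach.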

PII Sanitisation at Write Time. Raw AI inputs often contain personal information. The audit trail must preserve decision context without creating a 7-year PII retention risk in violation of Privacy Act obligations. The sanitisation pipeline, executed before write, applies: named entity recognition to identify PII fields, substitution of PII values with entity type tokens (e.g., [PERSON], [EMAIL], [ACCOUNT_NUMBER]), preservation of non-PII decision-relevant features, and a cryptographic binding between the sanitised record and the original request (so the original can be retrieved under legal process if required, from the primary application system which has appropriate retention).
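The token substitution and cryptographic binding steps might look like the sketch below. Two simplifications are assumed: regular expressions stand in for the named entity recognition stage (a real deployment would use an NER tool such as those listed under Components), and the binding is an HMAC-SHA-256 over a hypothetical request ID plus the raw input, keyed with a secret held outside the audit store.

```python
from __future__ import annotations

import hashlib
import hmac
import re

# Illustrative patterns only; regexes are a stand-in for NER-based detection.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ACCOUNT_NUMBER", re.compile(r"\b\d{6,12}\b")),
]


def sanitise(text: str) -> tuple[str, dict[str, int]]:
    """Replace detected PII with entity-type tokens; return text and counts."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def bind_to_original(raw_input: str, request_id: str, key: bytes) -> str:
    """Keyed digest tying the sanitised record to the original request.

    Assumption: `key` is managed outside the audit store, so the binding
    cannot be brute-forced from low-entropy inputs by someone who only
    has access to the audit records.
    """
    message = request_id.encode("utf-8") + b"\x00" + raw_input.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

For example, `sanitise("Email jane@example.com about account 12345678")` returns `("Email [EMAIL] about account [ACCOUNT_NUMBER]", {"EMAIL": 1, "ACCOUNT_NUMBER": 1})`. The counts could feed a field such as `pii_sanitisation.entities_redacted`.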

Retention Tiering for Cost Management. Seven-year retention at full fidelity is cost-prohibitive for high-volume systems. The pattern implements tiered retention: hot tier (0–90 days, full queryable index, high-speed retrieval), warm tier (90 days–2 years, reduced query access, compressed storage), cold tier (2–7 years, compliance archive, retrieval SLA 24 hours, minimum cost storage). The WORM guarantee applies across all tiers.
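The tier boundaries can be expressed as a small age-based lookup. The sketch assumes half-open intervals (a record exactly 90 days old is already warm), 365-day years, and an extra "expired" label for records past seven years; none of these details are fixed by the pattern, and an expired record may still be subject to legal hold.

```python
from datetime import datetime, timedelta, timezone

HOT_LIMIT = timedelta(days=90)
WARM_LIMIT = timedelta(days=2 * 365)
COLD_LIMIT = timedelta(days=7 * 365)


def retention_tier(written_at: datetime, now: datetime) -> str:
    """Map a record's age to its storage tier."""
    age = now - written_at
    if age < timedelta(0):
        raise ValueError("written_at is in the future")
    if age < HOT_LIMIT:
        return "hot"
    if age < WARM_LIMIT:
        return "warm"
    if age < COLD_LIMIT:
        return "cold"
    return "expired"  # past retention; deletion still subject to legal hold
```

In practice the transitions would usually be delegated to storage lifecycle rules; a function like this is mainly useful for checking that the configured rules match the approved retention schedule.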

Query Architecture. The audit trail serves two query patterns with different characteristics: operational queries (find all decisions for customer X in the past 30 days — requires fast index on user ID and date) and forensic queries (reconstruct decision state for model version Y on date Z — requires full scan with filter, less latency-sensitive). The pattern implements a search index over the hot and warm tiers for operational queries, with direct S3/blob scanning available for forensic queries.


5. Architecture Diagram

flowchart TD
    subgraph Sources["Event Sources"]
        A[AI Model + Policy Engine]
        B[Human Override Events]
    end
    subgraph Pipeline["Write Pipeline"]
        C[Audit Log Writer]
        D[PII Sanitise + Hash]
    end
    subgraph Storage["Tiered WORM Storage"]
        E[(Hot Tier 0-90 days)]
        F[(WORM Archive 2-7 years)]
        G[Integrity Verifier]
    end
    subgraph Query["Query and Reporting"]
        H[RBAC Access Gate]
        I[Regulatory Reports]
        J[Tamper Alert]
    end
    A -->|decision event| C
    B -->|override event| C
    C --> D
    D --> E
    D --> F
    E -->|operational query| H
    F --> G
    G -->|tamper detected| J
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f3e8ff,stroke:#a855f7
    style I fill:#d1fae5,stroke:#10b981
    style J fill:#fee2e2,stroke:#ef4444

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Audit Log Writer | Application Service | Single write pathway; accepts events from all sources; enforces schema | FastAPI service, gRPC service | Critical |
| PII Sanitisation Pipeline | Data Processing | Named entity recognition; PII token substitution before write | Microsoft Presidio, custom spaCy pipeline | Critical |
| Record Hash & Merkle Chain | Security Control | SHA-256 hash per record; Merkle chain for tamper-evidence | Custom implementation (well-specified) | Critical |
| WORM Blob Store | Compliance Storage | Primary immutable long-term storage; 7-year WORM | AWS S3 Object Lock (Compliance), Azure Immutable Blob, WORM-compliant NAS | Critical |
| Hot Tier Store | Operational Storage | Fast queryable store for recent decisions | PostgreSQL (append-only trigger), OpenSearch | High |
| Search Index | Query Acceleration | Full-text + faceted search for operational queries | OpenSearch, Elasticsearch, Azure Cognitive Search | High |
| Integrity Verifier | Scheduled Job | Daily Merkle chain verification; detects tampering | Custom Python service + AWS Lambda | Critical |
| RBAC Access Gate | Security Control | Enforces role-based access to audit records | API Gateway + OAuth RBAC; OPA policy | Critical |
| Lifecycle Manager | Operations | Manages hot→warm→cold transitions per retention schedule | AWS S3 Lifecycle, Azure Blob lifecycle | Medium |
| Regulatory Export Service | Compliance | Generates structured evidence packages for regulatory submissions | Custom export API with APRA/EU formats | High |

7. Data Flow

Primary Audit Record Write Flow

| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | AI Engine / PEP / Override Gateway | Emits structured decision event to Audit Log Writer | Event payload |
| 2 | Audit Log Writer | Authenticates event source; validates event type | Source-authenticated event |
| 3 | PII Sanitisation Pipeline | Scans input and output fields; replaces PII with entity tokens; preserves cryptographic binding | Sanitised event |
| 4 | Schema Validator | Validates mandatory fields; enforces taxonomy values | Validated event with schema version |
| 5 | Record Hash | Computes SHA-256 of record content; links to previous Merkle chain root | Record hash + chain link |
| 6 | WORM Write | Writes record to hot tier (PostgreSQL/OpenSearch) AND WORM blob simultaneously | Record written with sequence ID |
| 7 | Write Confirmation | Returns success ACK to event source | Durability confirmed |

Regulatory Query Flow

| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | Compliance Officer | Submits regulatory evidence request (customer ID, date range, model) | Query ticket |
| 2 | RBAC Gate | Validates requester has Compliance role; authorises query | Authorised query |
| 3 | Search Index | Executes faceted query over hot + warm tiers | Matching record set |
| 4 | Regulatory Export Service | Formats records per submission format; includes integrity evidence | Evidence package (PDF + JSON + Merkle proof) |
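The Merkle proof included in the evidence package lets a recipient check that a record belongs to a committed batch without receiving every other record. The sketch below shows one textbook construction; the 0x00/0x01 domain-separation prefixes and the duplication of the last node on odd-sized levels are assumptions of this example, and production schemes such as RFC 6962 handle odd levels differently.

```python
from __future__ import annotations

import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build all tree levels, from hashed leaves up to the single root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
            levels[-1] = level           # keep the padded level so siblings exist
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged if it sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from the leaf and its proof, then compare."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

A recipient holding only the published root, the exported record, and its proof can run `verify_inclusion` independently of the organisation's storage layer.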

8. Security Considerations

Immutability Enforcement Layers

  1. Application layer: Audit Log Writer is only write path; no other service has write credentials
  2. Database layer: PostgreSQL append-only enforced via trigger (no UPDATE/DELETE permitted)
  3. Storage layer: S3 Object Lock Compliance mode — storage administrators cannot override retention
  4. Verification layer: Daily Merkle chain verification detects any tampering at any layer
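For the storage layer, a write to S3 with Object Lock in Compliance mode carries a retention mode and a retain-until date on each object. The sketch below only builds the arguments; `ObjectLockMode` and `ObjectLockRetainUntilDate` are the parameter names used by boto3's `put_object`, while the helper names, the seven-year default, and the leap-day handling are assumptions of this example.

```python
from __future__ import annotations

from datetime import datetime, timezone

RETENTION_YEARS = 7  # assumed default, per the APRA-aligned retention period


def retain_until(written_at: datetime, years: int = RETENTION_YEARS) -> datetime:
    """Same calendar date `years` later; 29 Feb rolls forward to 1 Mar."""
    try:
        return written_at.replace(year=written_at.year + years)
    except ValueError:
        return written_at.replace(year=written_at.year + years, month=3, day=1)


def object_lock_put_kwargs(bucket: str, key: str, body: bytes,
                           written_at: datetime) -> dict:
    """Keyword arguments for an S3 put_object call under Compliance mode."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until(written_at),
    }


# Usage (not executed here; requires boto3, credentials, and a bucket that was
# created with Object Lock enabled):
#   import boto3
#   boto3.client("s3").put_object(**object_lock_put_kwargs(
#       "audit-worm-bucket", "2025/08/01/decision-123.json", payload,
#       datetime.now(timezone.utc)))
```

The bucket name and key layout in the usage comment are placeholders. Depending on SDK version, S3 may also require a checksum or Content-MD5 header on uploads that set Object Lock retention, so that should be verified against current AWS documentation.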

Access Control

  • Read access requires Audit Reader role (minimum); Compliance role for full record access; Legal role for original pre-sanitised data (requires court order workflow)
  • All reads logged (who read which records, when) — audit of the audit
  • No bulk export without specific Compliance Director approval
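The role tiers and the "audit of the audit" requirement can be combined in a single gate that records every read attempt, allowed or not. This is a minimal sketch: treating the three roles as a strict hierarchy, the scope names, and the in-memory `read_log` list are all assumptions, and the bulk-export approval step is not modelled.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical ordering: each role includes the access of the roles below it.
ROLE_LEVEL = {"audit_reader": 1, "compliance": 2, "legal": 3}
SCOPE_MIN_LEVEL = {"summary": 1, "full_record": 2, "original_data": 3}


def authorise_read(user_id: str, role: str, scope: str, record_ids: list[str],
                   read_log: list[dict],
                   court_order_ref: str | None = None) -> bool:
    """Decide whether a read is allowed and log the attempt either way."""
    if scope not in SCOPE_MIN_LEVEL:
        raise ValueError(f"unknown scope: {scope}")
    allowed = ROLE_LEVEL.get(role, 0) >= SCOPE_MIN_LEVEL[scope]
    # Pre-sanitised data additionally needs a court order reference.
    if scope == "original_data" and court_order_ref is None:
        allowed = False
    read_log.append({
        "user_id": user_id,
        "role": role,
        "scope": scope,
        "record_ids": list(record_ids),
        "court_order_ref": court_order_ref,
        "allowed": allowed,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    })
    return allowed
```

Logging denied attempts as well as successful reads gives reviewers visibility of probing behaviour; in a real deployment the read log would itself be written through the audit log writer rather than held in memory.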

OWASP LLM Top 10 Mapping

| OWASP LLM Risk | Audit Trail Coverage | Log Field |
| --- | --- | --- |
| LLM01 Prompt Injection | Log policy enforcement decision for injections | policy_decision.injection_detected |
| LLM02 Insecure Output Handling | Log output validator result | output_validation.result |
| LLM06 Sensitive Information Disclosure | Log PII sanitisation applied | pii_sanitisation.entities_redacted |
| LLM08 Excessive Agency | Log action scope vs approved scope | policy_decision.action_scope_check |
| LLM09 Overreliance | Log human override rate | override.occurred, override.actor |

9. Governance Considerations

Retention Policy Governance

Retention periods are set by Legal + Compliance, not by technology teams. Different model use cases may have different retention requirements. The retention policy table is version-controlled and reviewed annually.

Governance Artefacts

| Artefact | Owner | Frequency | Regulatory Linkage |
| --- | --- | --- | --- |
| Audit Trail Integrity Report | CISO | Monthly | APRA CPS234 §19 |
| Regulatory Evidence Package | Compliance | Per request | APRA examinations; court orders |
| Retention Policy Compliance Report | Legal | Annually | Privacy Act APP 11; Archives Act |
| Override Activity Report | RAI Officer | Quarterly | EU AI Act Article 14 |
| Decision Volume Report | AI Governance | Monthly | ISO 42001 §9.1 |

10. Operational Considerations

SLOs

| SLO | Target | Measurement |
| --- | --- | --- |
| Write latency | p99 <50ms | Per write event |
| Write availability | 99.99% | 30-day rolling |
| Operational query latency | p95 <5 seconds | Per query |
| Forensic query completion | <24 hours | Per forensic request |
| Integrity verification | Daily completion | Per daily run |

Disaster Recovery

| Scenario | RTO | RPO | Recovery |
| --- | --- | --- | --- |
| Hot tier database failure | 15 minutes | 0 (WORM blob is parallel primary) | Rebuild hot tier index from WORM blob |
| Write path unavailable | Circuit breaker: event queue buffers for 15 minutes | 0 | Writes resume from queue on recovery |
| WORM blob region failure | 24 hours (cold restoration) | 0 (replicated) | Cross-region replication pre-configured |

11. Cost Considerations

Cost Drivers

| Driver | Cost Type | 7-Year Cost Estimate |
| --- | --- | --- |
| WORM blob storage | Variable (per GB) | At 1 TB/year growth: 28 TB-years cumulative (1+2+…+7) × $23/TB/month × 12 ≈ AUD $7,700 over 7 years; about AUD $1,900/yr by year 7 |
| Hot tier database | Fixed compute | AUD $5,000–$20,000/yr |
| Search index | Fixed compute | AUD $8,000–$25,000/yr |
| Integrity verifier | Minimal compute | AUD $500/yr |
| PII sanitisation | Compute per event | $0.001–$0.01 per 1,000 events depending on complexity |
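The WORM storage figure can be checked with a short calculation. With 1 TB of growth per year, stored volume reaches 7 TB in year 7 and the cumulative total is 28 TB-years, so the roughly AUD $7,700 figure corresponds to the seven-year total rather than to a single year. The sketch assumes linear growth, each year billed at its end-of-year volume, and a flat $23/TB/month price; the function name is hypothetical.

```python
def worm_storage_cost(years: int = 7, growth_tb_per_year: float = 1.0,
                      price_per_tb_month: float = 23.0) -> tuple[float, float]:
    """Return (cost in the final year, cumulative cost over all years)."""
    yearly = [
        growth_tb_per_year * year * price_per_tb_month * 12
        for year in range(1, years + 1)
    ]
    return yearly[-1], sum(yearly)
```

With the defaults, `worm_storage_cost()` returns `(1932.0, 7728.0)`: about $1,900 in year 7 and about $7,700 cumulatively. Moving older data to cheaper cold-tier storage classes would lower both numbers.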

Indicative Total Annual Cost

| Scale | Events/Day | Annual Infrastructure | 7-Year Total |
| --- | --- | --- | --- |
| Small (100K/day) | 100,000 | AUD $15,000 | AUD $105,000 |
| Medium (1M/day) | 1,000,000 | AUD $45,000 | AUD $315,000 |
| Large (10M/day) | 10,000,000 | AUD $120,000 | AUD $840,000 |

12. Trade-Off Analysis

Option Comparison

| Option | Description | Pros | Cons | Recommended For |
| --- | --- | --- | --- | --- |
| A: WORM audit trail (this pattern) | Immutable, tiered, cryptographically verified | Regulatory-grade; tamper-evident; 7-year retention | Cost; complexity | All regulated entities |
| B: Standard application logging (ELK) | Mutable logs in Elasticsearch | Simple; developers familiar | Mutable; insufficient retention; not WORM | Development environments only |
| C: Blockchain/DLT audit trail | Decentralised immutable ledger | Strong tamper-evidence | Very high cost; complexity; slow writes; overkill | Niche use cases requiring external verifiability |
| D: SaaS audit trail (Sysdig, Datadog) | Cloud SIEM with long retention | Managed; easy setup | Vendor lock-in; may not meet WORM requirements; data residency concerns | Non-regulated organisations |

Architectural Tensions

| Tension | Stance | Mitigation |
| --- | --- | --- |
| PII retention vs. Audit completeness | PII sanitised at write; cryptographic binding to original | Legal process recovery pathway defined |
| Cost vs. Completeness | Tiered retention; sampled logging for ultra-high volume non-consequential AI | Sampling strategy must be documented and approved |
| Query performance vs. Immutability | Separate queryable hot tier; WORM as primary | Hot tier rebuilt from WORM on failure |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Write path failure causing missed records | Low | Critical: regulatory compliance gap | Write queue depth monitoring; ACK timeouts | Event queue with guaranteed delivery; replay from queue |
| PII sanitisation false negative (PII written to audit log) | Medium | High: privacy breach in audit log | Periodic audit of sanitised records; PII scanner on log samples | Re-sanitisation of affected records; Privacy Officer notification |
| Merkle chain gap (tamper indicator) | Very Low | Critical: evidence integrity challenged | Daily integrity verifier | Invoke incident response; preserve evidence; notify CISO |
| Retention policy misconfiguration (early deletion) | Low | Critical: regulatory evidence destroyed | Lifecycle policy monitoring; deletion alerts | Restore from replica; legal hold override for affected records |

14. Regulatory Considerations

APRA CPS230

  • §32: Record-keeping obligations for APRA-regulated entities require retention of records related to material operations for 7 years. AI decision records for credit, insurance, superannuation decisions are material operation records.

APRA CPS234

  • §19: APRA-regulated entities must retain information security-relevant logs. AI decision logs containing policy enforcement decisions satisfy this obligation.

EU AI Act

  • Article 12: Logging capabilities for high-risk AI systems. Providers must ensure high-risk AI systems automatically log events throughout their lifetime. This pattern implements the Article 12(1) and 12(2) requirements.
  • Article 12(4): For AI systems in Annex III categories related to critical infrastructure, public authorities, and migration, logs must be kept for a period specific to the use case. The pattern implements configurable retention per use case.

Privacy Act 1988 / APPs

  • APP 11: Reasonable steps to protect personal information. PII sanitisation before writing to long-term audit log is the key control.
  • APP 12: Access to personal information. Audit records about an individual are accessible to them on request; search index supports this.

ISO/IEC 42001

  • §9.1: Monitoring and measurement of AI management system effectiveness. Audit trail provides the evidence base for effectiveness assessment.

15. Reference Implementations

AWS

| Component | Service |
| --- | --- |
| WORM Storage | S3 Object Lock (Compliance mode) + Glacier for cold tier |
| Hot Tier | DynamoDB (append-only via condition expressions) |
| Search Index | OpenSearch Service |
| PII Sanitisation | Comprehend (PII detection) + Lambda |
| Integrity Verification | Lambda (scheduled) |

Azure

| Component | Service |
| --- | --- |
| WORM Storage | Azure Blob Storage (Immutable Blob, compliance lock) |
| Hot Tier | Cosmos DB (append-only via stored procedure) |
| Search Index | Azure Cognitive Search |
| PII Sanitisation | Azure AI Language (PII extraction) |

On-Premises

| Component | Technology |
| --- | --- |
| WORM Storage | NetApp SnapLock Compliance / EMC DataDomain Retention Lock |
| Hot Tier | PostgreSQL with append-only enforced via trigger |
| Search Index | Elasticsearch |
| PII Sanitisation | Microsoft Presidio (self-hosted) |

16. Related Patterns

| Pattern | Relationship | Dependency Direction |
| --- | --- | --- |
| EAAPL-GOV001 AI Model Register | Input: MRID in every audit record | GOV001 → GOV007 |
| EAAPL-GOV004 AI Policy Enforcement | Input: policy decisions logged | GOV004 → GOV007 |
| EAAPL-GOV005 Responsible AI Framework | Consumer: accountability chain stored here | GOV005 → GOV007 |
| EAAPL-GOV006 Model Bias Detection | Consumer: fairness events stored here | GOV006 → GOV007 |
| EAAPL-GOV008 AI Incident Management | Consumer: forensic queries during incidents | GOV008 → GOV007 |
| EAAPL-CMP001 APRA CPS230 | Satisfies: §32 record-keeping | GOV007 → CMP001 |
| EAAPL-CMP003 EU AI Act | Satisfies: Article 12 logging | GOV007 → CMP003 |

17. Maturity Assessment

Overall Maturity: Mature (Level 4)

| Dimension | Score (1–5) | Evidence |
| --- | --- | --- |
| Immutability architecture | 5 | WORM + Merkle chain + daily verification |
| Schema completeness | 5 | Four-tier schema covering all regulatory requirements |
| PII sanitisation | 4 | NER-based; gap is high-precision sanitisation for novel entity types |
| Retention tiering | 4 | Three tiers defined; gap is automated legal hold override process |
| Query capability | 4 | Operational + forensic query patterns; gap is AI-assisted forensic analysis |

18. Revision History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2024-01-01 | EAAPL Working Group | Initial publication |
| 1.1 | 2024-06-01 | EAAPL Working Group | Added Merkle chain tamper-evidence |
| 1.2 | 2024-12-01 | EAAPL Working Group | EU AI Act Article 12 mapping; retention tiering |
| 2.0 | 2025-08-01 | EAAPL Working Group | Full rewrite: four-tier schema; PII sanitisation architecture; APRA CPS230 §32 alignment |