EAAPL-MDL001 — Model Versioning and Lineage Tracking

| Attribute | Value |
|---|---|
| Pattern ID | EAAPL-MDL001 |
| Name | Model Versioning and Lineage Tracking |
| Maturity | Mature |
| Complexity | Low |
| Tags | model-register, traceability, accountability, low-complexity |
| Last Reviewed | 2026-06-12 |
| Owner | Enterprise AI Architecture Practice |

1. Executive Summary

AI models are long-lived, continuously evolving artefacts. Without systematic versioning, organisations cannot answer the two questions regulators and boards ask first: which model made that decision? and who approved it? EAAPL-MDL001 establishes a disciplined versioning schema adapted from semantic versioning principles, a co-located artefact bundle (weights, configuration, tokeniser, preprocessing code, evaluation results), and an approval workflow that gates every version before it reaches production. By registering every version in the enterprise Model Register (EAAPL-GOV001), the organisation achieves full lineage — from training data provenance through evaluation evidence to production retirement. The pattern is deliberately low-complexity: it imposes process rigour, not infrastructure cost. CIOs gain a defensible audit trail for internal audit and external regulatory inquiries. CTOs gain a reproducible deployment pipeline and the ability to perform rapid rollback. Risk officers gain the model card evidence needed for model risk management frameworks. Any organisation deploying more than one AI model to production should treat this pattern as a foundation — it is a prerequisite for shadow deployment (EAAPL-MDL002), canary release (EAAPL-MDL003), and rollback (EAAPL-MDL004).


2. Problem Statement

2.1 Business Problem

Product and risk teams cannot determine which model version is serving a given business function. When a customer complaint or regulatory inquiry arrives, the response is "we think it was the model from Q3" — an answer that destroys regulator confidence and exposes the organisation to material censure.

2.2 Technical Problem

Model artefacts (weights, configuration, evaluation results) are stored in ad-hoc locations — cloud storage buckets without naming conventions, local developer machines, CI/CD pipeline caches. Versions are identified by timestamps or commit hashes that carry no semantic meaning. There is no link between a deployed model and the training run, dataset, or evaluation that produced it.

2.3 Symptoms

  • Incident post-mortems cannot identify which model version caused the issue.
  • Re-running training with "the same" code and data produces a different model with no explanation.
  • Multiple teams deploy models with the same name but different capabilities, causing downstream consumer confusion.
  • Evaluation results exist in spreadsheets disconnected from the deployed artefact.
  • Deprecation is ad-hoc: consumers discover a model has been retired when their API calls start failing.

2.4 Cost of Inaction

| Category | Indicative Impact |
|---|---|
| Regulatory | Inability to respond to EU AI Act Article 13 transparency or APRA CPS 234 audit requests, with potential licence or operating conditions imposed |
| Operational | 4–8 hour incident resolution time (vs <30 minutes with lineage) when a model defect causes a production incident |
| Reputational | Customer or media exposure of uncontrolled AI deployment practices |
| Financial | Re-training cost when the original training run is unrecoverable: typically $10K–$500K for large foundation model fine-tunes |

3. Context

3.1 When to Apply

  • Any organisation deploying one or more AI/ML models to production systems.
  • Teams preparing for regulatory examination under EU AI Act, APRA CPS 234, or equivalent.
  • Organisations implementing a Model Risk Management framework.
  • Prior to implementing shadow deployment, canary release, or ensemble patterns.

3.2 When NOT to Apply

  • Pure research/experimentation environments with no production deployment path (apply lightweight notebook versioning instead).
  • Models trained and discarded within a single batch pipeline with no reuse intent.

3.3 Prerequisites

| Prerequisite | Detail |
|---|---|
| Artefact storage infrastructure | Immutable object storage (S3/GCS/Azure Blob) with versioning enabled |
| Model Register (EAAPL-GOV001) | Central registry to receive model version registrations |
| CI/CD pipeline | Automated pipeline that can invoke version registration on successful training/eval |
| Model Card template | Organisational template for model documentation (per-version) |

3.4 Industry Applicability

| Industry | Applicability | Primary Driver |
|---|---|---|
| Financial Services | Critical | APRA CPS 234, ASIC guidance on algorithmic accountability |
| Healthcare | Critical | TGA/FDA AI guidance, patient safety traceability |
| Government | High | APS AI Ethics Principles, FOI obligations |
| Retail / E-commerce | Medium | Product recommendation auditability, A/B governance |
| Technology | Medium | SOC 2 Type II, enterprise customer audit requirements |
| Manufacturing | High | ISO 9001 change control adapted to AI components |

4. Architecture Overview

4.1 Versioning Schema

Model versioning in an enterprise context adapts semantic versioning (SemVer) to the unique characteristics of machine learning models. A version takes the form MAJOR.MINOR.PATCH[-suffix].

MAJOR version increment signals an architecture change that breaks backward compatibility for consumers. Examples: replacing a transformer encoder with a decoder-only architecture; switching from a BERT-class model to a GPT-class model; changing the output schema (e.g., adding a confidence distribution to a previously point-estimate model). Consumers must update their integration when MAJOR changes.

MINOR version increment signals a significant capability change that is backward-compatible at the API level. Examples: fine-tuning on a new domain dataset that meaningfully shifts output distribution; retraining with 12 additional months of data; updating the base model (e.g., from GPT-4o to GPT-4.1) while preserving the same prompt interface. Consumers should validate that downstream quality expectations are still met.

PATCH version increment signals a performance or efficiency change that does not alter model behaviour materially. Examples: INT8 quantisation of weights; ONNX export optimisation; vocabulary trimming; prompt template correction for a minor wording issue. Consumers are expected to accept patch upgrades without re-validation.

A suffix carries pre-release or environment signals: -alpha, -beta, -rc.1, -shadow (currently in shadow testing per EAAPL-MDL002), -deprecated.
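The schema above can be made machine-checkable. The following is a minimal sketch, assuming a hypothetical `ModelVersion` helper (not part of this pattern or of any named registry product) that parses a version string and computes the next version once a change has been classified as major, minor, or patch. Classifying the change remains a human judgement.

```python
import re
from dataclasses import dataclass
from typing import Optional

# MAJOR.MINOR.PATCH with an optional suffix such as -rc.1 or -shadow.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical value object for a model version string."""

    major: int
    minor: int
    patch: int
    suffix: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "ModelVersion":
        match = VERSION_RE.match(text)
        if match is None:
            raise ValueError(f"not a valid model version: {text!r}")
        major, minor, patch, suffix = match.groups()
        return cls(int(major), int(minor), int(patch), suffix)

    def bump(self, change: str) -> "ModelVersion":
        """Return the next version; lower-order parts reset to zero and any suffix is dropped."""
        if change == "major":
            return ModelVersion(self.major + 1, 0, 0)
        if change == "minor":
            return ModelVersion(self.major, self.minor + 1, 0)
        if change == "patch":
            return ModelVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        core = f"{self.major}.{self.minor}.{self.patch}"
        return f"{core}-{self.suffix}" if self.suffix else core
```

For example, `ModelVersion.parse("2.3.1-rc.1").bump("minor")` yields version `2.4.0`, and a malformed string such as `v1.2` raises `ValueError` so that it can be rejected before registration.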

4.2 Artefact Bundle

Every model version consists of a versioned artefact bundle — a single, immutable unit of content that travels together through the pipeline. The bundle contains:

  • Weights: serialised model weights in a standard format (SafeTensors preferred; ONNX for cross-framework deployments).
  • Configuration: model architecture config, hyperparameters, tokeniser vocabulary, and special tokens map.
  • Preprocessing code: the exact pre/post-processing code used during training and inference, pinned by commit hash.
  • Evaluation results: structured JSON/YAML file containing all evaluation metric values, benchmark dataset references, and evaluation date.
  • Model card: structured Markdown document (see Section 4.3) authored and reviewed for this version.
  • Provenance manifest: training run ID, training data references (dataset name + version + hash), compute environment (hardware type, framework versions, OS).

The bundle is stored as an immutable archive in object storage with a content-addressable hash. The hash is registered in EAAPL-GOV001 — any tampering is detectable.
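Tamper detection reduces to recomputing the archive's digest and comparing it with the registered value. The sketch below assumes SHA-256 as the hash function and a single archive file per bundle; the pattern does not mandate either, and the function names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def bundle_digest(archive_path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a bundle archive, reading it in chunks."""
    digest = hashlib.sha256()
    with archive_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(archive_path: Path, registered_digest: str) -> bool:
    """Compare the archive's current digest with the digest recorded at registration."""
    return hmac.compare_digest(bundle_digest(archive_path), registered_digest)
```

Reading in chunks keeps memory use flat even for bundles of many gigabytes. A `False` result from `verify_bundle` corresponds to the content hash mismatch scenario in Section 7.2.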

4.3 Model Card Standards

Each version requires a model card authored by the team and reviewed by the AI governance function. The model card captures: intended use (primary use case, out-of-scope uses explicitly listed); training data description (provenance, size, temporal range, known biases); evaluation results (quantitative metrics on held-out test set, fairness metrics by demographic subgroup where applicable); known limitations; ethical considerations and risk rating; approval status and approver identity.

4.4 Approval Workflow

No model version may be promoted to a production environment without approval. The approval workflow is triggered by CI/CD on successful evaluation. Low-risk models require sign-off from the model owner and a peer reviewer. High-risk or regulated models additionally require sign-off from the AI Governance function and, for APRA-regulated entities, the Chief Risk Officer or delegate. Approval is recorded in EAAPL-GOV001 with timestamp and approver identity.
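The sign-off rules can be expressed as a small lookup. This is a sketch only: the tier labels `"low"` and `"high"` and the role identifiers are hypothetical, and `"high"` is assumed to cover both high-risk and regulated models.

```python
from typing import Iterable, Set


def required_approvers(risk_tier: str, apra_regulated: bool = False) -> Set[str]:
    """Return the roles that must sign off before production promotion."""
    if risk_tier not in ("low", "high"):
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    # Every model needs the model owner and a peer reviewer.
    roles = {"model_owner", "peer_reviewer"}
    if risk_tier == "high":
        roles.add("ai_governance")
        if apra_regulated:
            roles.add("chief_risk_officer_or_delegate")
    return roles


def may_promote(risk_tier: str, signed_off: Iterable[str], apra_regulated: bool = False) -> bool:
    """True only when every required role appears among the recorded sign-offs."""
    return required_approvers(risk_tier, apra_regulated) <= set(signed_off)
```

A deployment pre-check could call `may_promote` with the sign-offs read from the Model Register and block the release when it returns `False`.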

4.5 Deprecation Management

Deprecation follows a structured lifecycle:

  1. Deprecation notice issued to all registered consumers at least 90 days before sunset for production models, 30 days for non-production.
  2. Traffic migration plan agreed with consumers, including a canary ramp schedule per EAAPL-MDL003.
  3. Sunset date recorded in the model register.
  4. On sunset, the model is removed from serving infrastructure but the artefact bundle remains in cold storage for the regulatory retention period (minimum 7 years for regulated decisions).
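The notice periods lend themselves to an automated check when a sunset date is recorded. A minimal sketch, assuming the environment labels `"production"` and `"non-production"` (the labels and function names are illustrative):

```python
from datetime import date, timedelta

# Minimum notice periods stated in the deprecation lifecycle.
NOTICE_DAYS = {"production": 90, "non-production": 30}


def earliest_sunset(notice_date: date, environment: str) -> date:
    """Return the earliest permissible sunset date for a notice issued on notice_date."""
    return notice_date + timedelta(days=NOTICE_DAYS[environment])


def notice_is_sufficient(notice_date: date, sunset_date: date, environment: str) -> bool:
    """True when the proposed sunset date honours the minimum notice period."""
    return sunset_date >= earliest_sunset(notice_date, environment)
```

For a production notice issued on 2026-01-01, the earliest sunset date is 2026-04-01.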


5. Architecture Diagram

flowchart TD
    subgraph Build["Build and Bundle"]
        A[Training Run]
        B[Evaluation Suite]
        C[Artefact Bundle]
    end
    subgraph Register["Model Register"]
        D[(Immutable Object Store)]
        E[Model Register Entry]
        F[Model Card]
    end
    subgraph Lifecycle["Deployment Lifecycle"]
        G{Approval Workflow}
        H[Shadow + Canary]
        I[Live Production]
    end
    A --> B
    B -->|pass| C
    B -->|fail| J[Reject Version]
    C --> D
    D --> E
    E --> F
    F --> G
    G -->|approved| H
    G -->|rejected| J
    H --> I
    I -->|deprecation notice| K[Archive to Cold Storage]
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#fef9c3,stroke:#eab308
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#f3e8ff,stroke:#a855f7
    style H fill:#f0fdf4,stroke:#22c55e
    style I fill:#d1fae5,stroke:#10b981
    style J fill:#fee2e2,stroke:#ef4444
    style K fill:#fef9c3,stroke:#eab308

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Artefact Store | Infrastructure | Immutable, versioned storage for model bundles | AWS S3 (Object Lock), Azure Blob (Immutable), GCS | Critical |
| Model Register | Platform Service | Central registry indexing all model versions, metadata, approval status | MLflow Model Registry, Weights & Biases, custom DB | Critical |
| CI/CD Pipeline | Automation | Orchestrates training, evaluation, bundling, and registration | GitHub Actions, GitLab CI, Jenkins, Argo Workflows | High |
| Evaluation Harness | Platform Service | Runs benchmark suite, computes metrics, gates registration on pass/fail | Pytest + custom harness, Eleuther LM Eval, custom | Critical |
| Approval Workflow | Governance Service | Routes approval requests, captures sign-offs, enforces role-based approvals | Jira, ServiceNow, custom approval microservice | Critical |
| Model Card Generator | Tooling | Produces structured model card from training metadata + eval results | Custom template renderer, Hugging Face model cards | High |
| Deprecation Notifier | Automation | Sends deprecation notices to registered consumers on schedule | Email/Slack webhook, event bus | Medium |
| Version Comparison UI | Tooling | Side-by-side view of metrics, architecture diff, model card diff between versions | MLflow UI, custom dashboard, Grafana | Medium |

7. Data Flow

7.1 Primary Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Training Pipeline | Executes training run with pinned dataset and code commit | Trained model weights + training run metadata |
| 2 | Evaluation Harness | Runs benchmark suite against held-out test set | Evaluation results JSON |
| 3 | CI/CD Pipeline | Assembles artefact bundle (weights + config + preprocessing + eval) | Versioned artefact bundle with content hash |
| 4 | CI/CD Pipeline | Uploads bundle to immutable object store | Storage URI + content-addressable hash |
| 5 | Registration Agent | Registers version in Model Register with metadata and artefact reference | Model Register entry with version ID |
| 6 | Model Card Service | Generates draft model card from training + eval metadata | Draft model card document |
| 7 | Model Owner | Reviews and completes model card, submits for approval | Completed model card + approval request |
| 8 | Approval Workflow | Routes to appropriate approvers based on model risk tier | Approved or rejected version record |
| 9 | Deployment Service | On approval, makes version available in serving infrastructure | Version active in target environment |

7.2 Error Flow

| Error Scenario | Detection | Recovery Action |
|---|---|---|
| Evaluation below threshold | Evaluation harness fail gate | CI/CD pipeline rejects bundle; version not registered; team alerted |
| Artefact upload failure | Storage client exception | Pipeline retries 3x then raises incident; training run marked failed |
| Content hash mismatch on verify | Model Register integrity check | Version quarantined; security event raised; bundle re-uploaded |
| Approval workflow timeout | Workflow SLA monitor (72h) | Escalation to model owner's manager; version stays in pending state |
| Deployment to wrong environment | Environment tag validation pre-deploy | Deployment blocked; operator notified; re-confirmation of approval required |
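The upload recovery action can be sketched as a bounded retry loop. This sketch reads "retries 3x" as three retries after the initial attempt and adds exponential backoff between attempts; both are assumptions, as are the `UploadFailed` exception and the choice of `OSError` as the retryable error class.

```python
import time
from typing import Callable, Optional


class UploadFailed(RuntimeError):
    """Raised when the upload still fails after all retries; the caller raises the incident."""


def upload_with_retry(
    upload: Callable[[], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call upload() until it succeeds or the retry budget is exhausted."""
    last_error: Optional[OSError] = None
    for attempt in range(retries + 1):
        try:
            return upload()
        except OSError as error:
            last_error = error
            if attempt < retries:
                # Back off 1s, 2s, 4s, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise UploadFailed(f"upload failed after {retries} retries") from last_error
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. On `UploadFailed`, the pipeline would raise the incident and mark the training run as failed.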

8. Security Considerations

8.1 Controls Summary

| Domain | Control |
|---|---|
| Authentication | Pipeline service accounts use short-lived OIDC tokens; no long-lived credentials in CI |
| Authorisation | Model Register write access limited to CI/CD service accounts; human read-only by default |
| Secrets | Model vendor API keys in secrets manager (AWS Secrets Manager / Azure Key Vault); not in artefact bundle |
| Classification | Model artefacts classified at minimum INTERNAL; fine-tuned-on-sensitive-data models classified CONFIDENTIAL |
| Encryption | Artefacts encrypted at rest (AES-256); in-transit TLS 1.3; encryption keys in KMS |
| Auditability | All Model Register mutations logged to immutable audit log; log retention 7 years |

8.2 OWASP LLM Top 10 Relevance

| OWASP LLM Risk | Relevance to This Pattern | Mitigation |
|---|---|---|
| LLM01 Prompt Injection | Low | Versioning layer does not process prompts; mitigation is upstream in inference layer |
| LLM02 Insecure Output Handling | Low | Model cards are rendered in controlled internal tooling; output escaping applied |
| LLM03 Training Data Poisoning | High | Provenance manifest in bundle enables detection of poisoned dataset versions; hash check |
| LLM04 Model Denial of Service | Low | Versioning layer is metadata; DoS risk is in serving layer (EAAPL-INF001) |
| LLM05 Supply Chain Vulnerabilities | High | All base models registered with provenance; third-party model licence and source verified |
| LLM06 Sensitive Information Disclosure | Medium | Model cards must NOT contain training data samples; PII must be absent from evaluation results |
| LLM07 Insecure Plugin Design | Low | Not applicable to versioning pipeline |
| LLM08 Excessive Agency | Low | Approval workflow enforces human-in-loop before any production promotion |
| LLM09 Overreliance | Medium | Version history and model card limitations section counter overreliance risk by documenting known failure modes |
| LLM10 Model Theft | High | Artefact store access logging; export controls on CONFIDENTIAL models; access review quarterly |

9. Governance Considerations

9.1 Responsible AI

Every model version requires a completed model card that addresses: fairness evaluation results (disaggregated by relevant subgroups), intended and prohibited use cases, known biases, and human oversight requirements. The AI Governance function reviews model cards for high-risk models before approval.

9.2 Model Risk Management

Model versioning is the foundation of the organisation's Model Risk Management framework. Each version is a distinct model for MRM purposes. Version MAJOR increments trigger full model validation. MINOR increments trigger targeted validation against changed capability. PATCH increments require regression testing evidence only.
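The mapping from increment type to validation scope can be derived directly from two version strings. A minimal sketch, with illustrative function names; suffixes are ignored when comparing versions.

```python
from typing import Tuple

# Validation scope per increment type, as stated in Section 9.2.
VALIDATION_SCOPE = {
    "major": "full model validation",
    "minor": "targeted validation against changed capability",
    "patch": "regression testing evidence",
}


def _core(version: str) -> Tuple[int, int, int]:
    """Extract (MAJOR, MINOR, PATCH) and ignore any suffix such as -rc.1."""
    major, minor, patch = version.split("-", 1)[0].split(".")
    return int(major), int(minor), int(patch)


def increment_type(previous: str, candidate: str) -> str:
    """Classify the step from previous to candidate as major, minor, or patch."""
    old, new = _core(previous), _core(candidate)
    if new <= old:
        raise ValueError(f"{candidate} does not advance {previous}")
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def required_validation(previous: str, candidate: str) -> str:
    """Return the validation scope the candidate version must evidence."""
    return VALIDATION_SCOPE[increment_type(previous, candidate)]
```

For example, moving from `1.4.2` to `1.5.0` returns the targeted validation scope, while moving to `2.0.0` returns full model validation.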

9.3 Human Approval Gates

No automated process may promote a model to production without a recorded human approval. The approval record must include: approver identity (not a service account), approval date, version being approved, risk tier assessment, and any conditions or exceptions noted.
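A promotion gate can reject incomplete approval records before deployment. The sketch below uses a hypothetical `ApprovalRecord` shape; in particular, it assumes the identity provider supplies a flag distinguishing service accounts from human identities.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ApprovalRecord:
    """Hypothetical record shape covering the fields required by Section 9.3."""

    approver_id: str
    approver_is_service_account: bool  # assumed to come from the identity provider
    approval_date: Optional[date]
    version: str
    risk_tier: str
    conditions: str = ""  # conditions or exceptions noted; may be empty


def approval_problems(record: ApprovalRecord) -> List[str]:
    """Return the reasons a record fails the human approval gate (empty list if valid)."""
    problems = []
    if not record.approver_id:
        problems.append("approver identity missing")
    if record.approver_is_service_account:
        problems.append("approver is a service account")
    if record.approval_date is None:
        problems.append("approval date missing")
    if not record.version:
        problems.append("version missing")
    if not record.risk_tier:
        problems.append("risk tier assessment missing")
    return problems
```

Returning every problem at once, rather than failing on the first, gives the model owner a complete list to correct in a single pass.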

9.4 Governance Artefacts

| Artefact | Owner | Frequency | Location |
|---|---|---|---|
| Model Card | Model Owner | Per version | Model Register + artefact bundle |
| Approval Record | Approval Workflow | Per version | Governance Management System |
| Deprecation Notice | Model Owner | Per deprecation | Model Register + consumer notification |
| Artefact Integrity Report | Security Operations | Monthly | SIEM / audit log |
| Model Inventory (all active) | AI Governance | Quarterly | EAAPL-GOV001 register export |

10. Operational Considerations

10.1 SLOs

| SLO | Target | Measurement Method |
|---|---|---|
| Version registration latency | < 5 minutes | CI/CD pipeline timing from bundle upload to registry entry |
| Approval workflow response time | < 72 hours | Workflow system timestamp from submission to decision |
| Artefact integrity verification | 100% on deploy | Hash verification in deployment pre-check |
| Deprecation notice lead time | ≥ 90 days (prod) | Model Register deprecation date minus notice date |

10.2 Monitoring and Logging

The Model Register emits structured events on: version registration, approval state change, deployment, deprecation notice, and sunset. Events are forwarded to the organisation's SIEM. Alerts are configured for: unapproved versions deployed to production (P1), artefact hash mismatch (P1), overdue approval (P2).
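A structured event might look like the following. This is a sketch under stated assumptions: the event type names, the JSON field names, and the `build_event` helper are hypothetical; only the three alert priorities come from the text above.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Alert priorities named in Section 10.2; other event types carry no alert.
ALERT_PRIORITY = {
    "unapproved_version_in_production": "P1",
    "artefact_hash_mismatch": "P1",
    "approval_overdue": "P2",
}


def build_event(event_type: str, model: str, version: str, now: Optional[datetime] = None) -> str:
    """Serialise a register event as JSON, attaching an alert priority when one applies."""
    timestamp = now or datetime.now(timezone.utc)
    event = {
        "type": event_type,
        "model": model,
        "version": version,
        "timestamp": timestamp.isoformat(),
        "priority": ALERT_PRIORITY.get(event_type),  # None for non-alerting events
    }
    return json.dumps(event, sort_keys=True)
```

Routine events such as a version registration serialise with a null priority, so the SIEM can store every event while alerting only on the three listed conditions.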

10.3 Incident Response

A model-version incident (wrong version in production, unapproved version detected) triggers the AI Incident Response playbook. The immediate action is traffic shift to the last known-good approved version per EAAPL-MDL004. Root cause analysis is mandatory and must be completed within 5 business days.

10.4 Disaster Recovery

| Scenario | RPO | RTO | Recovery Procedure |
|---|---|---|---|
| Model Register database failure | 1 hour | 4 hours | Restore from hourly snapshot; cross-region replica failover |
| Artefact store unavailability | 0 (immutable) | 1 hour | Failover to secondary region replica; artefacts pre-replicated |
| Approval workflow system down | N/A | 2 hours | Emergency approval via documented out-of-band process |

10.5 Capacity Planning

Model artefact bundles range from 500 MB (small fine-tuned models) to 150 GB (large foundation model fine-tunes). Storage growth is predictable: estimate 3–5 new versions per model per quarter. Retention policy: active versions indefinitely; deprecated versions 7 years cold storage. Plan for 2–5 TB/year of new artefact storage for a 10-model portfolio.
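The planning figure follows from simple arithmetic: 10 models at 3 to 5 versions per quarter gives 120 to 200 new versions per year. The sketch below makes the calculation explicit. The average bundle sizes used in the example (17 GB and 25 GB) are illustrative assumptions chosen to land within the quoted range, not figures from this pattern, and 1 TB is taken as 1,000 GB.

```python
def annual_storage_tb(models: int, versions_per_quarter: int, avg_bundle_gb: float) -> float:
    """Estimate new artefact storage per year in TB (1 TB = 1,000 GB)."""
    versions_per_year = models * versions_per_quarter * 4
    return versions_per_year * avg_bundle_gb / 1000


# Illustrative inputs only: average bundle sizes are assumptions.
low_estimate = annual_storage_tb(models=10, versions_per_quarter=3, avg_bundle_gb=17)
high_estimate = annual_storage_tb(models=10, versions_per_quarter=5, avg_bundle_gb=25)
```

With these inputs the estimates come to 2.04 TB and 5.0 TB per year. A portfolio dominated by bundles near the 150 GB upper bound would exceed the planning range considerably, so the average bundle size should be measured rather than assumed.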


11. Cost Considerations

11.1 Cost Drivers

| Driver | Description | Relative Impact |
|---|---|---|
| Artefact storage | Object storage for versioned bundles; cold tier for deprecated versions | Medium |
| Model Register infrastructure | Database + API hosting for registry service | Low |
| Evaluation compute | GPU/CPU time to run benchmark suite on each new version | Medium-High |
| Engineering time | Model card authoring, approval coordination, pipeline maintenance | High |
| Governance tooling | Workflow system licences or custom development | Low-Medium |

11.2 Scaling Risks

Evaluation compute is the primary scaling risk. As model size grows, full benchmark evaluation can take hours and require expensive GPU instances. Organisations must plan evaluation infrastructure as part of training budget, not as an afterthought.

11.3 Optimisations

  • Use spot/preemptible instances for evaluation runs (evaluation is restartable).
  • Tier evaluation depth by risk: PATCH versions run a reduced evaluation suite (subset of benchmarks); MAJOR versions run the full suite including adversarial tests.
  • Implement incremental evaluation: cache results for unchanged benchmark subsets.
  • Cold-tier deprecated artefacts immediately on sunset (typically 80% cost reduction vs hot storage).
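Incremental evaluation can be sketched as a cache keyed on what actually determines a result. The sketch assumes a result is reusable only when the model bundle digest, the benchmark name, and the benchmark version are all unchanged; the `EvaluationCache` class and its in-memory storage are illustrative, and a real implementation would persist results.

```python
from typing import Callable, Dict, Tuple


class EvaluationCache:
    """Reuse a benchmark result when neither the model nor the benchmark has changed."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str, str], float] = {}

    def evaluate(
        self,
        model_digest: str,
        benchmark: str,
        benchmark_version: str,
        run: Callable[[], float],
    ) -> float:
        """Return the cached score for this exact combination, running the benchmark once if absent."""
        key = (model_digest, benchmark, benchmark_version)
        if key not in self._results:
            self._results[key] = run()
        return self._results[key]
```

Under this keying, a new model version always triggers fresh runs. The saving comes from re-evaluations of the same bundle, for example when only one benchmark subset has been revised.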

11.4 Indicative Cost Range

| Organisation Scale | Monthly Cost Range | Key Assumptions |
|---|---|---|
| Small (1–5 models, <10 versions/month) | $500–$2,000 | 2 TB storage, modest evaluation compute, SaaS registry |
| Medium (5–20 models, 10–50 versions/month) | $2,000–$10,000 | 10 TB storage, dedicated eval GPU nodes, custom registry |
| Large (20+ models, 50+ versions/month) | $10,000–$50,000 | 50+ TB storage, continuous eval infrastructure, enterprise tooling |

12. Trade-Off Analysis

12.1 Versioning Schema Options

| Option | Pros | Cons | Best For |
|---|---|---|---|
| Semantic versioning (this pattern) | Communicates semantic meaning; widely understood; compatible with existing tooling | Requires team discipline to classify changes correctly | Organisations with model governance maturity |
| Date-based versioning (YYYY.MM.DD) | Simple; automatically sortable; no classification burden | Carries no semantic meaning; cannot infer compatibility | Rapid experimentation environments |
| Hash-only (content-addressable) | Guarantees uniqueness; no classification bias | Completely opaque to humans; poor for communication | Internal pipeline references only |
| Monotonic integer (v1, v2…) | Simple; no ambiguity about ordering | No semantic meaning; requires changelog for all changes | Small teams, low model count |

12.2 Architectural Tensions

| Tension | Description | Resolution |
|---|---|---|
| Rigour vs Speed | Approval workflows slow deployment velocity; teams under pressure skip steps | Tiered approval: PATCH = 1 approver, 24h SLA; MAJOR = board-level, 5-day SLA |
| Completeness vs Storage Cost | Storing full model bundles for all versions is expensive | Tiered storage: recent versions hot, deprecated cold; incremental weight storage where feasible |
| Standardisation vs Flexibility | Different model types (classical ML, LLMs, vision) have different artefact structures | Core bundle schema is extensible; type-specific fields added as extensions |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Unapproved version deployed | Low | Critical | Model Register audit log alert | Immediate traffic rollback; incident declared; post-mortem |
| Artefact bundle corruption | Very Low | High | Hash verification on deploy | Redeploy prior known-good version; re-upload from backup |
| Evaluation harness misconfiguration | Medium | High | Metrics drift alert; human review | Quarantine affected versions; re-evaluate with corrected harness |
| Model Register unavailability | Low | Medium | Health check monitor | Emergency access to last-known artefact reference; restore from backup |
| Version classification error (wrong MAJOR/MINOR/PATCH) | Medium | Medium | Consumer integration test failure | Increment version correctly; re-issue; notify consumers |
| Model card not updated for version | Medium | Medium | Governance review checklist gate | Block promotion until model card complete and approved |

13.1 Cascading Failure Scenarios

If the Model Register becomes unavailable, deployment pipelines lose the ability to verify approved status of model versions. Without a circuit breaker, pipelines may fall back to deploying unverified versions. Mitigation: deployment pipelines cache the last-known approved version manifest locally with a 24-hour TTL; any deploy during register outage uses the cached manifest and triggers a P2 incident for human review.
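The mitigation amounts to a read-through cache with a hard expiry. A minimal sketch, assuming register outages surface as `ConnectionError` and that the manifest is a mapping from model name to approved version; the class name and return shape are illustrative.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the mitigation above


class ApprovedManifestCache:
    """Serve the last-known approved manifest while the register is unavailable."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._clock = clock
        self._manifest: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def approved_versions(self) -> Tuple[Dict[str, str], bool]:
        """Return (manifest, from_cache); re-raise if the register is down and the cache is empty or stale."""
        try:
            self._manifest = self._fetch()
            self._fetched_at = self._clock()
            return self._manifest, False
        except ConnectionError:
            age = self._clock() - self._fetched_at
            if self._manifest is not None and age <= TTL_SECONDS:
                # Caller should open a P2 incident for human review.
                return self._manifest, True
            raise
```

The second element of the returned tuple tells the pipeline whether it deployed from the cache, which is the trigger for the P2 incident. Once the cached manifest is older than 24 hours the call fails closed rather than deploying an unverified version.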


14. Regulatory Considerations

| Regulation / Framework | Relevant Clause / Requirement | How This Pattern Addresses It |
|---|---|---|
| EU AI Act (2024/1689) | Article 9 (Risk Management System), Article 13 (Transparency), Article 17 (Quality Management) | Versioned model cards provide Article 13 documentation; approval workflow is Article 9 risk control; artefact bundle supports Article 17 quality records |
| ISO 42001:2023 | Clause 8.4 (AI system lifecycle), Clause 9.1 (Performance evaluation) | Version schema maps to lifecycle stages; evaluation results in bundle satisfy Clause 9.1 |
| NIST AI RMF (2023) | GOVERN 1.1, MANAGE 1.1, MEASURE 2.5 | Governance artefacts satisfy GOVERN; version history enables MANAGE rollback; eval results satisfy MEASURE |
| APRA CPS 234 (2019) | Paragraph 15 (Information security policy), Paragraph 36 (Notification) | Artefact integrity controls and audit log satisfy Paragraph 15; incident notification process covers Paragraph 36 |
| APRA CPS 230 (2025) | Paragraph 52 (Change management) | Approval workflow and versioning schema constitute a formal change management process for AI models |
| Privacy Act 1988 (Cth) | APP 11 (Security of personal information) | Provenance manifest records that training data has undergone PII removal; supports APP 11 accountability |

15. Reference Implementations

15.1 AWS

  • Artefact Store: S3 with Object Lock (COMPLIANCE mode) + S3 Intelligent-Tiering for cost.
  • Model Register: Amazon SageMaker Model Registry (native versioning + approval workflow).
  • Evaluation: SageMaker Processing Jobs for benchmark runs on managed GPU instances.
  • Approval Workflow: SageMaker Model Registry approval + SNS notifications to approvers.
  • Audit Log: CloudTrail + S3 (immutable log bucket).

15.2 Azure

  • Artefact Store: Azure Blob Storage with Immutability Policies + Lifecycle Management.
  • Model Register: Azure Machine Learning Model Registry.
  • Evaluation: Azure ML Pipelines on GPU compute clusters.
  • Approval Workflow: Azure ML + Power Automate for approval routing.
  • Audit Log: Azure Monitor + Log Analytics Workspace.

15.3 GCP

  • Artefact Store: Cloud Storage with Object Retention + Nearline/Coldline tiers.
  • Model Register: Vertex AI Model Registry.
  • Evaluation: Vertex AI Pipelines on A2/T4 instances.
  • Approval Workflow: Vertex AI Model Registry approval states + Cloud Functions for routing.
  • Audit Log: Cloud Audit Logs + BigQuery export.

15.4 On-Premises / Hybrid

  • Artefact Store: MinIO (S3-compatible) with WORM buckets or NetApp StorageGRID.
  • Model Register: MLflow Tracking Server (self-hosted, PostgreSQL backend).
  • Evaluation: Kubernetes Jobs on GPU node pools.
  • Approval Workflow: Jira Service Management or custom workflow microservice.
  • Audit Log: Elasticsearch + Kibana or Splunk.

16. Related Patterns

| Pattern ID | Pattern Name | Relationship Type | Description |
|---|---|---|---|
| EAAPL-MDL002 | Shadow Model Deployment | Depends On | Shadow deployment consumes versioned model bundles and records comparison against a specific version ID |
| EAAPL-MDL003 | Canary Model Release | Depends On | Canary release requires semantic version to determine promotion/rollback decisions |
| EAAPL-MDL004 | Model Rollback | Depends On | Rollback procedure targets a specific previous version by version ID |
| EAAPL-MDL006 | Fine-Tuning Pipeline | Produces | Fine-tuning pipeline is the primary producer of new MINOR version increments |
| EAAPL-MDL007 | Model Compression and Optimisation | Produces | Compression/optimisation produces new PATCH version increments |
| EAAPL-GOV001 | Model Register | Foundational | This pattern registers into and reads from the Model Register |

17. Maturity Assessment

Overall Maturity: Mature

| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Industry Adoption | 5 | Semantic versioning for ML models is well-established industry practice |
| Tooling Availability | 5 | Native support in SageMaker, Azure ML, Vertex AI, MLflow |
| Standards Alignment | 4 | Aligns with ISO 42001 and NIST AI RMF; model-card standards still evolving |
| Implementation Complexity | 1 (low) | Low-complexity process pattern; tooling is mature and well-documented |
| Regulatory Acceptance | 4 | Consistent with APRA and EU AI Act compliance approaches; specific clauses still being interpreted |

18. Revision History

| Version | Date | Author | Summary of Changes |
|---|---|---|---|
| 1.0 | 2026-06-12 | Enterprise AI Architecture Practice | Initial publication |