EAAPL: Enterprise AI Architecture Pattern Library


[EAAPL-SEC003] Model Isolation

Category: Security / Compute Isolation
Sub-category: Blast Radius Limitation
Version: 1.2
Maturity: Proven
Tags: isolation, sandboxing, network-segmentation, process-isolation, resource-quotas, egress-control, zero-trust
Regulatory Relevance: APRA CPS234 §21, EU AI Act Art. 9, ISO 27001 A.13.1, NIST AI RMF MANAGE 2.2


1. Executive Summary

Model Isolation defines the architectural pattern for constraining the execution environment of AI models — whether hosted internally or accessed via API — to limit the blast radius of a model compromise, data exfiltration attempt, or misconfigured model workload. It treats every model execution environment as a potential attack surface and applies defence-in-depth isolation controls at the network, process, storage, and identity layers.

For executives, the business case is straightforward: AI models represent a new class of compute workload with unique risk characteristics. Unlike a web server that executes deterministic code, an LLM or ML model can be manipulated by adversarial inputs to produce unexpected outputs, access data it should not access, or generate outputs that cause downstream harm. If a model execution environment is not properly isolated, a compromised model can become a pivot point for lateral movement across the enterprise network, access secrets or databases it should never reach, or exfiltrate data through its outputs.

This pattern is especially critical for organisations running on-premises model inference, fine-tuning workloads, or AI agents with tool access. It is equally relevant as a design requirement when evaluating model hosting vendors: the vendor's isolation architecture should be reviewed against this pattern before procurement decisions.


2. Problem Statement

Business Problem

Enterprises running AI model workloads face a novel risk: a model serving network requests is a stateful, long-running process that processes inputs from potentially adversarial sources. Unlike a stateless API endpoint, a model's behaviour can be influenced by its inputs in complex ways. If the model process runs with broad network access, file system access, or cloud IAM permissions, a successful adversarial input or model misconfiguration can lead to data exfiltration, lateral movement, or privilege escalation.

Technical Problem

AI model serving processes — whether Python (PyTorch, Transformers, vLLM), Go-based inference servers, or containerised model endpoints — typically run with more permissions than required:

  • Network access to all subnets (allowing lateral movement if compromised).
  • File system access to model weights, configuration, and sometimes application data.
  • Cloud IAM roles with broad permissions (inherited from the host's instance profile).
  • Outbound internet access (a model can exfiltrate data through HTTP calls in a tool-enabled agentic context).
  • No resource quotas (a single runaway inference job can starve other workloads).

Symptoms

  • Model serving processes running as root or with excessive IAM permissions.
  • No network segmentation between model servers and sensitive data stores.
  • Model weights stored on writable file systems (enabling weight poisoning).
  • No egress controls on model serving infrastructure.
  • Unrestricted resource usage by individual inference jobs.

Cost of Inaction

| Dimension | Impact |
|---|---|
| Security | Compromised model server enables lateral movement to databases, secrets stores, or cloud control plane |
| Data | Adversarial inputs causing model to exfiltrate data from its context through tool calls or logging |
| Regulatory | APRA CPS234 requires controls commensurate with information security risks; ungoverned model execution is a cited gap |
| Operational | Runaway inference jobs cause resource exhaustion and service degradation for other workloads |
| Financial | Unrestricted egress from model servers can lead to data exfiltration costs and regulatory fines |

3. Context

When to Apply

  • Any on-premises or cloud-hosted AI model inference workload.
  • AI agents with tool access (the consequences of a compromised agent are significantly higher than a passive model).
  • Fine-tuning workloads that process proprietary or sensitive training data.
  • Multi-tenant AI platforms where multiple teams or customers share model infrastructure.
  • RAG systems where the model has access to a document retrieval layer.

When NOT to Apply

  • External API-only model usage (Azure OpenAI, Anthropic Claude via API) — isolation is the provider's responsibility; however, this pattern informs the contractual and audit questions to ask the provider.
  • Single-developer local experimentation environments.

Prerequisites

| Prerequisite | Detail |
|---|---|
| Container/VM infrastructure | Kubernetes, ECS, or VM-based deployment required for isolation controls |
| Network segmentation capability | VPC/VNet subnetting; security groups or network policies |
| Secrets management | Vault or cloud-native secrets manager for model credentials |
| IAM maturity | Ability to create fine-grained service accounts/roles for model workloads |
| Monitoring stack | Process-level and network-level monitoring for anomaly detection |

Industry Applicability

| Industry | Applicability | Key Driver |
|---|---|---|
| Financial Services | Critical | Data sovereignty; lateral movement risk to core banking systems |
| Healthcare | Critical | Patient data protection; PHI access controls |
| Government / Defence | Critical | Classified data segregation; adversarial threat model |
| Technology / SaaS | High | Multi-tenant isolation; intellectual property protection |
| Manufacturing / Industrial | High | OT/IT boundary protection; model access to operational data |
| Retail | Medium | PII protection; model access to customer data stores |

4. Architecture Overview

Model isolation is implemented as a set of concentric isolation boundaries — each layer reduces the blast radius of a compromise at the layer above. The architecture philosophy is: assume the model is compromised; design the environment so that a compromised model cannot reach anything of value.

Network Isolation

Model serving workloads are deployed in a dedicated, isolated network segment (VPC subnet, Kubernetes namespace with NetworkPolicy, or dedicated VLAN). This segment has no direct connectivity to:

  • Core databases (customer data, financial records).
  • Secrets stores (vault, secrets manager).
  • Internal corporate networks.
  • Internet (unless explicitly permitted by egress policy).

Inbound traffic reaches the model only from the AI Gateway (EAAPL-SEC001) via a specific port. All other inbound traffic is denied. Outbound traffic is restricted to: the model registry (to fetch weights), the telemetry endpoint (to ship metrics/logs), and explicitly allowlisted tool endpoints (for agentic use cases). A DNS sinkhole or DNS firewall prevents the model from resolving arbitrary internet hostnames.
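
The inbound and outbound rules above can be sketched as a Kubernetes NetworkPolicy manifest built in Python. This is a minimal illustration, not a reference implementation: the namespace names, the `app: model-serving` label, port 8000, and the CIDR values are all hypothetical. Standard NetworkPolicy matches IP blocks rather than hostnames, so hostname-based allowlisting and the DNS egress rule the pods need are assumed to be handled elsewhere (for example by the CNI or an egress proxy).

```python
import json


def build_model_network_policy(namespace, gateway_namespace, serving_port, egress_cidrs):
    """Return a NetworkPolicy manifest (plain dict) for the model serving pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "model-serving-isolation", "namespace": namespace},
        "spec": {
            # Hypothetical label that selects the model serving pods.
            "podSelector": {"matchLabels": {"app": "model-serving"}},
            # Declaring both types means traffic not matched below is denied.
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{
                # Only pods in the gateway namespace may connect, on one port.
                "from": [{"namespaceSelector": {"matchLabels": {
                    "kubernetes.io/metadata.name": gateway_namespace}}}],
                "ports": [{"protocol": "TCP", "port": serving_port}],
            }],
            "egress": [
                # One rule per allowlisted destination (registry, telemetry, tools).
                {"to": [{"ipBlock": {"cidr": cidr}}],
                 "ports": [{"protocol": "TCP", "port": 443}]}
                for cidr in egress_cidrs
            ],
        },
    }


policy = build_model_network_policy(
    namespace="model-zone",
    gateway_namespace="ai-gateway",
    serving_port=8000,
    egress_cidrs=["10.20.0.10/32", "10.20.0.11/32"],
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be applied as-is or converted to YAML; keeping the allowlist as an explicit function argument makes each new egress destination a reviewable change.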

Process Isolation

Model serving processes run with the minimum operating system permissions required:

  • Non-root user (UID 1000+).
  • Read-only root filesystem (model weights and configuration are mounted read-only).
  • No CAP_SYS_ADMIN or other privileged Linux capabilities.
  • Seccomp profile restricting available system calls to those required for inference.
  • AppArmor or SELinux policy enforcing the process's access to file system paths.
  • No access to the host network namespace (container network only).

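In Kubernetes, these settings are declared in the pod and container securityContext (runAsNonRoot: true, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and a custom seccomp profile) and enforced at admission by Pod Security Admission or a policy engine such as OPA Gatekeeper or Kyverno. PodSecurityPolicy, the older mechanism, was removed in Kubernetes 1.25.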
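
As a rough illustration of how the settings named above might be checked before deployment, the following sketch inspects a container spec (as a plain dict) and lists what is missing. The function name and the example specs are hypothetical, and requiring `capabilities.drop` to include `ALL` is an assumption that is stricter than the text, which only rules out privileged capabilities.

```python
REQUIRED_SETTINGS = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
}


def find_violations(container):
    """Return a list of hardening gaps in one container spec."""
    context = container.get("securityContext", {})
    violations = []
    for key, expected in REQUIRED_SETTINGS.items():
        # A missing key counts as a violation: defaults are not trusted.
        if context.get(key) is not expected:
            violations.append(f"{key} must be {expected}")
    # Assumption: drop every capability rather than only CAP_SYS_ADMIN.
    if "ALL" not in context.get("capabilities", {}).get("drop", []):
        violations.append("capabilities.drop must include ALL")
    if context.get("seccompProfile", {}).get("type") not in ("Localhost", "RuntimeDefault"):
        violations.append("seccompProfile.type must be Localhost or RuntimeDefault")
    return violations


hardened = {
    "name": "inference",
    "securityContext": {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "Localhost", "localhostProfile": "profiles/inference.json"},
    },
}
weak = {"name": "inference", "securityContext": {"runAsNonRoot": True}}

print(find_violations(hardened))
print(find_violations(weak))
```

A check like this fits in a CI step; it complements admission control in the cluster rather than replacing it.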

Resource Quotas

Runaway inference jobs cause denial of service. Resource quotas are enforced at:

  • Container level: CPU requests/limits, memory requests/limits, GPU memory limits.
  • Kubernetes namespace level: ResourceQuota objects limiting total CPU, memory, and GPU across all pods.
  • Per-request level: token limits enforced by the serving layer (vLLM max_tokens, TGI max_new_tokens).

Resource quotas protect not only the model infrastructure but also adjacent workloads sharing the cluster.
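
The namespace-level and per-request limits can be illustrated with a small sketch. It is only a model of the arithmetic, under the assumption that resources are expressed as CPU millicores, memory in MiB, and whole GPUs; in a real cluster the ResourceQuota admission controller and the serving layer enforce these limits. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int
    gpus: int


def fits_namespace_quota(running, candidate, quota):
    """Return True if admitting `candidate` keeps namespace totals within `quota`."""
    pods = list(running) + [candidate]
    return (
        sum(p.cpu_millicores for p in pods) <= quota.cpu_millicores
        and sum(p.memory_mib for p in pods) <= quota.memory_mib
        and sum(p.gpus for p in pods) <= quota.gpus
    )


def clamp_max_tokens(requested, ceiling):
    """Bound a caller-supplied token limit to the platform ceiling (minimum 1)."""
    return max(1, min(requested, ceiling))


quota = Resources(cpu_millicores=16000, memory_mib=65536, gpus=2)
running = [Resources(4000, 16384, 1)]

print(fits_namespace_quota(running, Resources(8000, 32768, 1), quota))   # within quota
print(fits_namespace_quota(running, Resources(16000, 32768, 1), quota))  # CPU total exceeds quota
print(clamp_max_tokens(100_000, ceiling=4096))
```

Clamping the requested token count before it reaches the inference server keeps a single request from holding a GPU far longer than intended.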

Secret Access Minimisation

The model serving process requires no secrets beyond what is needed to authenticate to the model registry and emit telemetry. It does not hold database credentials, user API keys, or service-to-service tokens. Any secrets required are:

  • Injected at startup via a sidecar (Vault Agent, AWS Secrets Manager CSI driver) and expire after use.
  • Never stored in environment variables (accessible to any process in the container).
  • Never stored in the model's context window.
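
A minimal sketch of the file-based alternative to environment variables, assuming the sidecar writes each secret to its own file on an in-memory volume. The file name, the marker list, and both helper functions are hypothetical; a temporary directory stands in for the mounted volume so the example runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical substrings that suggest an environment variable holds a secret.
SUSPICIOUS_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "API_KEY")


def read_injected_secret(path):
    """Read a secret from a file written by the sidecar, trimming the trailing newline."""
    return Path(path).read_text(encoding="utf-8").strip()


def env_vars_that_look_like_secrets(environ):
    """Return sorted names of environment variables that appear to carry secrets."""
    return sorted(
        name for name in environ
        if any(marker in name.upper() for marker in SUSPICIOUS_MARKERS)
    )


# A temporary directory stands in for the sidecar's in-memory mount.
with tempfile.TemporaryDirectory() as secrets_dir:
    secret_file = Path(secrets_dir) / "telemetry-token"
    secret_file.write_text("example-token\n", encoding="utf-8")
    token = read_injected_secret(secret_file)

flagged = env_vars_that_look_like_secrets(
    {"PATH": "/usr/bin", "DB_PASSWORD": "x", "TELEMETRY_TOKEN": "y"}
)
print(flagged)
```

The environment scan is a startup self-check: an inference process could refuse to start when `flagged` is non-empty.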

Read-Only Model Weights

Model weight files are mounted read-only. This prevents weight poisoning attacks (where an attacker with write access to the model's file system can modify weights to alter model behaviour). Weights are loaded from a signed, immutable artefact store (container registry with image signing, S3 with Object Lock) and verified at startup using a cryptographic hash.
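
The startup hash check might look like the sketch below. It covers only the SHA-256 comparison; signature verification of the artefact (for example with cosign) is a separate step, and the expected digest is assumed to come from a trusted, signed manifest. The file name and contents are stand-ins.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so large weight files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_sha256):
    """Return True only if the file's digest matches the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.safetensors"
    weights.write_bytes(b"stand-in for real weights")
    expected = hashlib.sha256(b"stand-in for real weights").hexdigest()

    ok_before = verify_weights(weights, expected)
    weights.write_bytes(b"tampered")  # simulate modification after publication
    ok_after = verify_weights(weights, expected)

print(ok_before, ok_after)
```

In an init container, a `False` result would exit non-zero so that the pod never reaches the serving state.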

Egress Controls for Agentic Systems

For AI agents with tool access, egress is the highest-risk attack surface. A compromised agent can use legitimate tool calls to exfiltrate data. Egress controls implement:

  • An explicit tool endpoint allowlist enforced at the network layer (not just application layer).
  • Rate limits on tool call frequency to limit exfiltration bandwidth.
  • Deep packet inspection on HTTP tool calls (payload inspection for data exfiltration patterns).
  • Tool call audit logging to detect anomalous patterns.
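
The allowlist and rate-limit controls can be sketched at the application layer as follows. This complements, and does not replace, the network-layer enforcement described above. The hostnames, bucket size, and refill rate are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from urllib.parse import urlsplit

# Hypothetical allowlisted tool endpoints.
ALLOWED_TOOL_HOSTS = frozenset({"search.tools.internal", "calendar.tools.internal"})


class TokenBucket:
    """Simple token bucket: `capacity` calls in a burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def decide_tool_call(url, bucket, now):
    """Return an audit-friendly decision string for one outbound tool call."""
    host = urlsplit(url).hostname
    if host not in ALLOWED_TOOL_HOSTS:
        return "blocked:not_allowlisted"  # does not consume rate-limit budget
    if not bucket.allow(now):
        return "blocked:rate_limited"
    return "allowed"


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [
    decide_tool_call("https://search.tools.internal/query?q=x", bucket, now=0.0),
    decide_tool_call("https://search.tools.internal/query?q=y", bucket, now=0.0),
    decide_tool_call("https://search.tools.internal/query?q=z", bucket, now=0.0),
    decide_tool_call("https://evil.example.com/upload", bucket, now=0.5),
    decide_tool_call("https://calendar.tools.internal/events", bucket, now=1.0),
]
print(decisions)
```

Each decision string can be written straight into the tool call audit log, which gives the anomaly detection step something concrete to count.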

5. Architecture Diagram

ARCHITECTURE DIAGRAM
flowchart TD
    subgraph External["External Zone"]
        A[AI Gateway]
    end
    subgraph ModelZone["Isolated Model Zone"]
        B[Network Policy]
        C[Inference Process]
        D[Secret Injector Sidecar]
        E[Resource Quota]
    end
    subgraph Egress["Allowlisted Egress"]
        F[Model Registry]
        G[Telemetry Endpoint]
    end
    subgraph Denied["Denied Zone"]
        H[Databases + Secrets]
    end
    A -->|mTLS inbound only| B --> C
    D -->|startup secrets only| C
    E -.->|enforces limits| C
    C --> F
    C --> G
    B -->|deny all other| H
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#fef9c3,stroke:#eab308
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308
    style H fill:#fee2e2,stroke:#ef4444

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Network Policy | Network Control | Restricts inbound/outbound traffic for model serving pods to allowlisted endpoints | Kubernetes NetworkPolicy, AWS Security Groups, Calico, Cilium | Critical |
| Pod Security Controls | Process Isolation | Enforces non-root execution, read-only filesystem, capability restrictions, seccomp profile | Kubernetes Pod Security Admission, OPA Gatekeeper, Kyverno | Critical |
| Secret Sidecar | Secrets Management | Injects required secrets at startup; rotates and expires credentials; never persists secrets to disk | Vault Agent sidecar, AWS Secrets Manager CSI driver, Azure Key Vault CSI driver | Critical |
| Resource Quota | Resource Control | Limits CPU, memory, GPU consumption per pod and per namespace | Kubernetes ResourceQuota + LimitRange, Slurm (HPC), AWS Fargate resource limits | High |
| Model Registry | Artefact Store | Stores model weights in immutable, signed artefacts; enforces content-addressed retrieval | Docker Registry + Notary/cosign, S3 with Object Lock + SHA-256 manifest, MLflow Registry | High |
| Weight Integrity Verifier | Integrity Check | Verifies cryptographic hash of model weights at container startup before serving begins | Cosign, custom hash verification script in init container | High |
| Egress Controller | Network Control | Enforces outbound connection allowlist; optionally performs deep packet inspection on tool calls | Envoy egress proxy, Squid with allowlist, AWS VPC Endpoints, Cilium egress gateway | High |
| Log Sidecar | Observability | Collects process logs, system call traces, and network connection logs; forwards to SIEM | Fluentd, Fluent Bit, Datadog Agent, AWS FireLens | High |
| Seccomp Profile | OS Hardening | Restricts Linux system calls available to the inference process | Custom seccomp JSON profile, Docker default seccomp, Bottlerocket cgroups v2 | Medium |
| AppArmor / SELinux Policy | OS Hardening | Mandatory access control enforcing file system and capability boundaries | AppArmor (Ubuntu/Debian), SELinux (RHEL/Amazon Linux) | Medium |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | DevOps / MLOps | Publishes model weights to model registry with cosign signature | Signed, immutable model artefact with SHA-256 digest |
| 2 | Container Orchestrator | Schedules model serving pod in isolated namespace; applies NetworkPolicy and PodSecurity constraints | Pod scheduled on dedicated node pool with isolation labels |
| 3 | Init Container | Fetches model weights from registry; verifies cosign signature and SHA-256 hash | Verified weights mounted at read-only path |
| 4 | Secret Sidecar | Authenticates to Vault using Kubernetes Service Account token; retrieves telemetry credentials; injects into shared memory | Short-lived credentials available to inference process |
| 5 | Inference Process | Starts serving; accepts inbound requests only from AI Gateway over mTLS | Model ready to serve |
| 6 | AI Gateway | Forwards validated, sanitised request to model | Request received by inference process |
| 7 | Inference Process | Runs inference; generates response; for agentic workloads, makes tool calls only to allowlisted endpoints | Response or tool call output |
| 8 | Log Sidecar | Collects process logs, resource metrics, and network connection events; forwards to telemetry endpoint | Observability data available in SIEM/monitoring stack |
| 9 | Resource Quota Controller | Enforces CPU/memory/GPU limits; throttles or terminates if limits exceeded | Normal operation or throttle/OOMKill event |

Error Flow

| Error Condition | Behaviour | Alert |
|---|---|---|
| Weight integrity check fails | Pod fails to start; alert MLOps team | Critical: model weight integrity violation |
| Secret sidecar cannot authenticate to Vault | Pod fails to start; no credentials available | Critical: secret injection failure |
| Network policy violation attempt | Connection rejected by Kubernetes NetworkPolicy; logged by Cilium/Calico | Security: model attempting disallowed egress |
| Resource quota exceeded | Pod throttled (CPU) or OOMKilled (memory); pod restarted | Warning: resource exhaustion |
| Seccomp violation (blocked syscall) | Process terminated with SIGSYS; pod restarted | Security: unexpected syscall from model process |

8. Security Considerations

Authentication & Authorisation

  • Model serving process has no inbound authentication to manage (auth handled by AI Gateway before request reaches model).
  • Outbound authentication for tool calls uses short-lived tokens injected by the secret sidecar — never long-lived credentials embedded in configuration.
  • Kubernetes Service Account tokens used for Vault authentication are bound to the specific pod's namespace and expire within 1 hour.

Secrets Management

  • No secrets in environment variables (visible in container inspect, logs, crash dumps).
  • No secrets in model weights or configuration files.
  • Secret sidecar injects credentials into in-memory tmpfs only.
  • All credential access logged by Vault for audit.

Data Classification

  • Model execution environment is classified at the sensitivity level of the highest-classification data it will process. A model serving requests containing CONFIDENTIAL data must be isolated in a CONFIDENTIAL-tier network segment.
  • Cross-classification boundary serving is prohibited — a model serving CONFIDENTIAL requests must not also serve PUBLIC requests (context window contamination risk).
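
The two classification rules can be expressed as a short routing check. This is an illustrative sketch only: the INTERNAL level and both function names are assumptions (the text names only PUBLIC and CONFIDENTIAL), and the strict equality in `may_route` is one reading of the rule that prohibits cross-classification serving.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1       # assumed intermediate level, not named in the pattern
    CONFIDENTIAL = 2


def required_tier(data_levels):
    """An environment is classified at the highest level of data it will process."""
    return max(data_levels)


def may_route(request_level, endpoint_tier):
    """Cross-classification serving is prohibited, so the levels must match exactly."""
    return request_level == endpoint_tier


tier = required_tier([Classification.PUBLIC, Classification.CONFIDENTIAL])
print(tier.name)
print(may_route(Classification.CONFIDENTIAL, tier))
print(may_route(Classification.PUBLIC, tier))
```

A gateway could apply `may_route` when selecting a model endpoint, so that a PUBLIC request is never sent to a CONFIDENTIAL-tier model.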

Encryption

  • Model weights encrypted at rest in registry (AES-256, provider-managed key) and in transit (TLS 1.3 from registry to pod).
  • Network traffic between pods is encrypted using service-mesh mTLS (Istio/Linkerd) or WireGuard (Cilium).
  • Scratch space (for intermediate computation) uses encrypted ephemeral volumes.

Auditability

  • All egress connection attempts (successful and blocked) logged with source pod, destination IP/hostname, and timestamp.
  • All secret access events logged by Vault.
  • All resource quota violations logged for security review (may indicate attempted resource exhaustion attack).

OWASP LLM Top 10 Coverage

| OWASP LLM Risk | Model Isolation Mitigation | Coverage |
|---|---|---|
| LLM01: Prompt Injection | Isolation limits blast radius if injection succeeds; does not prevent injection itself | Low |
| LLM02: Insecure Output Handling | Egress controls limit exfiltration of data through tool calls in agentic contexts | High |
| LLM03: Training Data Poisoning | Read-only model weights + weight integrity verification prevent weight-level poisoning post-deployment | High |
| LLM04: Model Denial of Service | Resource quotas prevent runaway inference from affecting other workloads | High |
| LLM05: Supply Chain Vulnerabilities | Signed model artefacts and integrity verification at startup prevent supply chain compromise of model weights | High |
| LLM06: Sensitive Information Disclosure | Network isolation prevents direct access to data stores; context window data cannot reach external endpoints | High |
| LLM07: Insecure Plugin Design | Egress allowlist enforces tool endpoint restrictions at network layer | High |
| LLM08: Excessive Agency | Egress controls and tool allowlist limit the actions an agent can take | High |
| LLM09: Overreliance | Not applicable | None |
| LLM10: Model Theft | Read-only filesystem; encrypted weights at rest; no external weight exfiltration path | High |

9. Governance Considerations

Responsible AI

  • Model isolation ensures that AI model behaviour is bounded — a model cannot access data beyond its authorised scope, which is a prerequisite for responsible deployment.
  • Isolation boundaries must be documented in the AI system's risk register and reviewed as part of the AI impact assessment process.

Model Risk Management

  • Isolation controls form a critical part of the model risk management framework: they limit the operational risk from a model behaving unexpectedly.
  • Weight integrity verification is a model risk control — it ensures the deployed model is the validated, approved model.

Human Approval

  • Changes to network policy (e.g., adding a new egress allowlist entry) require approval from Security Architecture and are subject to change management.
  • Changes to seccomp profiles or AppArmor policies require security team review.

Governance Artefacts

| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| Model Isolation Design Document | Security Architecture | With each new model deployment | Documents isolation controls for each model environment |
| Network Policy Audit Report | Security Operations | Quarterly | Verifies network policies are correctly applied and not bypassed |
| Weight Integrity Verification Log | MLOps | Continuous | Evidence that deployed models match approved artefacts |
| Egress Connection Log | Security Operations | Continuous review | Detects anomalous outbound connections from model serving |
| Resource Quota Review | Platform Engineering | Quarterly | Ensures quotas are appropriate for workload without over-provisioning risk |

10. Operational Considerations

Monitoring

  • Process-level: CPU, memory, GPU utilisation per inference process; seccomp violation events.
  • Network-level: egress connection attempts (blocked and permitted); inbound connection sources.
  • Storage-level: write attempts to the read-only filesystem (surfaced as AppArmor or seccomp violations).
  • Resource-level: quota utilisation trends; OOMKill events.

SLOs

| SLO | Target | Measurement |
|---|---|---|
| Weight integrity verification time | <30s at pod startup | Init container span |
| Secret injection latency | <5s at pod startup | Secret sidecar span |
| Network policy enforcement latency | <1ms per connection | Cilium/Calico metrics |
| Egress block alert latency | <60s from connection attempt to alert | Alert pipeline latency |
| Seccomp/AppArmor violation alert | <30s from violation to SIEM | SIEM ingestion latency |

Logging

  • Structured JSON from all sidecars. Mandatory: pod_name, namespace, event_type (egress_attempt, seccomp_violation, oomkill, weight_integrity_check), outcome (allowed/blocked/failed), timestamp_utc.
  • Network connection logs include src_pod, dst_ip, dst_hostname, dst_port, protocol, bytes_transferred, outcome.
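
A sidecar might emit the mandatory fields along these lines. The field names and allowed values come from the list above; the function name and the choice of ISO 8601 for `timestamp_utc` are assumptions.

```python
import json
from datetime import datetime, timezone

EVENT_TYPES = {"egress_attempt", "seccomp_violation", "oomkill", "weight_integrity_check"}
OUTCOMES = {"allowed", "blocked", "failed"}


def build_event(pod_name, namespace, event_type, outcome, **extra):
    """Return one structured JSON log line with the mandatory fields set."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type}")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    event = {
        **extra,  # optional fields first, so they cannot override mandatory ones
        "pod_name": pod_name,
        "namespace": namespace,
        "event_type": event_type,
        "outcome": outcome,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event, sort_keys=True)


line = build_event(
    "vllm-0", "model-zone", "egress_attempt", "blocked",
    src_pod="vllm-0", dst_hostname="evil.example.com", dst_port=443, protocol="TCP",
)
print(line)
```

Rejecting unknown `event_type` and `outcome` values at the source keeps the SIEM queries that depend on these fields simple.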

Incident Management

  • Egress connection attempt to non-allowlisted destination → P1 security incident; immediate pod isolation; security operations investigation.
  • Seccomp violation → P2; pod quarantined; security review of syscall.
  • Weight integrity failure → P1; pod does not start; MLOps escalation; artefact store integrity investigation.
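
The three escalation rules above can be encoded as a lookup table so that alert routing is explicit and testable. This is a sketch under assumptions: the event and outcome names reuse the logging fields, the mapping of a seccomp violation to the `blocked` outcome is a guess, and the action strings paraphrase the bullets.

```python
# (event_type, outcome) -> (priority, ordered response actions)
PLAYBOOK = {
    ("egress_attempt", "blocked"): (
        "P1", ["isolate pod immediately", "open security operations investigation"],
    ),
    ("seccomp_violation", "blocked"): (
        "P2", ["quarantine pod", "security review of syscall"],
    ),
    ("weight_integrity_check", "failed"): (
        "P1", ["keep pod from starting", "escalate to MLOps",
               "investigate artefact store integrity"],
    ),
}


def triage(event_type, outcome):
    """Return (priority, actions) for an event, or None if no rule applies."""
    return PLAYBOOK.get((event_type, outcome))


print(triage("egress_attempt", "blocked"))
print(triage("seccomp_violation", "blocked"))
print(triage("egress_attempt", "allowed"))
```

Keeping the playbook as data means a change to incident priorities shows up as a small, reviewable diff.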

DR

| Scenario | RTO | Recovery |
|---|---|---|
| Pod OOMKilled | 30s | Kubernetes restarts pod; alert to platform team |
| Model registry unavailable | 5min (new pods cannot start; existing pods continue) | Cached weights in running pods; restore registry |
| Vault unavailable | 2min (pods can't start or rotate secrets) | Vault HA cluster; emergency credential cache in CSI driver |
| Network policy misconfiguration | 5min | Rollback network policy to last known-good version via GitOps |

11. Cost Considerations

Cost Drivers

| Cost Driver | Description | Relative Impact |
|---|---|---|
| Dedicated node pool | Model workloads often require GPU nodes; isolation to dedicated pools prevents bin packing with other workloads | High |
| Egress proxy | Envoy or Squid egress proxy adds compute cost | Low |
| Secret sidecar | Vault Agent or CSI driver adds memory overhead per pod | Low |
| Security scanning | Image scanning, seccomp profile generation, AppArmor policy authoring engineering time | Medium |
| GPU underutilisation | Isolation prevents sharing GPU nodes with non-model workloads | Medium–High |

Optimisations

  • Use node affinity and taints to co-locate multiple isolated model workloads on the same GPU node while maintaining pod-level isolation — share the node's GPU, not the network or filesystem.
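  • Partition GPUs with MIG (Multi-Instance GPU, for example on NVIDIA A100) so that multiple isolated pods can share a single GPU, each with its own dedicated memory and compute slice. Plain GPU time-slicing does not isolate memory between processes, so it does not remove the memory isolation risk.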

Indicative Cost Range

| Scale | Monthly AWS Additional Cost (USD) | Notes |
|---|---|---|
| Small (1–2 model endpoints) | $200–$600 | Dedicated EKS node group, NAT Gateway for egress control |
| Medium (5–20 model endpoints) | $1,000–$4,000 | Dedicated node pools; Cilium enterprise for egress; additional monitoring |
| Large (50+ model endpoints) | $8,000–$25,000 | Multi-tenant GPU cluster with fine-grained isolation; dedicated security tooling |

12. Trade-Off Analysis

Option Comparison

| Option | Description | Pros | Cons | Best For |
|---|---|---|---|---|
| A: Namespace-only isolation | Separate Kubernetes namespace with NetworkPolicy; no process-level hardening | Low operational overhead; fast to implement | Process escapes still possible; shared kernel; no egress DPI | Dev/staging environments; low-sensitivity workloads |
| B: Full pod hardening (this pattern) | Namespace + process isolation (seccomp, AppArmor, non-root, read-only FS) + egress control | Comprehensive isolation; industry-standard | Requires seccomp profile authoring; AppArmor policy management; operational overhead | Production AI workloads; regulated environments |
| C: VM-level isolation | Each model in a dedicated VM (or Kata Containers for VM-level isolation in Kubernetes) | Kernel isolation; strongest blast radius containment | High cost; poor bin packing; slow start time | Highest-risk workloads; multi-tenant with hostile tenants |
| D: Managed service isolation | Use cloud-managed model serving (SageMaker, Azure ML, Vertex AI) and accept provider isolation | Low operational burden; provider SLAs | Vendor lock-in; less control; data residency constraints; can't customise seccomp | Organisations without Kubernetes expertise |

Architectural Tensions

| Tension | Trade-Off |
|---|---|
| Isolation vs Operability | Strict seccomp profiles and read-only filesystems can break inference libraries that write temp files. Resolution: profile the inference process's system call requirements before writing the seccomp profile; use tmpfs for scratch space. |
| Performance vs Security | Network policy enforcement (Cilium eBPF) and seccomp add per-request overhead. At high inference volumes, this can be measurable. Resolution: eBPF-based enforcement (Cilium) is near-zero-overhead; seccomp adds <1% CPU overhead for inference workloads. |
| GPU Sharing vs Isolation | GPU memory isolation requires MIG-capable hardware (for example A100/H100); GPUs without MIG share GPU memory between processes. Resolution: use MIG for production; accept soft isolation (process-level) for other GPU types. |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Seccomp profile too restrictive (breaks inference library) | Medium | High (model unavailable) | Pod CrashLoopBackOff; SIGSYS in logs | Audit required syscalls; update seccomp profile; redeploy |
| Network policy rule error (legitimate traffic blocked) | Medium | High (model unreachable from gateway) | 503 errors from gateway → model; network connectivity check | Roll back network policy; investigate and fix |
| Weight integrity check false negative | Very Low | Critical | Post-deployment model behaviour anomaly detection | Forensic analysis of model registry; rolling restart from clean artefact |
| Secret sidecar certificate rotation failure | Low | High (credentials expire; model cannot authenticate for tool calls) | Secret expiry metric approaching zero | Sidecar restart; Vault token renewal |
| GPU memory isolation breach (non-MIG GPU) | Low | Medium (process memory accessible between pods) | Process-level memory boundary monitoring | Migrate to MIG-capable hardware; temporary: single-tenant GPU nodes |

14. Regulatory Considerations

| Regulation | Requirement | Model Isolation Implementation |
|---|---|---|
| APRA CPS234 §21 | Information security controls commensurate with sensitivity | Network and process isolation directly address information asset protection |
| APRA CPS234 §23 | Capability to detect and respond to information security incidents | Egress logging and violation alerting implement incident detection for model environments |
| EU AI Act Art. 9 (Risk Management) | Implement technical and organisational measures to manage AI risks | Model isolation is a core technical risk management measure for on-premises AI workloads |
| ISO 27001 A.13.1 (Network Security) | Manage and control networks to protect information systems | Network policy and egress control implement this requirement for AI workloads |
| ISO 27001 A.12.6 (Technical Vulnerability Management) | Prevent exploitation of technical vulnerabilities | Read-only filesystem and weight integrity verification address model-layer vulnerability management |
| NIST AI RMF MANAGE 2.2 | Mechanisms exist to prevent improper access | Isolation controls implement access prevention at network, process, and storage layers |

15. Reference Implementations

AWS

| Component | AWS Service |
|---|---|
| Container isolation | EKS with Bottlerocket OS (seccomp by default); OPA Gatekeeper for policy |
| Network isolation | VPC subnets + Security Groups; EKS NetworkPolicy via Cilium or Calico |
| Egress control | AWS Network Firewall; VPC Endpoints for AWS services (no internet path) |
| Process isolation | Bottlerocket OS seccomp profiles; AWS Fargate (VM-level isolation) |
| Secret injection | AWS Secrets Manager CSI driver; IAM Roles for Service Accounts (IRSA) |
| Weight storage | ECR (OCI artefacts) with image signing (cosign); S3 with Object Lock |
| Resource quotas | EKS ResourceQuota + LimitRange; NVIDIA GPU Operator for GPU quotas |

Azure

| Component | Azure Service |
|---|---|
| Container isolation | AKS with Azure Linux (CBL-Mariner); Azure Policy for pod security |
| Network isolation | AKS NetworkPolicy (Azure CNI or Calico); private AKS cluster |
| Egress control | Azure Firewall with FQDN allow rules |
| Secret injection | Azure Key Vault CSI driver; Workload Identity |
| Weight storage | Azure Container Registry with Notation signing; Azure Blob with immutability |
| Resource quotas | AKS ResourceQuota; Node Taints for GPU isolation |

GCP

| Component | GCP Service |
|---|---|
| Container isolation | GKE Autopilot (enforces security best practices by default); Workload Identity |
| Network isolation | GKE NetworkPolicy; Private GKE cluster; VPC Service Controls |
| Egress control | Cloud NGFW firewall policy rules with FQDN objects; Secure Web Proxy |
| Secret injection | Secret Manager CSI driver; Workload Identity Federation |
| Weight storage | Artifact Registry with Binary Authorization |

On-Premises

| Component | Technology |
|---|---|
| Container isolation | Kubernetes with OPA Gatekeeper; custom seccomp profiles per model workload |
| Network isolation | Calico or Cilium NetworkPolicy; dedicated VLAN per model tier |
| Egress control | Envoy egress proxy with explicit upstream allowlist |
| Secret injection | HashiCorp Vault Agent sidecar injector |
| Weight storage | Harbor registry with Notary signing; Ceph S3 with WORM policies |
| GPU isolation | NVIDIA MIG on A100; one MIG instance per isolated model workload |

16. Related Patterns

| Pattern | ID | Relationship |
|---|---|---|
| AI Gateway | EAAPL-SEC001 | Gateway is the only permitted inbound path to the model; isolation enforces this at network layer |
| Secure Tool Invocation | EAAPL-SEC004 | Egress controls in model isolation are the network-layer enforcement of tool invocation policy |
| Zero-Trust AI Pipeline | EAAPL-SEC007 | Model isolation implements the compute-layer zero-trust controls within the broader pipeline |
| Secrets Management for AI | EAAPL-SEC008 | Secret injection sidecar pattern depends on SEC008 for the vault infrastructure |
| AI Telemetry | EAAPL-OBS001 | Log sidecar pattern provides the telemetry pipeline for model execution events |
| Adversarial Input Defence | EAAPL-SEC010 | Isolation limits blast radius of adversarial inputs that succeed in manipulating model behaviour |

17. Maturity Assessment

Overall Maturity: Proven

| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Pattern definition clarity | 4 | Well-defined; some GPU-specific isolation guidance still evolving |
| Technology availability | 4 | Kubernetes + Cilium + OPA provides complete implementation; GPU MIG requires specific hardware |
| Industry adoption | 3 | Applied in security-mature organisations; underimplemented in most enterprises deploying AI |
| Operational tooling | 4 | Strong Kubernetes security tooling ecosystem |
| Regulatory alignment | 4 | Directly addresses CPS234, EU AI Act Art. 9 requirements |
| Community knowledge | 4 | Kubernetes security community well-documented; AI-specific extensions are newer |

18. Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-03-01 | Security Architecture Team | Initial pattern definition |
| 1.1 | 2024-07-15 | Security Architecture Team | Added GPU MIG isolation guidance; updated OWASP LLM mapping |
| 1.2 | 2025-02-01 | Security Architecture Team | Added weight integrity verification; updated regulatory mapping for EU AI Act |