
Secrets Management for AI

🔐 AI Security · APRA CPS234 · EU AI Act · 🏭 Field-tested in AU

[EAAPL-SEC008] Secrets Management for AI

Category: Security / Credential Management
Sub-category: API Key and Token Lifecycle
Version: 2.0
Maturity: Mature
Tags: secrets-management, api-keys, vault, dynamic-secrets, rotation, audit, credential-hygiene
Regulatory Relevance: APRA CPS234 §22, ISO 27001 A.9.4, NIST CSF PR.AC-1, EU AI Act Art. 9, SOC 2 CC6.1


1. Executive Summary

Secrets Management for AI addresses one of the most prevalent and consequential security failures in enterprise AI deployments: the mishandling of model API keys, service tokens, and credentials used by AI systems. Leaked or improperly managed API keys for commercial LLM providers (OpenAI, Anthropic, Azure OpenAI) give attackers the ability to generate costs at the organisation's expense, access data sent in API calls, and exfiltrate model outputs — with no attribution back to the attacker.

The business risk is material and immediate. A single leaked OpenAI API key (recognisable by its sk- prefix) can generate tens of thousands of dollars in API charges within hours of exposure. AI API keys embedded in mobile applications, JavaScript bundles, GitHub repositories, or CI/CD pipeline logs are standing vulnerabilities that can be exploited at any time, from anywhere, with no prior access to the organisation's infrastructure.

This pattern establishes the complete lifecycle for AI credentials: vault storage and retrieval (no secrets in code or environment variables), dynamic secret generation where supported, automated rotation, granular access control, and comprehensive audit logging of every secret access event. For VITE_-prefixed or client-side secret exposure in particular, this pattern defines the architectural controls that prevent secrets from reaching user-accessible environments — a critical failure mode in frontend AI applications.


2. Problem Statement

Business Problem

AI model API keys are high-value credentials: they grant direct access to model inference, with billing charged to the owner. They are also widely mishandled — treated as configuration values rather than secrets, hard-coded in source code, embedded in build artifacts, and shared across teams without access controls. Every insecure placement is a potential financial and security exposure.

Beyond cost, AI API keys may be used to submit requests that the organisation would not authorise — generating harmful content, extracting information, or probing model capabilities in ways that create legal and reputational risk, with the charges appearing on the organisation's bill.

Technical Problem

Common failure modes in AI credential management:

  • Hard-coded in source code: OPENAI_API_KEY = "sk-..." in application code, committed to Git history.
  • Environment variables in containers: Accessible to any process in the container; visible in orchestrator logs; leaked in crash dumps.
  • VITE_-prefixed secrets in frontend builds: Build tools like Vite bundle VITE_* environment variables into client-side JavaScript, making them accessible to every user who loads the page.
  • Shared keys across environments: Same key used in dev, staging, and production — a leaked dev key gives production access.
  • No rotation: API keys never rotated; a key compromised 6 months ago may still be valid.
  • No audit: No record of which systems use which keys; key compromise cannot be scoped.
  • Broad-permission keys: Using an OpenAI "all models" key for an application that only needs GPT-3.5-turbo.

Symptoms

  • AI API keys in Git history (discoverable via git log -S "sk-").
  • VITE_OPENAI_API_KEY in production JavaScript bundles.
  • Unexplained spikes in model API spend (possible key misuse).
  • No centralised record of which applications hold which API keys.
  • Application secrets in CI/CD pipeline logs.
  • Keys with no expiry date and no rotation history.

Cost of Inaction

| Dimension | Impact |
| --- | --- |
| Financial | Unauthorised API usage; $10K–$100K+ costs generated by a single leaked key |
| Security | Attacker can query model with organisation's key; results and charges attributed to organisation |
| Regulatory | APRA CPS234 requires access control for information assets; unmanaged API keys violate this |
| Data | API call contents may be logged by provider; an attacker using a leaked key can submit data that appears in provider audit logs attributed to the organisation |
| Operational | Key rotation requires emergency application redeployments; no rotation history means incident scoping is impossible |

3. Context

When to Apply

  • Any application, service, or pipeline that holds credentials for AI model providers.
  • CI/CD pipelines that need model API access for testing or evaluation.
  • Frontend or mobile applications that call AI APIs — the frontend must NEVER hold model provider credentials directly.
  • Multi-environment deployments (dev/staging/prod) needing credential isolation.
  • Teams managing more than one AI model provider integration.

When NOT to Apply

  • Single-developer local development with credentials that never leave the developer's machine and are isolated to their personal API account.

Prerequisites

| Prerequisite | Detail |
| --- | --- |
| Vault Infrastructure | HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault |
| Identity Provider | OIDC/SAML IdP for human user authentication to vault |
| Workload Identity | Kubernetes Service Accounts or cloud-managed identity for application-level vault access |
| AI Gateway (EAAPL-SEC001) | All AI API calls proxied through gateway; credentials held only by gateway |

Industry Applicability

| Industry | Applicability | Key Driver |
| --- | --- | --- |
| All industries | Critical | Universal risk: AI API keys are credentials regardless of industry |
| Financial Services | Critical | APRA CPS234 explicit credential management requirements |
| Healthcare | Critical | Credentials protecting PHI-containing AI pipelines |
| Government | Critical | Classified system access controls |
| Technology / SaaS | High | Developer tooling and CI/CD credential exposure risk |

4. Architecture Overview

The secrets management architecture for AI systems is built on a single foundational principle: model provider credentials must never be present outside the vault and the AI Gateway's runtime memory. Every other location — source code, environment variables, build artifacts, logs, client bundles — is prohibited.

Vault as Single Source of Truth

All AI credentials are stored in a centralised vault. The vault is the only place where credentials exist at rest. Applications do not store credentials — they request them at runtime. This separation is enforced through:

  • Architecture policy: no secret values in application configuration files or source code.
  • Automated scanning: CI/CD pipeline executes a secret-grep step that fails the build if any known secret pattern (OpenAI sk-, Anthropic sk-ant-, sbp_ for Supabase) appears in source files, build artifacts, or environment variable names with VITE_ prefix.
  • Vault access control: applications can only read the specific secrets they need, verified by workload identity.
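The secret-grep step can be sketched as a small scanner. This is a minimal illustration, not the gate used in production: the regular expressions, their minimum lengths, and the function names `scan_text` and `scan_tree` are assumptions made for this example, and real key formats vary between providers and over time.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumed for this sketch); tune them to the
# key formats actually issued to your organisation.
SECRET_PATTERNS = {
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "openai": re.compile(r"sk-(?!ant-)[A-Za-z0-9_\-]{20,}"),
    "supabase": re.compile(r"sbp_[A-Za-z0-9]{20,}"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable file under root; map file path to its findings."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the scan
        findings = scan_text(text)
        if findings:
            results[str(path)] = findings
    return results
```

A CI step could call `scan_tree` on the checkout and the build output directory and exit non-zero when the result is non-empty. Dedicated tools such as TruffleHog, GitLeaks, or detect-secrets cover far more patterns and should be preferred where available.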

Dynamic Secrets Where Supported

Some AI providers support dynamic or short-lived credential generation:

  • AWS Bedrock: Uses IAM roles via assume_role; credentials are time-limited by STS.
  • Azure OpenAI: Uses Azure Managed Identity; short-lived tokens are issued by Microsoft Entra ID (formerly Azure AD), so no static keys are required.
  • Anthropic, OpenAI: Do not currently support dynamic credentials; static API keys must be stored in vault and rotated on a schedule.

For providers requiring static keys, vault rotation is the control: vault stores the key, tracks its age, and triggers rotation workflows. Rotation involves: generating a new key via the provider's API, updating vault, confirming applications are using the new key, and revoking the old key.
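The four rotation steps (generate, update vault, confirm, revoke) can be sketched with in-memory stand-ins. `FakeProvider`, `FakeVault`, and the `consumers_confirmed` callback are hypothetical names invented for this example; they do not correspond to any real provider or vault API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeProvider:
    """Stand-in for a provider's key-management interface (hypothetical)."""
    active: set[str] = field(default_factory=set)
    counter: int = 0

    def create_key(self) -> str:
        self.counter += 1
        key = f"key-v{self.counter}"
        self.active.add(key)
        return key

    def revoke_key(self, key: str) -> None:
        self.active.discard(key)


@dataclass
class FakeVault:
    """Stand-in for a versioned secret store (hypothetical)."""
    versions: list[str] = field(default_factory=list)

    def write(self, value: str) -> int:
        self.versions.append(value)
        return len(self.versions)  # new version number

    def read_latest(self) -> str:
        return self.versions[-1]


def rotate(provider, vault, consumers_confirmed) -> str:
    """Run the four rotation steps; keep the old key if consumers lag."""
    old_key = vault.read_latest()
    new_key = provider.create_key()        # 1. generate
    vault.write(new_key)                   # 2. update vault
    if not consumers_confirmed(new_key):   # 3. confirm
        return "pending"                   # old key stays valid for now
    provider.revoke_key(old_key)           # 4. revoke
    return "rotated"
```

The old key is revoked only after confirmation, so a consumer that has not yet picked up the new version keeps working during the overlap window.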

Key Scoping and Least Privilege

Every AI application is issued credentials scoped to its specific requirements:

  • A customer service bot that only uses gpt-4o-mini is issued a key with rate limits and, where supported, model access restrictions.
  • A batch processing pipeline with no human interaction is issued a separate key from the real-time serving key — isolating blast radius if either is compromised.
  • Separate keys for dev, staging, and production environments. Dev key has the smallest spending limit; production key has the strictest access controls.
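One way to check that keys really are separated per application and environment is to compare key fingerprints across the secret inventory. The sketch below assumes a hypothetical inventory shaped as `{(application, environment): key}`; the function names are illustrative.

```python
import hashlib
from collections import defaultdict


def fingerprint(key: str) -> str:
    """Short SHA-256 fingerprint so raw keys never appear in reports."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def shared_keys(assignments):
    """Map fingerprint -> consumers for any key issued to more than one consumer."""
    by_fp = defaultdict(list)
    for (application, environment), key in assignments.items():
        by_fp[fingerprint(key)].append((application, environment))
    return {fp: sorted(users) for fp, users in by_fp.items() if len(users) > 1}
```

Any non-empty result means a single compromise would affect more than one consumer, which is the blast-radius problem the scoping rules above are meant to prevent.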

Frontend Architecture — The Critical Rule

Frontend applications (React, Vue, Angular) and mobile applications must NEVER hold model provider API keys. The technical reason: any value in a JavaScript bundle or mobile binary is publicly accessible — to every user, to security researchers, and to attackers. The architecture must be:

Browser/Mobile → Application Backend → AI Gateway → Model Provider

The application backend holds the session context, authenticates the user, enforces application-level access controls, and proxies requests to the AI Gateway (which holds provider credentials). The browser never sees a model provider API key.

Build systems that use VITE_, REACT_APP_, EXPO_PUBLIC_, or similar prefixes bundle the prefixed variable into client-side code. No secret should ever have these prefixes. CI/CD must fail on any VITE_OPENAI_, REACT_APP_ANTHROPIC_, or similar pattern.
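A build-time check for client-exposed secret names might look like the following sketch. The list of secret-like markers is an assumption chosen for illustration and will need tuning per codebase; the function name `exposed_secret_names` is hypothetical.

```python
import re

# Prefixes that the build tools named above inline into client-side bundles.
PUBLIC_PREFIXES = ("VITE_", "REACT_APP_", "EXPO_PUBLIC_")

# Hypothetical markers suggesting a variable holds a credential.
SECRET_MARKERS = re.compile(
    r"(OPENAI|ANTHROPIC|API_KEY|SECRET|TOKEN|PASSWORD)", re.IGNORECASE
)


def exposed_secret_names(env_names):
    """Return variable names that are both client-exposed and secret-like."""
    return sorted(
        name
        for name in env_names
        if name.upper().startswith(PUBLIC_PREFIXES) and SECRET_MARKERS.search(name)
    )
```

The CI step would pass in the variable names from the build environment and any `.env` files, and fail the build when the returned list is non-empty.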

Audit Logging

Every secret read from vault generates an immutable audit record: which application, which secret path, which accessor (human or workload identity), timestamp, and IP/network source. This enables:

  • Scoping of a credential compromise (which systems accessed the key in the last 90 days).
  • Detection of anomalous access patterns (secret accessed at 3am from an unusual IP).
  • Compliance evidence (APRA CPS234 requires audit trails for access to critical information assets).
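The first two uses of the audit trail can be sketched over a simplified record. The `AuditEvent` fields mirror the attributes listed above, but the record shape, the 08:00–18:00 UTC business-hours window, and the function names are assumptions for this example rather than the format of any particular vault's audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditEvent:
    accessor: str        # human or workload identity
    secret_path: str
    source_ip: str
    timestamp: datetime  # timezone-aware, UTC


def accessors_in_window(events, secret_path, now, days=90):
    """Identities that read secret_path within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return sorted({
        e.accessor for e in events
        if e.secret_path == secret_path and cutoff <= e.timestamp <= now
    })


def off_hours_events(events, start_hour=8, end_hour=18):
    """Events outside [start_hour, end_hour) UTC; the window is an assumption."""
    return [e for e in events if not start_hour <= e.timestamp.hour < end_hour]
```

In practice these queries would run in the SIEM against exported audit records; the sketch only shows the filtering logic.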

5. Architecture Diagram

ARCHITECTURE DIAGRAM
flowchart TD
    subgraph Forbidden["Never Store Secrets Here"]
        A[Source Code + Bundles]
        B[CI Logs + Env Vars]
    end
    subgraph Vault["Secret Vault"]
        C[AI Provider Keys]
        D[Access Policies]
        E[Audit Log]
    end
    subgraph Runtime["Runtime Request Path"]
        F[Frontend]
        G[App Backend]
        H[AI Gateway]
    end
    F -->|session token| G
    G -->|internal request| H
    H -->|read at runtime| C
    D -->|policy check| H
    C --> E
    style A fill:#fee2e2,stroke:#ef4444
    style B fill:#fee2e2,stroke:#ef4444
    style C fill:#fef9c3,stroke:#eab308
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#dbeafe,stroke:#3b82f6
    style G fill:#dbeafe,stroke:#3b82f6
    style H fill:#f0fdf4,stroke:#22c55e

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Vault | Secrets Store | Authoritative store for all AI credentials; access control; audit logging | HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager | Critical |
| Vault Access Policies | Policy | Defines which workloads can access which secret paths | HashiCorp Vault Policies, AWS IAM, Azure RBAC | Critical |
| Dynamic Credential Provider | Identity | Generates time-limited credentials for supported providers (AWS Bedrock, Azure OpenAI) | AWS STS, Azure Managed Identity, Vault AWS Secrets Engine | High |
| Rotation Orchestrator | Automation | Automates key rotation: generate, update vault, confirm, revoke old | Vault Agent rotation, AWS Secrets Manager auto-rotation, custom Lambda/Function | High |
| Secret Grep CI Gate | CI/CD Security | Scans source code and build artifacts for secret patterns; fails build on detection | TruffleHog, GitLeaks, detect-secrets, custom regex-based gate | Critical |
| Frontend Secret Enforcer | Build Security | Fails build if VITE_*, REACT_APP_*, or similar prefix is applied to a secret | Custom CI step, ESLint plugin for env var naming conventions | Critical |
| Vault Audit Log | Compliance | Immutable, append-only record of all vault access events | Vault Audit Backend → Kafka → S3 Object Lock | Critical |
| Anomaly Detector | Security | Monitors vault access patterns for anomalous events | Vault + Splunk/Datadog SIEM integration; custom access pattern alert | High |
| Secret Version Manager | Lifecycle | Tracks secret versions; enables rollback; maintains rotation history | Vault versioned KV, AWS Secrets Manager versioning | High |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | AI Gateway startup | Authenticates to Vault using Kubernetes Service Account OIDC token | Vault token with gateway's policy attached |
| 2 | AI Gateway | Reads model provider API key from vault path /ai/providers/{provider}/keys/{application} | API key in runtime memory; never written to disk |
| 3 | Vault | Records audit event: gateway workload identity, secret path, timestamp | Immutable audit record |
| 4 | AI Gateway | Holds key in memory; sets timer for renewal before expiry (at 75% of lease TTL) | Key available for model API calls |
| 5 | AI Gateway | Makes model API call using key stored in memory; key value never appears in logs | Successful model API call |
| 6 | Rotation Orchestrator | At rotation schedule (or vault lease expiry): generates new key via provider API | New key value |
| 7 | Rotation Orchestrator | Writes new key to vault; updates version | New version active in vault |
| 8 | AI Gateway | On next renewal cycle: reads new key; begins using new key | Seamless key rotation |
| 9 | Rotation Orchestrator | After confirmation that all consumers have switched: revokes old key via provider API | Old key invalidated |
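Steps 4 and 8 of the primary flow (hold the key in memory, renew at 75% of the lease TTL, pick up the rotated key on the next renewal) can be sketched as a small cache. `LeasedSecret` and its `fetch` callback, which is assumed to return a `(value, ttl_seconds)` pair, are hypothetical names for this example and not part of any vault client library.

```python
import time


class LeasedSecret:
    """Holds a secret in memory and refreshes it at 75% of its lease TTL."""

    def __init__(self, fetch, clock=time.monotonic, renew_fraction=0.75):
        self._fetch = fetch            # callable returning (value, ttl_seconds)
        self._clock = clock            # injectable so renewal can be tested
        self._renew_fraction = renew_fraction
        self._value = None
        self._renew_at = 0.0
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._renew_at:
            try:
                value, ttl = self._fetch()
                self._value = value
                self._renew_at = now + ttl * self._renew_fraction
                self._expires_at = now + ttl
            except Exception:
                # Vault unreachable: keep serving until the lease truly
                # expires, then evict the key and surface the error.
                if self._value is None or now >= self._expires_at:
                    self._value = None
                    raise
        return self._value
```

Renewing before expiry leaves a buffer (the last 25% of the lease) in which a temporary vault outage does not interrupt model API calls.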

Error Flow

| Error | Handling | Alert |
| --- | --- | --- |
| Secret grep gate detects API key in source | Build fails; PR blocked; developer notified | Security: API key pattern in source code |
| VITE_* prefix on secret detected | Build fails immediately | Critical: client-side secret exposure risk |
| Vault unavailable at gateway startup | Gateway fails to start; P1 alert | Critical: secrets infrastructure unavailable |
| Vault lease expiry before renewal | Gateway evicts expired key; new requests fail until key renewed | High: key expiry |
| Rotation fails (provider API error) | Alert rotation team; maintain old key temporarily; retry with backoff | High: rotation failure |
| Anomalous vault access (unusual hours/IP) | SIEM alert; potential credential investigation | Security: anomalous credential access |
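The "retry with backoff" handling for a failed rotation can be sketched as below. The attempt count, delays, and the helper name `retry_with_backoff` are illustrative assumptions, and the broad `except` should be narrowed to the provider's actual error types in real code.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call operation until it succeeds or attempts are exhausted."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # narrow this to provider errors in practice
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("rotation failed; old key left in place") from last_error
```

Because the old key is only revoked after a successful rotation, exhausting the retries leaves the service running on the existing key while the rotation team is alerted.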

8. Security Considerations

Authentication & Authorisation

  • Applications authenticate to Vault using workload identity (Kubernetes SA, AWS IAM role, Azure Managed Identity) — no human-visible credentials required to access vault.
  • Human operators access vault via SSO + MFA; break-glass access with dual approval and automatic alerting.
  • Vault policies grant access to specific secret paths only — an application cannot read secrets outside its namespace.

Secrets Management (Meta)

  • The vault itself requires protection: vault unseal keys are split using Shamir's secret sharing (e.g., 3-of-5) and distributed to security officers.
  • Vault root token is generated once, used to configure vault, and then revoked; normal operations use lower-privileged tokens only.
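To make the 3-of-5 threshold concrete, the following is a teaching sketch of Shamir's secret sharing over a prime field. It is not Vault's implementation, whose details differ, and it should not be used to protect real keys; the prime and the function names are choices made for this example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, threshold: int = 3, shares: int = 5):
    """Split secret into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def combine(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any three of the five shares reconstruct the secret, while two shares alone reveal nothing about it, which is why no single security officer can unseal the vault.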

Data Classification

  • Secret paths in vault are classified: ai/providers/openai/prod-keys is RESTRICTED. Any access to a RESTRICTED path outside business hours generates an alert.

Encryption

  • All vault secrets encrypted at rest with AES-256-GCM; encryption keys managed by vault's built-in seal (or externally via AWS KMS, Azure Key Vault HSM).
  • All communication to vault over TLS 1.3.
  • Vault audit log: write-ahead log; each entry signed to detect tampering.

OWASP LLM Top 10 Coverage

| OWASP LLM Risk | Secrets Management Mitigation | Coverage |
| --- | --- | --- |
| LLM01: Prompt Injection | Not applicable | None |
| LLM02: Insecure Output Handling | Not applicable | None |
| LLM03: Training Data Poisoning | Not applicable | None |
| LLM04: Model Denial of Service | Scoped keys with provider-side rate limits prevent unlimited model use | Medium |
| LLM05: Supply Chain Vulnerabilities | Vault-managed credentials prevent supply chain compromise of credential stores | High |
| LLM06: Sensitive Information Disclosure | API keys never in logs or client code eliminates a key exfiltration vector | High |
| LLM07: Insecure Plugin Design | Tool-specific credentials scoped per tool reduce blast radius | Medium |
| LLM08: Excessive Agency | Credential scoping limits what model provider capabilities an application can access | Medium |
| LLM09: Overreliance | Not applicable | None |
| LLM10: Model Theft | Credentials not exposed to end users; cannot be used to systematically query model | High |

9. Governance Considerations

Governance Artefacts

| Artefact | Owner | Frequency | Purpose |
| --- | --- | --- | --- |
| Secret Inventory | Security Team | Updated with each new integration | Complete inventory of all AI credentials, their owners, and their rotation schedules |
| Rotation Schedule | AI Platform | Monthly review | Ensures all static keys are on rotation schedule |
| Vault Access Audit Report | Security Operations | Monthly | Identifies anomalous access patterns |
| CI Gate Violation Log | DevSecOps | Continuous | Records all build failures due to secret exposure detection |
| Key Compromise Response Runbook | Security Team | Reviewed quarterly | Step-by-step response to detected key compromise |

10. Operational Considerations

SLOs

| SLO | Target | Measurement |
| --- | --- | --- |
| Secret retrieval latency (p99) | <50 ms | Vault read latency metric |
| Key rotation success rate | >99.9% | Rotation job success/failure metric |
| Time from compromise detection to revocation | <15 min | MTTD + MTTR for key compromise incidents |
| Vault availability | 99.99% | Vault health check uptime |
| Secret grep CI gate execution time | <30 s | CI pipeline step timing |

Incident Management

Key Compromise Response (15-minute target):

  1. Receive alert (automated detection or developer report).
  2. Immediately revoke key via provider API.
  3. Generate and vault new key.
  4. Verify all consumers have picked up new key (vault lease renewal cycle or forced restart).
  5. Scope the incident: review vault audit log for all accesses using the compromised key path; review model provider's usage logs for anomalous queries.
  6. File incident report; update runbook.

11. Cost Considerations

Cost Drivers

| Cost Driver | Description | Relative Impact |
| --- | --- | --- |
| Vault infrastructure | HashiCorp Vault Enterprise or cloud-native equivalent | Medium |
| Rotation automation | Engineering for rotation workflows | Medium (one-time) |
| CI secret scanning | Adds 10–30s to build pipeline; negligible compute cost | Very Low |
| Audit log storage | Vault audit log grows with access volume | Low |

Indicative Cost Range

| Scale | Monthly Cost (USD) | Notes |
| --- | --- | --- |
| Small | $200–$600 | Cloud-native secrets manager (AWS Secrets Manager ~$0.40/secret/month + $0.05/10K API calls) |
| Medium | $800–$2,500 | HashiCorp Vault Enterprise (or HCP Vault) |
| Large | $3,000–$10,000 | HashiCorp Vault Enterprise; dedicated HSM for seal; multi-region HA |
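The per-secret and per-call rates quoted for the small tier can be turned into a quick estimate. The function below is a sketch that uses only those two indicative rates; the secret and call counts in the example are hypothetical, and current provider pricing should be checked before budgeting.

```python
def secrets_manager_monthly_cost(num_secrets, api_calls,
                                 per_secret=0.40, per_10k_calls=0.05):
    """Estimate monthly USD cost from the indicative rates quoted above."""
    return num_secrets * per_secret + (api_calls / 10_000) * per_10k_calls


# Hypothetical workload: 50 secrets read 2 million times per month.
# 50 * 0.40 = 20.00 for storage, 200 * 0.05 = 10.00 for API calls.
estimate = secrets_manager_monthly_cost(num_secrets=50, api_calls=2_000_000)
print(f"Estimated monthly cost: ${estimate:.2f}")  # prints about $30.00
```

The estimate covers service fees only; the ranges in the table also reflect surrounding costs such as engineering time and audit log storage.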

12. Trade-Off Analysis

Option Comparison

| Option | Description | Pros | Cons | Best For |
| --- | --- | --- | --- | --- |
| A: Environment variables only | Secrets in container env vars | Simple; widely supported | Visible in orchestrator; leaked in crash dumps; no rotation | Development only; never production |
| B: Cloud-native secrets manager | AWS Secrets Manager / Azure Key Vault | Managed; auto-rotation supported; low operational overhead | Vendor lock-in; per-secret cost at scale | Cloud-committed; small–medium secret count |
| C: HashiCorp Vault (this pattern) | Self-hosted or HCP Vault | Full-featured; dynamic secrets; multi-cloud; FIPS 140-2 | Operational complexity; self-hosted has ops burden | Enterprise; regulated; multi-cloud |
| D: CI/CD secret injection only | Secrets injected at deployment; not held at runtime | No runtime vault dependency | Secrets in CI logs risk; no dynamic rotation; not suitable for long-running services | Short-lived batch jobs only |

Architectural Tensions

| Tension | Trade-Off |
| --- | --- |
| Rotation Frequency vs Stability | More frequent rotation reduces exposure window but increases rotation failure risk and operational complexity. Resolution: rotate every 90 days for static keys; dynamic credentials where possible. |
| Dynamic vs Static Credentials | Dynamic credentials are safer but require provider API support. Resolution: use dynamic where available (AWS Bedrock, Azure OpenAI via MI); manage rotation for static (OpenAI, Anthropic). |
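The 90-day resolution for static keys implies a simple age check against the secret inventory. This sketch assumes a hypothetical inventory mapping key names to timezone-aware creation timestamps; in practice the timestamps would come from vault version metadata.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=90)


def keys_due_for_rotation(key_created_at, now, interval=ROTATION_INTERVAL):
    """Return names of keys whose age meets or exceeds the interval."""
    return sorted(
        name for name, created in key_created_at.items()
        if now - created >= interval
    )
```

A scheduled job could run this daily and hand the resulting names to the rotation orchestrator.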

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Key leaked via Git history | High (industry-wide) | Critical | Secret scanning on push; TruffleHog in CI | Immediately revoke; force-push removal from history; rotate all secrets in affected repo |
| VITE_* secret in production bundle | High (common mistake) | Critical | CI gate on build; runtime CSP violation detector | Emergency redeployment; revoke exposed key; new key with correct prefix-free naming |
| Vault HA failure | Low | Critical | Vault health metrics | Vault HA cluster (3-node Raft); multi-AZ |
| Rotation failure (provider API down) | Medium | Medium | Rotation job failure alert | Retry with backoff; extend key validity if possible; manual rotation runbook |
| Secret path misconfiguration (wrong app reads wrong key) | Low | High | Vault audit log anomaly | Vault policy fix; immediate key rotation |

14. Regulatory Considerations

| Regulation | Requirement | Implementation |
| --- | --- | --- |
| APRA CPS234 §22 | Manage access to systems according to information sensitivity | Vault policies + workload identity implement CPS234 §22 access management |
| ISO 27001 A.9.4 (Access to Systems and Applications) | Prevent unauthorised access to systems | Vault access control + MFA for human access |
| SOC 2 CC6.1 | Logical access controls | Vault policies + audit log provide evidence for CC6.1 |
| NIST CSF PR.AC-1 | Identities and credentials are managed for authorised devices and users | Vault secret lifecycle management implements PR.AC-1 |
| GDPR Art. 32 | Appropriate technical security measures | API key management is a technical security measure protecting AI systems that may process personal data |

15. Reference Implementations

AWS

| Component | AWS Service |
| --- | --- |
| Secret storage | AWS Secrets Manager (auto-rotation for supported services) |
| Dynamic credentials | AWS IAM + STS for Bedrock; no static key needed |
| Application access | IAM Roles for Service Accounts (IRSA) for EKS |
| CI gate | GitHub Actions + detect-secrets; CodeBuild phase |
| Audit | CloudTrail + CloudWatch Logs |
| Rotation | Secrets Manager Lambda rotation function |

Azure

| Component | Azure Service |
| --- | --- |
| Secret storage | Azure Key Vault |
| Dynamic credentials | Azure Managed Identity for Azure OpenAI (no static key) |
| Application access | AKS Workload Identity + Key Vault CSI driver |
| Audit | Azure Monitor + Azure AD Audit Logs |
| Rotation | Key Vault rotation policies |

On-Premises

| Component | Technology |
| --- | --- |
| Secret storage | HashiCorp Vault (Raft HA) |
| Dynamic credentials | Vault AWS Secrets Engine (for cloud providers) |
| Application access | Kubernetes Auth Method (OIDC JWT) |
| CI gate | TruffleHog in Jenkins/GitLab CI and as a pre-commit hook |
| Audit | Vault Audit Backend → Kafka → Elasticsearch |
| Rotation | Vault Agent + custom rotation scripts |

16. Related Patterns

| Pattern | ID | Relationship |
| --- | --- | --- |
| AI Gateway | EAAPL-SEC001 | Gateway is the primary runtime holder of model provider credentials |
| Model Isolation | EAAPL-SEC003 | Model isolation's secret sidecar depends on SEC008 vault infrastructure |
| Secure Tool Invocation | EAAPL-SEC004 | Per-tool JIT credentials are issued from the vault infrastructure in SEC008 |
| Zero-Trust AI Pipeline | EAAPL-SEC007 | JIT access pillar of SEC007 is underpinned by SEC008 vault |
| AI Data Classification | EAAPL-SEC009 | Secret classification levels stored in vault metadata |

17. Maturity Assessment

Overall Maturity: Mature

| Dimension | Score (1–5) | Rationale |
| --- | --- | --- |
| Pattern definition clarity | 5 | Well-understood problem with clear, proven solutions |
| Technology availability | 5 | Vault, Secrets Manager, Key Vault are all production-ready, battle-tested |
| Industry adoption | 4 | Vault and cloud-native secrets managers widely adopted; AI-specific guidance less common |
| CI secret scanning | 5 | TruffleHog, GitLeaks, detect-secrets are mature tools |
| Regulatory alignment | 5 | Directly maps to APRA CPS234, ISO 27001, SOC 2 |
| Developer experience | 3 | Vault integration requires application-level code changes; friction point for adoption |

18. Revision History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2024-01-10 | Security Architecture Team | Initial pattern definition |
| 1.1 | 2024-04-25 | Security Architecture Team | Added frontend/VITE_ critical guidance; expanded CI gate detail |
| 2.0 | 2025-02-15 | Security Architecture Team | Major revision: added dynamic credential architecture; APRA mapping; key compromise runbook; updated to reflect production incidents |