Home›Security Reference

Security Reference

OWASP LLM Top 10

Each OWASP LLM vulnerability mapped to the defensive architecture patterns in this library. Use this reference to select countermeasures during threat modelling, architecture review, or regulatory assessment.

Based on OWASP LLM Top 10 v1.1 — updated 2025

10Vulnerabilities

3Critical

5High

2Medium

Severity

Critical

High

Medium

LLM01Critical

Prompt Injection

Attackers craft inputs that override system-level instructions, hijacking model behaviour to exfiltrate data, bypass controls, or execute unintended actions. Both direct (user-supplied) and indirect (retrieved content) injection vectors apply. In 2024, a major Australian retail bank's AI customer service agent was manipulated via prompt injection to reveal internal policy thresholds by an external security researcher — the finding was disclosed to APRA under CPS 234 §24 as a near-miss information security incident.

Defensive Patterns

→Prompt Injection Defence →Agent Sandboxing

Impact

Full loss of model instruction integrity; data exfiltration; unauthorised agentic actions in production pipelines.

LLM02High

Insecure Output Handling

Model-generated content is passed downstream — to browsers, shells, or APIs — without sanitisation. This enables cross-site scripting, remote-code execution, SSRF, or privilege escalation depending on the consuming system. Specific attack vectors include LLM outputs containing <script> tags passed to a browser renderer, path traversal sequences (../../../etc/passwd) inserted into file-handling pipelines, and SQL fragments (''; DROP TABLE users;--) passed directly to database query builders without parameterisation.

Defensive Patterns

→Output Validation →AI Output Governance

Impact

Code injection into downstream systems; XSS in web surfaces; unauthorised system calls from unvalidated command output.

LLM03High

Training Data Poisoning

Adversarial data introduced during training or fine-tuning degrades model integrity, embeds backdoors, or creates systematic biases that persist invisibly across every subsequent inference.

Defensive Patterns

→Training Data Governance

Impact

Silent model degradation; long-lived backdoors that bypass runtime controls; regulatory non-compliance on data provenance.

LLM04High

Model Denial of Service

Resource-exhaustive prompts — extremely long contexts, repetitive generation loops, or computationally expensive reasoning chains — degrade availability and drive up inference costs disproportionately. Specific attack patterns include adversarial long-context attacks (sending 190K token inputs to models with 200K context windows to maximise KV-cache pressure), recursive prompt self-expansion (where model output re-enters the context window and grows unboundedly), and repeated tool-call loops that exhaust per-minute rate limits and trigger cascading retry storms across shared AI infrastructure.

Defensive Patterns

→AI Rate Limiting →AI Telemetry

Impact

API cost spikes; latency SLA breaches; cascading availability failure across shared AI infrastructure.

LLM05High

Supply Chain Vulnerabilities

Risks introduced through third-party model weights, fine-tuning datasets, embedding providers, plugins, or SDKs with unverified provenance — any of which may carry vulnerabilities, backdoors, or malicious behaviour.

Defensive Patterns

→Model Provenance →Model Registry

Impact

Untraceable model risk; regulatory audit failures; hidden adversarial behaviour from unverified model artefacts.

LLM06Critical

Sensitive Information Disclosure

Models memorise, regurgitate, or are manipulated into disclosing PII, credentials, proprietary data, or confidential system context embedded in training corpora, system prompts, or retrieval pipelines. For Australian healthcare deployments specifically: Medicare numbers, My Health Record identifiers, and Tax File Numbers (TFNs) are Category 1 sensitive information under the Privacy Act 1988 — inadvertent AI-driven disclosure of any of these identifiers triggers mandatory notification obligations under the Notifiable Data Breaches (NDB) scheme administered by the Office of the Australian Information Commissioner (OAIC), with notification required within 30 days of becoming aware of an eligible data breach.

Defensive Patterns

→PII Redaction →Data Classification

Impact

Regulatory breach (Privacy Act, GDPR); credential exposure; trade secret leakage through conversational interfaces.

LLM07High

Insecure Plugin Design

Plugins or tools connected to an LLM lack proper authentication, input validation, or least-privilege scoping — allowing model outputs to trigger unintended operations through an insufficiently hardened tool surface.

Defensive Patterns

→Tool Registry →MCP Auth & Authz

Impact

Lateral movement through connected systems; unintended data writes; privilege escalation via poorly scoped tool credentials.

LLM08Critical

Excessive Agency

An LLM-powered agent is granted more permissions, capabilities, or autonomy than the task requires — enabling it to take high-impact, irreversible actions without human oversight or appropriate guardrails. This is the #1 risk for Australian financial services under APRA CPS 230 operational resilience requirements: an autonomous AI agent that can execute fund transfers, modify customer records, or submit regulatory filings without a human-in-the-loop approval gate breaches the "four-eyes principle" embedded in most Australian Financial Services Licence (AFSL) compliance frameworks and the APRA-mandated Board-level accountability model for material operational risk decisions.

Defensive Patterns

→Agent Sandboxing →Approval Gateway →Permission Boundaries

Impact

Irreversible production mutations; data destruction; regulatory non-compliance from unsupervised autonomous actions.

LLM09Medium

Overreliance

Users or automated systems trust LLM outputs without appropriate scepticism or verification — particularly dangerous in regulated decisions where confidently presented hallucinations carry the same surface authority as accurate responses.

Defensive Patterns

→Confidence Routing →AI Telemetry

Impact

Flawed decisions in regulated workflows; liability exposure; erosion of human oversight in critical business processes.

LLM10Medium

Model Theft

Adversaries extract proprietary model weights, fine-tuned parameters, or system prompts through repeated querying, model inversion attacks, or direct access to insufficiently protected model artefacts.

Defensive Patterns

→Model Access Control →AI Gateway

Impact

IP theft of fine-tuned models; competitive exposure; circumvention of safety measures in extracted model copies.

Comparative Reference

OWASP LLM Top 10 vs Traditional OWASP Top 10

AI-specific attacks share surface similarities with classic web vulnerabilities but require fundamentally different mitigations. Mapping them naively into a traditional AppSec programme will leave critical gaps.

AI-Specific Attack (OWASP LLM)	Most Similar Classic Vulnerability	Key Difference — why they are NOT the same
LLM01 — Prompt Injection	A03 Injection (SQLi / CMDi)	SQLi targets a deterministic parser with known grammar; prompt injection targets a statistical language model with no fixed grammar. Input sanitisation rules that reliably block SQLi payloads provide zero protection against natural-language instruction overrides. The attack surface is every token the model processes, including retrieved documents and tool outputs.
LLM02 — Insecure Output Handling	A03 Injection / A07 XSS	Classic XSS flows from developer-controlled templates with predictable injection points. LLM output injection is non-deterministic — the same prompt may or may not produce a malicious payload depending on model state, temperature, and context. Output schema validation and allowlist rendering are required in addition to HTML encoding.
LLM06 — Sensitive Information Disclosure	A02 Cryptographic Failures / A01 Broken Access Control	Classic disclosure results from missing encryption or access gates. LLM disclosure happens even when access controls are correct — the model may regurgitate training data or RAG-retrieved content it was legitimately given access to, exposing it to users who should not see it. Role-based output filtering on the model response is a new control class with no traditional equivalent.
LLM08 — Excessive Agency	No direct equivalent	Traditional web applications execute only the code a developer explicitly wrote. An agentic LLM can decide at runtime to invoke tools, modify data, or chain actions based on conversational context — the "attack surface" is the model's reasoning process itself. The four-eyes principle, approval gateways, and permission boundaries are architecture-level controls that have no equivalent in the OWASP Top 10.
LLM04 — Model Denial of Service	A05 Security Misconfiguration / classic DoS	Classic DoS floods network bandwidth or exhausts connection pools with simple repetition. LLM DoS uses semantically meaningful inputs (long contexts, recursive self-reference, adversarial reasoning chains) that pass rate-limit filters and maximise per-token compute cost — making traditional volume-based detection ineffective.
LLM10 — Model Theft	A08 Software and Data Integrity Failures / IP theft	Source code theft requires file-system access. Model extraction can be accomplished entirely through the inference API using repeated queries and model inversion techniques — no breach of the storage layer is required. Fine-tuned model weights and system prompts representing significant IP investment can be reconstructed from black-box API access alone.

OWASP LLM Top 10 v1.1 · Traditional OWASP Top 10 2021 · EAAPL comparative analysis

Pattern Library

Every defensive pattern — with architecture diagrams, regulatory mapping, and implementation guidance.

Explore the full pattern library→