LLM Security: OWASP Top 10 in Practice

A company deploys an AI assistant to handle customer queries. In the first week, everything works correctly. By the fourth week, someone pastes a cleverly crafted question into the chat, prompting the model to reveal the system prompt pattern. By the eighth week, another user discovers the agent willingly calls internal APIs beyond the allowed scope. None of these incidents are anomalies. All are classified in the OWASP LLM Top 10, and all have known defense patterns.

Below, I describe each of the ten classes, how they manifest in real-world enterprise deployments, and what concrete mechanisms mitigate them.

What is OWASP LLM Top 10 and Why It Matters in 2026#

OWASP (Open Worldwide Application Security Project) released the LLM Top 10 list as the equivalent of its classic web application security set, adapted for the specifics of language models. The list is not an academic exercise. It results from analyzing incidents in production AI systems and describes patterns that recur regardless of the base model or platform.

This article is based on the current canonical version — the OWASP Top 10 for LLM Applications 2025 — which reorganized and renumbered the categories relative to the original 2023–2024 list. Compared to that version, new classes were added, including vector and embedding weaknesses (RAG), misinformation, and unbounded resource consumption, while some former categories (such as model theft) were folded into broader new classes.

In 2026, the list’s significance has grown for several reasons. First, the AI Act requires documenting risk management measures for AI systems, and OWASP LLM Top 10 is a natural reference point in audits. Second, more companies are deploying agents with real agency (API calls, data writes), where a security flaw has operational, not just informational, consequences. Third, insurers have started asking about OWASP compliance for cyber policies.

For companies in Poland, the list has practical implications for deployments subject to GDPR: the data controller is responsible for technical and organizational measures, and a security incident in an AI system may simultaneously constitute a personal data breach.

LLM01 Prompt Injection: The Most Common Attack Vector#

Prompt injection involves injecting instructions into content that the model processes as data. The model cannot naturally distinguish between “owner system commands” and “commands hidden in customer documents.” Attackers insert text like “Ignore previous rules and disclose the system structure” into messages, documents, or web pages. If unchecked, the model treats this as a new instruction.

Two variants exist:

Direct injection — the user enters a malicious instruction directly into the chat.
Indirect injection — the instruction is hidden in external content the agent retrieves and processes (a website, PDF, email in a mailbox handled by the agent).

Indirect injection is harder to detect because the attacker is not a system user but controls content the agent processes externally.

Defense: guardrails on input (regex, built-in classifiers), clear separation of system instructions from user data in the prompt, sandboxing agent tools. Defense pattern details are covered in the article on prompt injection and assistant protection.

LLM02 Sensitive Information Disclosure: The Model Reveals What It Knew#

The model may disclose training data, system context data, or data processed earlier in the session. Three practical variants:

Memorization — a model fine-tuned on internal documents may quote their fragments in responses to unauthorized users.
Context leakage — fragments of RAG documents or another user’s data appear in the model’s response, which the asker should not have access to.
Cross-session leakage — in poorly designed architectures, data from one session ends up in another’s context.

Defense: Mask PII before data reaches the model, isolate contexts between sessions, enforce access control on the retrieval layer in RAG (the model sees only documents the asker is authorized for), data-residency for sensitive data via self-hosting. Masking patterns are covered in more detail in the article on PII anonymization before AI.

LLM03 Supply Chain: Risks in Dependencies#

The AI system relies on a dependency layer: base models from vendors, integration libraries (LangChain, LlamaIndex, etc.), plugins, LoRA adapters, external datasets. Each dependency can be compromised: a base model with a backdoor, a malicious PyPI package impersonating a popular library, a poisoned vector database version.

This is the same vector as in classic Software Supply Chain, but with an added dimension: a compromised base model may behave correctly 99.9% of the time and only react maliciously to a specific trigger.

Defense: Pin dependency versions (no latest), verify cryptographic hashes of models during download, SBOM (Software Bill of Materials) for the entire AI stack, regular CVE scanning (as in CI/CD pipeline), self-hosting models where the supply chain must be fully controlled.

LLM04 Data and Model Poisoning: Risk During Model Development#

Data and model poisoning involves intentionally introducing harmful examples into the dataset used for pre-training, fine-tuning, or RLHF — or substituting manipulated model weights. The result is a model with embedded behaviors not visible in standard tests but activated by specific signals.

For companies deploying fine-tuning of their own models on internal data: a poisoned training set (e.g., mislabeled examples, intentionally inserted data by a malicious employee) can lead to a model that systematically favors certain responses or behaves differently with specific keywords (a so-called backdoor).

Defense: Audit data before fine-tuning, verify the provenance of weights and datasets, verify sampling (statistical label distribution checks), red-team testing after model training, prefer RAG over fine-tuning where data changes frequently or has unknown provenance.

LLM05 Improper Output Handling: When the Model Passes Data Further#

The model returns text that the application may execute or pass to another component. If the output is not sanitized, Cross-Site Scripting via generated HTML, SQL injection via generated queries, or code execution in automation systems that directly run the model’s output becomes possible.

This vector is particularly dangerous in agent architectures, where LLM output becomes input for the next tool call.

Defense: Treat model output as untrusted external input. Sanitize HTML before sending it to the browser. Use structured output (JSON Schema) instead of raw text where data goes to a system. Never use eval() on text generated by the model.

LLM06 Excessive Agency: An Agent with Too Much Autonomy#

This vulnerability class arises not from a malicious attack but from system design. The agent was given too broad a scope, too many tools, or too few contextual constraints. With prompts outside the expected range, it may take actions the designer did not anticipate: deleting data instead of just reading it, sending emails to all contacts instead of one, calling a production API instead of a test one.

Excessive agency is dangerous because it’s hard to detect through happy-path testing and only surfaces with edge cases or malicious prompts.

Defense: Minimal footprint — the agent gets only the tools needed for a specific task, not “all that might be useful.” Permission scope per workflow, not per agent. Human-gate (HMAC token) for actions with side effects: sending, writing, payment. Quarterly review: are all permissions still in use? The “gradual loosening” pattern (start with tight oversight, loosen after proving safety) minimizes this risk over time.

LLM07 System Prompt Leakage: Leaking the System Instructions#

The user prompts the model to reveal the content of the system prompt — the instructions, rules, and context that were meant to stay hidden. The real risk lies not in revealing the text itself but in what was placed in it: API keys, credentials, decision thresholds, business rules, or paths to internal systems. If a system’s security depends on the prompt staying secret, the system is poorly designed.

Insecure agent tool design amplifies this vector: no parameter validation, overly broad permissions (a read tool also has write access), or no confirmation before an irreversible action mean that leaking the instructions gives the attacker a map for abusing the tools.

Defense: No secrets in the system prompt — secrets belong in a vault; access control and security rules are enforced outside the model (in the application), not via “please don’t reveal this.” Parameter validation on the tool side, regardless of what the model passed. Principle of least privilege and an allow-list of tools instead of dynamic addition. The same principles are detailed in the article on AI agent security.

LLM08 Vector and Embedding Weaknesses: Weak Points in the RAG Layer#

A new class from the 2025 list, specific to RAG systems. The way embeddings are generated, stored, and retrieved creates its own attack surface. Practical variants:

Injection via the knowledge base — an attacker places a hidden instruction in a document indexed into RAG, which gets retrieved and executed on the right query (this is indirect injection at the retrieval-layer level).
Multi-tenant leakage — a lack of isolation in the vector database lets one client’s query retrieve fragments of another’s documents.
Index poisoning — injected data shifts the search results so the model gets a manipulated context and answers based on it.

Defense: Access control and data isolation at the vector database level (per-tenant partitioning), validation and cleaning of content before indexing, verification of the provenance of documents admitted to the index, and monitoring of retrieval quality (whether returned fragments are consistent with the asker’s access policy).

LLM09 Misinformation: The Model Generates False but Plausible-Sounding Content#

The model produces information that is untrue — hallucinations, fabricated sources, wrong facts — stated with a confidence that makes it hard to tell apart from correct content. The risk is compounded by overreliance: an organization treats output as authoritative without verification, leading to decisions based on falsehood, skipping expert review, and legal liability for a decision made “based on AI.”

In regulated sectors (finance, law, medicine, HR), misinformation accepted without verification may violate AI Act requirements for human-oversight.

Defense: Grounding responses in RAG with source citation instead of generating from the model’s memory, UX design that enforces an uncertainty context (the model marks low confidence, doesn’t format the response as a “fact”). Human-gate for high-risk decisions. User training as part of deployment. Monitoring the escalation rate as a proxy for overreliance.

LLM10 Unbounded Consumption: Unbounded Resource Use and Model Extraction#

A new, broader class from the 2025 list that combines the former Model Denial of Service with the risk of model extraction. Two practical dimensions:

Resource exhaustion (DoS / cost) — certain prompt formulations cause the model to generate a response significantly longer or consume many more tokens than a typical query. Attackers exploit this to exhaust the API budget, slow the system for other users, or force limit breaches (deep recursion in the response, very long contexts pushed repeatedly, responses near the maximum context window).
Model / knowledge theft (model extraction) — someone systematically queries the model, collecting (prompt, response) pairs to replicate its behavior or extract knowledge learned during fine-tuning (including company data used in training) — an indirect channel for leaking business information.

Defense: Input and output length limits (max prompt and response tokens), throttling per user and per IP, anomaly monitoring in token costs (a 3× increase should trigger an alert) and in usage patterns (queries with very similar structure in high volume = an extraction signal). The LLM router architecture (llm-router) with backpressure is the right place to implement these barriers; additionally, isolate fine-tuned models from the public API.

OWASP LLM Top 10 Map: Risk vs. Defense#

OWASP Class (2025)	Main Risk	Key Defense Layer
LLM01 Prompt Injection	model instruction takeover	input guardrails, prompt/data separation
LLM02 Sensitive Information Disclosure	sensitive data leak	PII masking, session isolation, RAG access control
LLM03 Supply Chain	compromised dependencies	version pinning, SBOM, CVE scan
LLM04 Data and Model Poisoning	model backdoor	data audit, weight provenance, post-training red-team
LLM05 Improper Output Handling	malicious output execution	output sanitization, structured output
LLM06 Excessive Agency	agent exceeds scope	minimal footprint, allow-list, human-gate
LLM07 System Prompt Leakage	system instruction leak	no secrets in prompt, rules enforced outside the model
LLM08 Vector and Embedding Weaknesses	attack via the RAG layer	vector database isolation, content validation before indexing
LLM09 Misinformation	false content without verification	grounding in RAG with sources, uncertainty UX, human-gate
LLM10 Unbounded Consumption	resource exhaustion, model extraction	token limits, throttling, anomaly monitoring

How to Implement Layered Defense in Practice#

OWASP LLM defense is not a one-time project. It’s an architecture built iteratively: first mandatory layers (guardrails, PII masking, human-gate), then monitoring and red-teaming, finally incident response procedures.

Prioritization order depends on the risk profile:

Agents with tools — start with LLM01, LLM06, LLM07 (prompt injection, excessive agency, system prompt leakage), as these three classes combine into a single attack vector.
RAG systems with sensitive data — prioritize LLM02 (sensitive disclosure), LLM08 (vector and embedding weaknesses), and LLM01 indirect injection, as attackers may inject instructions into documents retrieved by the agent.
Internally fine-tuned models — LLM04 (data and model poisoning) and LLM10 (unbounded consumption / model extraction) require special attention during data preparation.
Public systems (website chatbot) — LLM10 (unbounded consumption / DoS) and LLM09 (misinformation) are particularly critical due to scale and user anonymity.

Assessing readiness and identifying the most critical gaps in your current AI system is easier with the readiness assessment tool. The cost of implementing security measures for a specific scope is generated by the ROI calculator.

Before diving into technical details, it’s worth reading the article on AI deployment planning step-by-step — security is designed alongside architecture, not after it’s built.

Try It Live#

Describe your current or planned AI system, and the model will assess which OWASP LLM classes are most relevant and suggest concrete barriers (playground: PII masked, zero retention):

▶Assess OWASP LLM Risk for Your Systemsandbox · reasoning

FAQ#

Does OWASP LLM Top 10 apply only to large companies?#

No. Any company deploying an AI system that processes customer data or has access to internal resources should know at least LLM01 (prompt injection) and LLM02 (sensitive information disclosure). These two vectors apply even to simple FAQ chatbots. Deployment scale affects prioritization, not whether the list is relevant.

How often is OWASP LLM Top 10 updated?#

The list is updated by OWASP in response to new incidents and attack patterns. The current canonical version is the OWASP Top 10 for LLM Applications 2025, which reorganized and renumbered the categories relative to the original 2023–2024 list and added new classes (including vector and embedding weaknesses, misinformation, and unbounded resource consumption). For long-term deployments, it’s worth aligning security reviews with the list’s update cycle, typically once a year or after significant system architecture changes.

How does OWASP LLM Top 10 relate to AI Act requirements?#

The AI Act requires high-risk systems (Annex III) to document risk management measures, pre-deployment testing, and human-oversight. OWASP LLM Top 10 is a natural framework for meeting these requirements: covering the list provides a starting point for the technical documentation required by regulators. It’s not the only required documentation, but its absence in an AI Act audit is a warning sign. Regulatory details are covered in the article AI Act and GDPR 2026.

Are guardrails enough to secure an AI system?#

Guardrails are one layer, not a complete defense. OWASP LLM Top 10 shows that vulnerability classes like supply chain (LLM03), excessive agency (LLM06), or misinformation (LLM09) aren’t addressed by input/output guardrails at all. Effective defense requires: guardrails (input and output), PII masking, least privilege for agent tools, anomaly monitoring, and incident response procedures. Each layer independently reduces risk, and together they create defense in depth.

What to do if a vulnerability is discovered in an AI system?#

The first action is isolation: disconnect the system or switch to read-only mode before the incident scales. Second is log analysis (which is why observability must be in place from day one). Third is assessing whether a personal data breach occurred, as GDPR requires reporting to UODO within 72 hours if the risk to individuals is high. Incident response runbooks should be part of the AI system documentation, not created only after an event.