Validating LLM outputs: structured output, schemas, and guardrails

A language model always returns text that appears correct. This is deceptive: the response sounds confident even when a field is fabricated, a number is out of range, or a JSON key is named differently than yesterday. If this output goes straight to a database, an invoice, or another system, a single distorted record can halt the entire process. Below, we describe the validation pattern we use to treat LLM output as data, not hope.

Why "looks good" isn’t enough

Raw text from a model has three classes of flaws that must be addressed separately, as each requires a different defense. The first is shape flaws: missing commas, truncated JSON, extra comments before brackets, a field as a string instead of a number. The second is content flaws: a structurally valid object where the value is a hallucination — a fabricated contract number, a date outside the range, a category that doesn’t exist in the dictionary. The third is unsafe outputs: attempted instruction injection, data leakage, or content that must not be shown to the user.

Validation must address all three. Checking JSON alone won’t catch hallucinations; business rule checks won’t work until the text parses at all. That’s why we build it in layers.
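As a minimal sketch of the very first layer, here is one way to get a parseable object out of a reply that wraps its JSON in extra prose. The helper name `extract_json_object` is our own illustration, not part of any library; it relies only on `json.JSONDecoder.raw_decode` from the Python standard library.

```python
import json


def extract_json_object(text: str):
    """Return the first JSON object that parses inside `text`, or None.

    Hypothetical helper: it scans for '{' and tries to decode from there,
    so leading commentary and trailing chatter are ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


reply = 'Sure! Here is the result:\n{"category": "billing", "amount": 12.5}\nLet me know.'
print(extract_json_object(reply))
print(extract_json_object('{"category": "billing", "amou'))
```

A truncated outer object can still expose a smaller nested object that parses, so this step only recovers shape candidates. Schema validation still has to decide whether the result is the object that was asked for.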

Enforcing shape: schema as contract

The starting point is JSON Schema — a formal description of what the output should look like: which fields, what types, which are required, and allowed values (enum). The schema serves two roles at once: it’s an instruction for the model and a contract for the validator. There are two ways to enforce it.

Approach	How it works	Strengths	Limitations
Native structured output	Provider guarantees JSON compliant with the schema (constrained decoding)	No shape errors on the decoding side	Not every model/host supports it; can be slower for complex schemas
Prompt-based structured output	Schema in prompt + validation on the application side	Works with any model, full control	Model may be verbose or deviate from the schema — validation mandatory

In our deployments, we default to the prompt-based variant with validation: the schema goes in the prompt, followed by strict validation using a JSON Schema library. The reason is practical: native enforcement with a rigid response_format can be slow on some hosts (seconds, sometimes timeouts) and isn’t available for every model. We want to choose the best model for the task, not the best one for the API mode. How to select a model for a specific task is covered in our post on choosing an LLM model.

Regardless of the approach, the principle is the same: the schema validates structure, not meaning. In practice that means loose length constraints (the model tends to write at length), strict types, and enums wherever the value is passed further into the system.
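To make "schema as contract" concrete, here is a sketch of a ticket schema and a deliberately tiny checker. The field names (`category`, `amount`, `summary`) and the enum values are assumptions for illustration. The `validate` function handles only a small subset of JSON Schema (required fields, types, enum, no extra fields) so the example runs on the standard library alone; in a real deployment a full JSON Schema library would take its place.

```python
# Hypothetical ticket schema: strict types and a closed enum, no length limits.
TICKET_SCHEMA = {
    "type": "object",
    "required": ["category", "amount", "summary"],
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "other"]},
        "amount": {"type": "number"},
        "summary": {"type": "string"},
    },
}

# Mapping from the schema type names used above to Python types.
_TYPES = {"string": str, "number": (int, float)}


def validate(instance, schema=TICKET_SCHEMA):
    """Return a list of readable errors; an empty list means the shape is valid."""
    if not isinstance(instance, dict):
        return ["output must be a JSON object"]
    errors = [f"missing required {name}"
              for name in schema["required"] if name not in instance]
    for name, value in instance.items():
        rules = schema["properties"].get(name)
        if rules is None:
            errors.append(f"unexpected field {name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[rules["type"]]):
            errors.append(f"field {name} must be a {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"field {name} must be one of {rules['enum']}")
    return errors


print(validate({"category": "billing", "amount": "12.50", "summary": "Refund"}))
```

The same dictionary can be serialized into the prompt with `json.dumps`, so the model and the validator read one definition.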

Validation + repair loop

A single call doesn’t always hit the schema, and that’s normal, not a failure. That’s why, when validation fails, we make one controlled repair attempt: we send the model its own output back together with the specific validator error message (“field amount must be a number”, “missing required category”) and ask for a correction. This usually suffices because the model now has precise information about what went wrong.

The loop must have a hard limit. In our practice, one or at most two repair attempts are enough; more rarely change anything and only multiply cost and latency. After exhausting the attempts, we don’t guess: the output goes to error handling (queue, human in the loop, default value), never to the target system. Schematically:

Generate → output according to schema in prompt.
Validate → parse JSON + check JSON Schema + business rules.
Repair → on error, return validator message to the model; repeat validation (attempt limit).
Decide → success: pass through; failure after limit: fail-closed.
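The four steps can be sketched roughly as follows. Everything here is illustrative: `call_model` is a placeholder for whatever client calls the model, `check` stands in for the real schema and business-rule validation, and the repair prompt wording is our assumption.

```python
import json

MAX_REPAIRS = 1  # hard limit on repair attempts


def check(raw):
    """Parse and validate; return (data, error). Stand-in for the real validator."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"output is not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "output must be a JSON object"
    if "category" not in data:
        return None, "missing required category"
    amount = data.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return None, "field amount must be a number"
    return data, None


def generate_validated(call_model, prompt, max_repairs=MAX_REPAIRS):
    """Generate, validate, repair up to the limit, then decide (fail-closed)."""
    raw = call_model(prompt)
    for attempt in range(max_repairs + 1):
        data, error = check(raw)
        if error is None:
            return {"status": "ok", "data": data, "repairs": attempt}
        if attempt == max_repairs:
            break  # limit reached: do not ask again
        # Repair: return the model's own output plus the exact validator message.
        raw = call_model(
            f"{prompt}\n\nYour previous output:\n{raw}\n\n"
            f"Validator error: {error}\nReturn only the corrected JSON."
        )
    # The caller routes this to error handling, never to the target system.
    return {"status": "failed", "error": error, "raw": raw}


# Usage with a scripted fake model: the first reply has a string amount.
replies = iter(['{"category": "billing", "amount": "12.50"}',
                '{"category": "billing", "amount": 12.5}'])
print(generate_validated(lambda _prompt: next(replies), "Extract the ticket."))
```

The loop returns a status rather than raising, so the calling code has to make the fail-closed decision explicitly instead of catching an exception somewhere upstream.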

Guardrails: content and safety

The schema enforces shape but won’t tell you whether an order number is fabricated or whether the model is acting on an injected instruction. That’s the job of guardrails: a layer of rules that runs after structural validation and often also before the model (on input). The three checks we use most frequently:

Value grounding — values that must exist in reality (contract number, category, customer ID) are compared against the source of truth. A field that doesn’t match any record is treated as a hallucination and rejected, even if the type is correct.
Range and rule control — amounts, dates, numbers must fit within business bounds. A price outside the range or a past date for a future deadline is an error signal, not data.
Output safety — filter against data leakage, injection attempts, and content that must not be shown. This is the same line of thinking as in AI agent security — model output is untrusted data until validated.
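The first two checks can be sketched as plain functions over an already schema-valid record. The contract IDs, the amount ceiling, and the field names are invented for the example; in practice the set lookup would be a query against the actual source of truth. Output safety filtering is not shown here.

```python
from datetime import date

# Hypothetical source of truth; stands in for a database or API lookup.
KNOWN_CONTRACTS = {"C-1001", "C-1002"}
MAX_AMOUNT = 10_000.0  # assumed business ceiling


def guardrail_errors(record, today):
    """Content checks on a record that has already passed schema validation."""
    errors = []
    # Value grounding: a well-typed ID that matches nothing is a hallucination.
    if record["contract_id"] not in KNOWN_CONTRACTS:
        errors.append("contract_id does not match any known contract")
    # Range control: the amount must sit inside business bounds.
    if not 0 < record["amount"] <= MAX_AMOUNT:
        errors.append("amount is outside the allowed range")
    # Rule control: a future deadline cannot lie in the past.
    try:
        if date.fromisoformat(record["deadline"]) < today:
            errors.append("deadline is in the past")
    except ValueError:
        errors.append("deadline is not an ISO date")
    return errors


record = {"contract_id": "C-9999", "amount": 250.0, "deadline": "2025-02-01"}
print(guardrail_errors(record, today=date(2025, 1, 15)))
```

Passing `today` in as a parameter keeps the rule deterministic and easy to test.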

We also apply guardrails where the model classifies: the classifier’s output must belong to a closed set of labels; otherwise ticket routing sends the case to an unknown destination. When the output powers a knowledge-based company GPT or a RAG system, the same discipline protects the user from confident-sounding but false answers.

When to fail-closed vs. fail-open

The most critical design decision: what to do when validation still fails after repair. Fail-closed means refusal — the system doesn’t pass uncertain output, instead escalating to a human, returning an error, or using a safe default value. This is the default mode wherever the consequence is irreversible or costly: payments, database changes, content sent to customers, decisions with legal implications.

Fail-open (pass through despite uncertainty) is only allowed where errors are cheap and reversible, and the absence of a response is worse than an imperfect one. An example is a tag suggestion that a human verifies anyway. Rule of thumb: if you can’t cheaply undo the effect of erroneous output, design for fail-closed. How to take such a pattern from pilot to stable production is covered in from AI pilot to production.
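One way to keep this decision explicit in code is to make the failure mode a declared property of each use case rather than something the error handler improvises. The sketch below is an assumption about how that could look; the names `OnFailure`, `failure_mode`, and `handle` are hypothetical.

```python
from enum import Enum


class OnFailure(Enum):
    FAIL_CLOSED = "fail_closed"
    FAIL_OPEN = "fail_open"


def failure_mode(cheap_to_undo: bool, no_answer_is_worse: bool) -> OnFailure:
    """Rule of thumb: fail-open only if errors are cheap AND silence is worse."""
    if cheap_to_undo and no_answer_is_worse:
        return OnFailure.FAIL_OPEN
    return OnFailure.FAIL_CLOSED


def handle(output, is_valid: bool, mode: OnFailure):
    """Decide what leaves the validation layer once repair attempts are spent."""
    if is_valid:
        return {"action": "pass", "output": output, "needs_review": False}
    if mode is OnFailure.FAIL_OPEN:
        # Passed through, but flagged so a human still verifies it.
        return {"action": "pass", "output": output, "needs_review": True}
    # Fail-closed: nothing uncertain reaches the target system.
    return {"action": "escalate", "output": None, "needs_review": True}


tag_mode = failure_mode(cheap_to_undo=True, no_answer_is_worse=True)
payment_mode = failure_mode(cheap_to_undo=False, no_answer_is_worse=False)
print(handle({"tag": "urgent"}, is_valid=False, mode=tag_mode))
print(handle({"amount": 99.0}, is_valid=False, mode=payment_mode))
```

Because fail-closed is the fallthrough branch, a use case nobody has classified yet is refused by default.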

FAQ

Does structured output handle validation on its own?

No. Native structured output ensures the output is valid JSON compliant with the schema — that’s one of three flaw classes (shape). It doesn’t check if values are true or safe, so content validation and guardrails are still needed. We treat it as a foundation, not a complete solution.

Prompt-based or native structured output?

Depends on the host and model. Native provides shape guarantees but isn’t supported by every model and can be slower for complex schemas. In practice, we often choose prompt-based with strict application-side validation because it works with any model and leaves control over the repair loop.

How many times should you attempt to repair output?

In our practice, one, at most two repair attempts. After providing the model with a specific error message, the first correction usually suffices; subsequent attempts rarely help and multiply cost and latency. After exhausting the limit, the output goes to error handling, not the system.

How does validation protect against hallucinations?

The schema alone doesn’t — a hallucination can be structurally correct. Guardrails protect: comparing values against the source of truth (grounding), range control, and closed dictionaries. If a field doesn’t match any real record, we reject it regardless of how confident it sounds.

Does validation significantly slow down the system?

Structural validation and rules are cheap — milliseconds. The real cost is a potential repair round, meaning one additional model call, and only if the first output fails. In return, you gain predictability, which is invaluable when output feeds other systems.