Structured Output Validation
Key Points
- The model returns JSON. Production code consumes that JSON. If the JSON doesn't match your schema, the chain breaks. Validation is non-optional.
- Three strategies, ranked by reliability: (1) native structured output (provider enforces grammar) → (2) post-hoc schema validation + retry-with-feedback → (3) free-form parsing with try/catch (avoid).
- OpenAI strict mode (response_format: { type: "json_schema", json_schema: { name, schema, strict: true } }, introduced August 2024) constrains the decoder itself: the sampler cannot emit a token that violates the schema. Anthropic gets a similar effect via tool-call schemas. Gemini offers response_schema.
- In .NET, schema is generated from your record + [Description] attributes by AIJsonUtilities / AIFunctionFactory. You don't usually write JSON Schema by hand.
- Validation libraries: JsonSchema.Net (spec-correct), NJsonSchema (schema generation + validation), FluentValidation (semantic rules: Email, ranges, business constraints).
- Retry budget: 2–3 attempts max. Past that, fail loudly. Infinite retry burns tokens and hides bugs.
- On retry, send only the bad output + the validation errors + the schema, NOT the full chat history. On long conversations that can cut retry tokens by roughly an order of magnitude.
- Anti-patterns: silently dropping invalid responses, retrying forever, asking the model to "be careful next time" without telling it what failed.
Concepts (deep dive)
The problem in one sentence
You asked the LLM for a Customer { Id: int, Email: string, Plan: "free"|"pro"|"enterprise" }. It returned { "id": "abc", "email": null, "plan": "PRO" }. Three errors in 12 tokens. Your downstream JsonSerializer.Deserialize<Customer> either throws or silently produces garbage. Production agents call tools, parse outputs, route control flow on parsed values — every malformed response is a chain break.
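To make both failure modes concrete, here is a minimal sketch that feeds such payloads to System.Text.Json. CustomerDto and ParseDemo are illustrative names (not from any library), and the behaviour assumes default serializer settings plus case-insensitive property matching.

```csharp
using System;
using System.Text.Json;

// Illustrative DTO mirroring the example above; not taken from any library.
public record CustomerDto(int Id, string Email, string Plan);

public static class ParseDemo
{
    private static readonly JsonSerializerOptions Options =
        new() { PropertyNameCaseInsensitive = true };

    // Returns the parsed DTO, or null (with the reason in 'error') when deserialization throws.
    public static CustomerDto? TryParse(string json, out string? error)
    {
        try
        {
            error = null;
            return JsonSerializer.Deserialize<CustomerDto>(json, Options);
        }
        catch (JsonException ex)
        {
            error = ex.Message;
            return null;
        }
    }
}
```

With the payload above, TryParse returns null because "abc" cannot be converted to int. Change only the id to 7 and the call succeeds, but Email is null and Plan is "PRO": the serializer is satisfied while the object is still wrong.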
Three layers of defense
┌──────────────────────────────────────────────────────┐
│ Layer 1: Constrained decoding (provider-enforced)    │ ← strongest
│   OpenAI strict mode, Anthropic tools, Gemini schema │
├──────────────────────────────────────────────────────┤
│ Layer 2: Schema validation (post-hoc)                │ ← required regardless
│   JsonSchema.Net, NJsonSchema                        │
├──────────────────────────────────────────────────────┤
│ Layer 3: Semantic validation                         │ ← business rules
│   FluentValidation, DataAnnotations                  │
└──────────────────────────────────────────────────────┘
Layer 1 prevents structural errors. Layer 2 catches anything Layer 1 missed (older models, non-strict providers, drifted schemas). Layer 3 catches things schema cannot express ("the endDate must be after startDate", "the email must be unique in our DB").
Layer 1 — native structured output
OpenAI strict JSON Schema
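var options = new ChatOptions
{
    // ForJsonSchema takes (schema, schemaName, schemaDescription); how strict mode is
    // requested depends on the provider adapter, so check its documentation.
    ResponseFormat = ChatResponseFormat.ForJsonSchema(
        schema: AIJsonUtilities.CreateJsonSchema(typeof(Customer)),
        schemaName: "customer")
};
var resp = await chat.GetResponseAsync<Customer>("Extract the customer from: ...", options);
Customer c = resp.Result; // schema-conformant when the provider enforces the schema; throws if deserialization fails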
Under the hood, OpenAI compiles your schema to a context-free grammar. At each decoding step the sampler masks any token that would violate the grammar: the probability of those tokens is set to zero before sampling. The model cannot produce JSON that violates the schema, although a response can still be cut off at the token limit or replaced by a refusal, so check for both.
Anthropic — tool-call schemas
Claude doesn't have a generic "structured output" mode. Idiom: define a fake tool whose only purpose is to receive your structured object, then force the model to call it.
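[Description("Returns the extracted customer record")]
static Customer ReturnCustomer(Customer c) => c;

var options = new ChatOptions
{
    // Name the tool explicitly so it matches RequireSpecific below.
    Tools = [AIFunctionFactory.Create(ReturnCustomer, name: "ReturnCustomer")],
    ToolMode = ChatToolMode.RequireSpecific("ReturnCustomer")
};
The model has to call the tool, so its answer arrives as tool arguments shaped by the tool's input schema. Unless the provider also constrains decoding for tool inputs, conformance is likely rather than guaranteed, so still validate the arguments (Layer 2).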
Gemini — response_schema
Same surface in Microsoft.Extensions.AI; provider adapter translates. Gemini's enforcement is weaker than OpenAI strict — treat as advisory and validate.
Layer 2 — post-hoc validation
Even with strict mode you validate. Reasons: (a) the schema generator doesn't always express every rule, (b) you want a single audit log of every validation failure, (c) you may be calling a non-strict provider, (d) you want to enforce rules across versions.
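using System.Linq;
using System.Text.Json.Nodes;
using Json.Schema; // JsonSchema.Net

var schema = JsonSchema.FromText(schemaJson);
// Evaluate accepts a JsonNode in some JsonSchema.Net versions and a JsonElement in others.
var results = schema.Evaluate(JsonNode.Parse(rawJson),
    new EvaluationOptions { OutputFormat = OutputFormat.List });
if (!results.IsValid)
{
    // One "location: message" line per failed keyword.
    var errors = results.Details
        .Where(d => d.Errors is not null)
        .SelectMany(d => d.Errors!.Select(e => $"{d.InstanceLocation}: {e.Value}"))
        .ToList();
    // hand errors to the retry chain
}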
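For intuition about what such a validator reports, here is a dependency-free sketch that checks only the Customer shape from the problem statement and emits the same "location: message" strings. CustomerShapeCheck is a hypothetical helper written for these notes; it hard-codes one schema and is not a substitute for a real JSON Schema validator.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class CustomerShapeCheck
{
    private static readonly string[] Plans = { "free", "pro", "enterprise" };

    // Returns one "location: message" string per structural problem; empty list means valid.
    public static List<string> Check(string rawJson)
    {
        var errors = new List<string>();
        JsonNode? root;
        try { root = JsonNode.Parse(rawJson); }
        catch (JsonException ex) { errors.Add($"/: not valid JSON ({ex.Message})"); return errors; }

        if (root is not JsonObject obj) { errors.Add("/: expected an object"); return errors; }

        // A JSON null or a missing property both surface as a null node here.
        if (obj["id"] is not JsonValue id || !id.TryGetValue<int>(out _))
            errors.Add("/id: expected an integer");
        if (obj["email"] is not JsonValue email || !email.TryGetValue<string>(out _))
            errors.Add("/email: expected a string");
        if (obj["plan"] is not JsonValue plan
            || !plan.TryGetValue<string>(out var p)
            || Array.IndexOf(Plans, p) < 0)
            errors.Add("/plan: expected one of free, pro, enterprise");
        return errors;
    }
}
```

Run against { "id": "abc", "email": null, "plan": "PRO" }, this reports three errors, one per field, which is exactly the list the retry chain needs.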
Layer 3 — semantic validation
Schema can say "string, max length 256". It can't say "must be a real customer id in our DB", "endDate after startDate", "amount within tenant's daily limit". Use FluentValidation:
using System.Linq;
using FluentValidation;

public class CustomerValidator : AbstractValidator<Customer>
{
    private static readonly string[] Plans = { "free", "pro", "enterprise" };

    public CustomerValidator()
    {
        RuleFor(c => c.Email).EmailAddress();
        RuleFor(c => c.Plan).Must(p => Plans.Contains(p));
        // Cross-field rule; assumes Customer also carries StartDate and EndDate.
        RuleFor(c => c).Custom((c, ctx) =>
        {
            if (c.EndDate <= c.StartDate)
                ctx.AddFailure("endDate", "must be after startDate");
        });
    }
}
Retry-with-feedback chain
[1] Send prompt + schema
│
▼
[2] Receive JSON
│
▼
[3] Validate (schema + semantic)
│
├── valid ──► return
│
└── invalid
│
▼
[4] retries < N?
│
├── no ──► throw / log / fail loudly
│
└── yes
│
▼
[5] Send: "Your previous output was {bad}. Validation errors: {errors}. Schema: {schema}. Please return corrected JSON only."
│
▼
[2]
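Critical: in step 5 send only the bad output + errors + schema, not the original conversation. The model needs the delta, not the history. One caveat: if the fix depends on the original input (for example, the text a field was extracted from), include that input too, since the model cannot recover a missing value from an error message alone.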
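The loop in the diagram can be sketched without any provider SDK. In this minimal version, callModel and validate are caller-supplied stand-ins (assumptions, not library APIs) for the provider call and for step 3; RetryWithFeedback is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RetryWithFeedback
{
    // Builds the step-5 message: bad output + errors + schema, nothing else.
    public static string BuildRepairPrompt(string badOutput, IReadOnlyList<string> errors, string schemaJson)
    {
        var errorList = string.Join("\n- ", errors);
        return $"Your previous output was:\n{badOutput}\n\n" +
               $"Validation errors:\n- {errorList}\n\n" +
               $"Schema:\n{schemaJson}\n\n" +
               "Return corrected JSON only.";
    }

    // Bounded loop: returns the first valid output, throws once the budget is spent.
    public static async Task<string> RunAsync(
        string initialPrompt,
        string schemaJson,
        Func<string, Task<string>> callModel,
        Func<string, IReadOnlyList<string>> validate,
        int maxAttempts = 3)
    {
        var prompt = initialPrompt;
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var output = await callModel(prompt);
            var errors = validate(output);
            if (errors.Count == 0) return output;

            // Replace, don't append: the next call sees only the delta.
            prompt = BuildRepairPrompt(output, errors, schemaJson);
        }
        throw new InvalidOperationException($"Structured output failed after {maxAttempts} attempts.");
    }
}
```

Because the prompt is replaced rather than appended to, the size of each retry depends on the bad output and the schema, not on how long the original conversation was.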
Schema generation in .NET
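AIJsonUtilities.CreateJsonSchema(typeof(T)) walks the type and produces a JSON Schema. [Description] and JsonPropertyName participate; whether data-annotation attributes such as [Required], [Range] and [EmailAddress] are reflected depends on the library version and target framework, so inspect the generated schema before relying on them.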
public record Customer(
[property: Description("Customer id, e.g. C-1234")] string Id,
[property: Description("Email address")][property: EmailAddress] string Email,
[property: Description("Plan tier")] PlanTier Plan,
[property: Range(0, 100_000)] decimal MonthlyRevenue);
public enum PlanTier { Free, Pro, Enterprise }
→ should generate a schema with enum for PlanTier, format: email for Email, and numeric bounds for MonthlyRevenue; verify against the schema actually emitted.
Grammar-constrained sampling — what it actually is
Standard LLM decoding: at each step the model produces a probability distribution over the vocabulary; sample one token. Grammar-constrained decoding inserts a mask between distribution and sample: any token that would, if appended, make the prefix fail to parse against the grammar gets its probability zeroed. The model literally cannot say {"id": "abc"} if id is int — there is no path through the grammar. Implementations: llama.cpp GBNF grammars, Outlines (Python), OpenAI strict mode internally. Trade-off: sampling is slightly slower; some emergent behaviors disappear (the model can't ramble before answering).
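The masking step itself is small. The sketch below is a toy illustration, not how any particular provider implements it: ConstrainedSampler is a hypothetical name, and isAllowed stands in for a real grammar check on "prefix + candidate token".

```csharp
using System;
using System.Linq;

public static class ConstrainedSampler
{
    // Zeroes the probability of every token the grammar disallows, then renormalizes.
    public static double[] Mask(double[] probs, Func<int, bool> isAllowed)
    {
        var masked = probs.Select((p, i) => isAllowed(i) ? p : 0.0).ToArray();
        var total = masked.Sum();
        if (total <= 0)
            throw new InvalidOperationException("Grammar allows no token at this step.");
        return masked.Select(p => p / total).ToArray();
    }

    // Greedy pick over the masked distribution (real decoders usually sample instead).
    public static int PickGreedy(double[] probs, Func<int, bool> isAllowed)
    {
        var masked = Mask(probs, isAllowed);
        return Array.IndexOf(masked, masked.Max());
    }
}
```

Suppose the vocabulary is ["\"abc\"", "42", "null"] with probabilities [0.7, 0.2, 0.1] and the grammar expects an integer. Only index 1 is allowed, so the decoder picks "42" even though the unconstrained model preferred "\"abc\"".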
How it works under the hood
Schema generation pipeline
[CLR Type]
│ reflection
▼
[JsonTypeInfo + attributes]
│ AIJsonUtilities
▼
[JsonSchema (JsonNode tree)]
│ serialize
▼
[JSON Schema document sent to provider]
OpenAI strict mode pipeline
[your schema] ──► [OpenAI compiles to CFG] ──► [token sampler masks invalid tokens]
                                                                 │
                                                                 ▼
                                                 [output is grammatically correct]
Validation middleware in the IChatClient pipeline
ChatClient.GetResponseAsync(...)
│
▼
[UseFunctionInvocation]
│
▼
[UseStructuredOutputValidation] ← inserts validate + retry
│
▼
[OpenAI / Anthropic / Gemini client]
The middleware sits outside the provider call so it sees raw responses and can re-issue the call.
Code: correct vs wrong
✅ Correct — strict mode + validate + bounded retry
using System.Text.Json;
using FluentValidation;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

public sealed class StructuredOutputClient(
    IChatClient inner,
    IValidator<Customer> validator,
    ILogger<StructuredOutputClient>? log = null) : DelegatingChatClient(inner)
{
    private const int MaxAttempts = 3;

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        // Clone so the caller's options are not mutated.
        options = options?.Clone() ?? new ChatOptions();
        // The schema rides along in ResponseFormat on every attempt, so retry messages need not repeat it.
        // How strict enforcement is requested depends on the provider adapter.
        options.ResponseFormat = ChatResponseFormat.ForJsonSchema(
            AIJsonUtilities.CreateJsonSchema(typeof(Customer)), "customer");

        var convo = messages.ToList();
        for (int attempt = 1; attempt <= MaxAttempts; attempt++)
        {
            var resp = await base.GetResponseAsync(convo, options, ct);
            var raw = resp.Text;
            try
            {
                // Use the same serializer options the schema generator defaults to.
                var c = JsonSerializer.Deserialize<Customer>(raw, AIJsonUtilities.DefaultOptions)
                        ?? throw new JsonException("Output was JSON null.");
                var v = await validator.ValidateAsync(c, ct);
                if (v.IsValid) return resp;

                var errors = string.Join("\n", v.Errors.Select(e => $"{e.PropertyName}: {e.ErrorMessage}"));
                log?.LogWarning("Validation failed (attempt {Attempt}): {Errors}", attempt, errors);
                // Send ONLY the delta, not full history
                convo = new List<ChatMessage>
                {
                    new(ChatRole.System, "You are correcting prior invalid JSON output."),
                    new(ChatRole.User, $"Previous output:\n{raw}\n\nErrors:\n{errors}\n\nReturn corrected JSON only.")
                };
            }
            catch (JsonException ex)
            {
                log?.LogWarning(ex, "JSON parse failed (attempt {Attempt})", attempt);
                convo = new List<ChatMessage>
                {
                    new(ChatRole.User, $"Previous output was not valid JSON: {raw}\nReturn JSON only.")
                };
            }
        }
        throw new InvalidOperationException($"Structured output failed after {MaxAttempts} attempts");
    }
}
❌ Wrong — silent swallow
try { var c = JsonSerializer.Deserialize<Customer>(resp.Text); return c; }
catch { return null; } // production now has nulls floating around with no log
❌ Wrong — unbounded retry
while (true) // tokens burn forever; one stuck prompt blows your budget
{
var resp = await chat.GetResponseAsync(...);
if (Validate(resp)) return resp;
}
❌ Wrong — full history on retry
// every retry resends the whole, growing conversation; cumulative cost climbs fast
convo.Add(new ChatMessage(ChatRole.User, "Try again, your output was wrong"));
The model has no idea what was wrong. Tell it.
✅ Correct — registered as middleware
chat = chat.AsBuilder()
.UseFunctionInvocation()
.Use(inner => new StructuredOutputClient(inner, validator))
.UseLogging(loggerFactory)
.Build();
Design patterns for this topic
Pattern 1 — "Strict mode first, validate second"
- Intent: belt + suspenders. Constrain at the decoder; validate after.
- Why: strict mode prevents structural errors; validation catches semantic ones.
Pattern 2 — "Bounded retry with delta-only context"
- Intent: give the model a chance to fix its mistake without burning tokens.
- Cap: 2–3 attempts. Past that, escalate.
Pattern 3 — "Tool-as-schema" (Anthropic)
- Intent: force structured output on providers without a strict mode.
- Mechanism: define a tool that receives the structured object; require call.
Pattern 4 — "Schema as DTO"
- Intent: one C# record is the source of truth — schema, validation, deserialization target.
- Mechanism: annotate with [Description], [Range], [EmailAddress]; generate schema; deserialize.
Pattern 5 — "Validation telemetry"
- Intent: every validation failure becomes a metric/log so you can spot prompt drift.
- Mechanism: counter ai.structured_output.validation_failures tagged by model, schema_name, error_kind.
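One way to emit that counter is System.Diagnostics.Metrics. This is a sketch: the meter name "MyApp.StructuredOutput" and the ValidationTelemetry class are placeholders chosen for illustration, while the counter and tag names follow the pattern above.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class ValidationTelemetry
{
    // Meter name is an illustrative placeholder; pick one that matches your service.
    private static readonly Meter Meter = new("MyApp.StructuredOutput");

    private static readonly Counter<long> Failures =
        Meter.CreateCounter<long>("ai.structured_output.validation_failures");

    // Call once per failed validation, from the retry loop or middleware.
    public static void RecordFailure(string model, string schemaName, string errorKind) =>
        Failures.Add(1,
            new KeyValuePair<string, object?>("model", model),
            new KeyValuePair<string, object?>("schema_name", schemaName),
            new KeyValuePair<string, object?>("error_kind", errorKind));
}
```

Any listener that subscribes to the meter (an OpenTelemetry exporter, for instance) can then chart failures by model and alert when the rate rises.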
Pattern 6 — "Layered validation"
- Intent: schema first (cheap), semantic second (DB/external).
- Why: fail fast on structural errors before doing expensive lookups.
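A minimal sketch of the short-circuit, assuming each layer is expressed as a function that returns a list of error strings (LayeredValidation is a hypothetical helper, not a library type):

```csharp
using System;
using System.Collections.Generic;

public static class LayeredValidation
{
    // Runs layers in order and stops at the first one that reports errors,
    // so expensive semantic checks never run on structurally invalid input.
    public static IReadOnlyList<string> Run<T>(T value, params Func<T, IReadOnlyList<string>>[] layers)
    {
        foreach (var layer in layers)
        {
            var errors = layer(value);
            if (errors.Count > 0) return errors;
        }
        return Array.Empty<string>();
    }
}
```

Pass the cheap schema check first and the database-backed check last; if the first layer fails, the second is never invoked.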
Pros & cons / trade-offs
| Strategy | Pros | Cons |
|---|---|---|
| Strict mode | Structure enforced at decode time | OpenAI-only at full strength; some schema features unsupported |
| Tool-as-schema | Works on Claude | Tool definition and call wrapper add token overhead |
| Post-hoc validation | Provider-agnostic | Costs a retry on failure |
| FluentValidation | Expressive business rules | Not in schema; model can't see rules unless restated in prompt |
| Grammar-constrained sampling | Strongest guarantee | Slightly slower; can degrade reasoning |
| Free-form + try/catch | Trivial | Brittle; bad UX on failure |
When to use / when to avoid
- Use strict mode whenever the provider supports it and your schema fits the supported subset.
- Use validation middleware in every production agent pipeline.
- Use retry-with-feedback for high-value calls where one extra round trip is cheap relative to a downstream failure.
- Avoid retry chains for streaming responses where partial output is already consumed.
- Avoid unconstrained free-form JSON parsing in production. Always.
- Avoid schema features the provider doesn't support in strict mode (oneOf with discriminator, recursive refs sometimes); flatten the schema.
Interview Q&A
Q1. What does OpenAI's strict JSON Schema mode actually do? Compiles the schema to a grammar; masks tokens during sampling so the model cannot emit invalid JSON. Structural correctness is mathematically guaranteed; semantic correctness is not.
Q2. Why validate even with strict mode? Schema cannot express every rule (cross-field, business constraints). Strict mode is provider-specific — you may switch. Drift detection — alerts on validation failures show prompt regressions.
Q3. Difference between schema validation and FluentValidation? Schema = structure (types, required, enums, ranges). FluentValidation = semantic rules (cross-field, DB lookups, business policy).
Q4. How do you do structured output on Claude? Define a tool whose parameter is the desired shape; force call via ToolMode.RequireSpecific. Schema is enforced because it's a function call.
Q5. Retry budget for invalid output? 2–3 attempts. Past that, fail loudly. Unbounded retry hides bugs and burns tokens.
Q6. What do you send on retry? Only the bad output + the validation errors + the schema. Not the full conversation. The model needs the delta.
Q7. How is JSON Schema generated from a C# type? AIJsonUtilities.CreateJsonSchema(typeof(T)) reflects the type, walks [Description], [Required], [Range], [EmailAddress], enums; emits JSON Schema.
Q8. What's grammar-constrained decoding? Each token's probability is masked against a grammar before sampling. Tokens that would break parse have probability zero. Used by llama.cpp GBNF and OpenAI strict mode internally.
Q9. Where does validation sit in the IChatClient pipeline? In a DelegatingChatClient — outside the provider call so it can re-invoke on failure.
Q10. Anti-pattern: catching JsonException and returning null. Silently hides failures. Log, increment metric, retry, then escalate.
Q11. How do you observe validation failures? Counter ai.structured_output.validation_failures tagged by model + schema + error type. Alert when rate exceeds threshold; dashboards by model.
Q12. Does strict mode work with additionalProperties? OpenAI strict requires additionalProperties: false. Closed schemas only. Open shapes need fallback to non-strict + validation.
Gotchas / common mistakes
- ⚠️ Strict mode requires additionalProperties: false on every object. Forgetting this rejects the schema.
- ⚠️ Recursive types (a tree) don't always work in strict mode. Flatten or paginate.
- ⚠️ JsonSerializer.Deserialize<T> is case-sensitive by default. Configure PropertyNameCaseInsensitive = true or align casing in schema.
- ⚠️ Sending the full chat history on retry resends a growing conversation on every attempt, so cumulative cost climbs quickly.
- ⚠️ Asking the model to "try again" without telling it what failed → it produces the same garbage.
- ⚠️ Mixing prose + JSON in the same response. Force JSON-only with response_format or strip with a regex (last resort).
- ⚠️ Floating-point in schema as number: you may receive 1 where you expected 1.0. System.Text.Json reads that into double or decimal without trouble, but an int property rejects 1.5 and 1.0, so declare integer fields as type: integer. JsonNumberHandling.AllowReadingFromString only helps when the model quotes numbers as strings ("1").
- ⚠️ Streaming + structured output don't mix cleanly: you can't validate until the stream completes.
- ⚠️ Provider-specific schema dialect drift. JSON Schema 2020-12 vs draft-07: some keywords differ.
- ⚠️ Treating validation as a development concern instead of a production telemetry signal. Log every failure.
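For the first gotcha, a small post-processing pass over the generated schema can close every object before the schema is sent. This is a sketch under stated assumptions: StrictSchema is a hypothetical helper, it only descends through properties, items, anyOf and $defs, and it also marks every property as required because OpenAI strict mode expects that as well (optional fields are usually modelled as nullable types instead).

```csharp
using System.Linq;
using System.Text.Json.Nodes;

public static class StrictSchema
{
    // Recursively sets additionalProperties: false and lists every property in "required"
    // on each object schema that declares "properties". Mutates the node in place.
    public static void Close(JsonNode? node)
    {
        if (node is JsonArray array)
        {
            foreach (var item in array) Close(item);
            return;
        }
        if (node is not JsonObject obj) return;

        if (obj["properties"] is JsonObject props)
        {
            obj["additionalProperties"] = false;
            obj["required"] = new JsonArray(
                props.Select(p => (JsonNode?)JsonValue.Create(p.Key)).ToArray());
            foreach (var p in props.ToList()) Close(p.Value);
        }

        Close(obj["items"]);   // array element schema
        Close(obj["anyOf"]);   // alternative schemas
        if (obj["$defs"] is JsonObject defs)
            foreach (var d in defs.ToList()) Close(d.Value);
    }
}
```

Run it on the JsonNode form of the schema and serialize the result; keep post-hoc validation in place regardless, since closing the schema does nothing for providers that ignore it.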