
Structured Output Validation

Key Points

  • The model returns JSON. Production code consumes that JSON. If the JSON doesn't match your schema, the chain breaks. Validation is non-optional.
  • Three strategies, ranked by reliability: (1) native structured output (provider enforces grammar) → (2) post-hoc schema validation + retry-with-feedback → (3) free-form parsing with try/catch (avoid).
  • OpenAI strict mode (response_format: { type: "json_schema", strict: true }, introduced August 2024) constrains the decoder itself: the model is mathematically incapable of emitting a token that violates the schema. Anthropic gets close by forcing a tool call whose input schema is your type. Gemini offers response_schema.
  • In .NET, schema is generated from your record + [Description] attributes by AIJsonUtilities / AIFunctionFactory. You don't usually write JSON Schema by hand.
  • Validation libraries: JsonSchema.Net (spec-correct), NJsonSchema (schema generation + validation), FluentValidation (semantic rules — Email, ranges, business constraints).
  • Retry budget: 2–3 attempts max. Past that, fail loudly. Infinite retry burns tokens and hides bugs.
  • On retry, send only the bad output, the validation errors, and the schema, NOT the full chat history. On long conversations that can cut retry input tokens by an order of magnitude.
  • Anti-patterns: silently dropping invalid responses, retrying forever, asking the model to "be careful next time" without telling it what failed.

Concepts (deep dive)

The problem in one sentence

You asked the LLM for a Customer { Id: int, Email: string, Plan: "free"|"pro"|"enterprise" }. It returned { "id": "abc", "email": null, "plan": "PRO" }. Three errors in 12 tokens. Your downstream JsonSerializer.Deserialize<Customer> either throws or silently produces garbage. Production agents call tools, parse outputs, route control flow on parsed values — every malformed response is a chain break.
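The three failure modes in that response can be reproduced with nothing but System.Text.Json. This is a minimal sketch assuming .NET 6+ top-level statements; rather than declaring a whole Customer record, it deserializes each field value on its own into the type the record expects, so each behaviour is visible in isolation.

```csharp
using System;
using System.Text.Json;

// "id": "abc" into an int: System.Text.Json throws a JsonException.
bool idThrew = false;
try { JsonSerializer.Deserialize<int>("\"abc\""); }
catch (JsonException) { idThrew = true; }

// "email": null into a string: no exception, the null simply flows through.
string? email = JsonSerializer.Deserialize<string>("null");

// "plan": "PRO" parses as a string but fails an exact match against the allowed values.
string plan = JsonSerializer.Deserialize<string>("\"PRO\"")!;
bool planAllowed = Array.IndexOf(new[] { "free", "pro", "enterprise" }, plan) >= 0;

Console.WriteLine($"id threw: {idThrew}, email is null: {email is null}, plan allowed: {planAllowed}");
```

Only the id fails loudly. The null email and the wrong-case plan deserialize without complaint, which is exactly the "silently produces garbage" half of the problem.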

Three layers of defense

┌─────────────────────────────────────────────────────┐
│ Layer 1: Constrained decoding (provider-enforced)   │ ← strongest
│   OpenAI strict mode, Anthropic tools, Gemini schema│
├─────────────────────────────────────────────────────┤
│ Layer 2: Schema validation (post-hoc)               │ ← required regardless
│   JsonSchema.Net, NJsonSchema                       │
├─────────────────────────────────────────────────────┤
│ Layer 3: Semantic validation                        │ ← business rules
│   FluentValidation, DataAnnotations                 │
└─────────────────────────────────────────────────────┘

Layer 1 prevents structural errors. Layer 2 catches anything Layer 1 missed (older models, non-strict providers, drifted schemas). Layer 3 catches things schema cannot express ("the endDate must be after startDate", "the email must be unique in our DB").

Layer 1 — native structured output

OpenAI strict JSON Schema

var options = new ChatOptions
{
    ResponseFormat = ChatResponseFormat.ForJsonSchema(
        schema: AIJsonUtilities.CreateJsonSchema(typeof(Customer)),
        schemaName: "customer",
        strict: true)
};

var resp = await chat.GetResponseAsync<Customer>("Extract the customer from: ...", options);
Customer c = resp.Result;   // guaranteed schema-correct

Under the hood OpenAI compiles your schema to a context-free grammar. At each decoding step the sampler masks any token that would violate the grammar: the probability of those tokens is set to zero before sampling. The model cannot produce schema-violating JSON; what remains to handle is a refusal or an output cut off by the token limit.

Anthropic — tool-call schemas

Claude doesn't have a generic "structured output" mode. Idiom: define a fake tool whose only purpose is to receive your structured object, then force the model to call it.

[Description("Returns the extracted customer record")]
Customer ReturnCustomer(Customer c) => c;

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(ReturnCustomer)],
    ToolMode = ChatToolMode.RequireSpecific("ReturnCustomer")
};

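The model has to call the tool, so its arguments are shaped by the tool's input schema. Unless the provider enforces that schema at decode time, the arguments are not guaranteed to match it, so validate them like any other model output.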

Gemini — response_schema

var options = new ChatOptions
{
    ResponseFormat = ChatResponseFormat.ForJsonSchema(...)
};

Same surface in Microsoft.Extensions.AI; provider adapter translates. Gemini's enforcement is weaker than OpenAI strict — treat as advisory and validate.

Layer 2 — post-hoc validation

Even with strict mode you validate. Reasons: (a) the schema generator doesn't always express every rule, (b) you want a single audit log of every validation failure, (c) you may be calling a non-strict provider, (d) you want to enforce rules across versions.

using System.Linq;
using System.Text.Json.Nodes;
using Json.Schema;       // JsonSchema.Net

var schema = JsonSchema.FromText(schemaJson);
var results = schema.Evaluate(JsonNode.Parse(rawJson),
    new EvaluationOptions { OutputFormat = OutputFormat.List });

if (!results.IsValid)
{
    // Flatten every failing node into "instanceLocation: message" lines
    var errors = results.Details
        .Where(d => d.Errors is not null)
        .SelectMany(d => d.Errors!.Select(e => $"{d.InstanceLocation}: {e.Value}"));
    // hand to retry chain
}
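To see the shape of the error list that gets handed to the retry chain without pulling in JsonSchema.Net, the checks can be imitated with System.Text.Json.Nodes. This is an illustrative sketch, assuming .NET 6+: ValidateCustomer is a hypothetical helper, not a JsonSchema.Net API, and it hand-codes only three schema keywords (type: integer, type: string, enum) for the Customer example from the top of this page. A real evaluator covers far more of the specification.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;

string[] plans = { "free", "pro", "enterprise" };

// Hypothetical helper: returns "instanceLocation: message" strings, like the Details list above.
List<string> ValidateCustomer(string rawJson)
{
    JsonNode? root;
    try { root = JsonNode.Parse(rawJson); }
    catch (JsonException ex) { return new List<string> { $"/: not valid JSON ({ex.Message})" }; }

    if (root is not JsonObject obj) return new List<string> { "/: expected object" };

    var errors = new List<string>();
    // type: integer (TryGetValue<int> does not coerce JSON strings)
    if (obj["id"] is not JsonValue id || !id.TryGetValue<int>(out _))
        errors.Add("/id: expected integer");
    // type: string (a JSON null or a missing property both fail here)
    if (obj["email"] is not JsonValue email || !email.TryGetValue<string>(out _))
        errors.Add("/email: expected string");
    // enum: exact, case-sensitive match
    if (obj["plan"] is not JsonValue plan || !plan.TryGetValue<string>(out var p) || Array.IndexOf(plans, p) < 0)
        errors.Add("/plan: expected one of free, pro, enterprise");
    return errors;
}

List<string> found = ValidateCustomer("{ \"id\": \"abc\", \"email\": null, \"plan\": \"PRO\" }");
foreach (string error in found) Console.WriteLine(error);
```

Each line names the exact location and the rule that failed, which is the information the model needs on retry.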

Layer 3 — semantic validation

Schema can say "string, max length 256". It can't say "must be a real customer id in our DB", "endDate after startDate", "amount within tenant's daily limit". Use FluentValidation:

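using System.Linq;
using FluentValidation;

// Assumes a Customer shape with a string Plan and StartDate/EndDate properties
// (hypothetical; the record in "Schema generation in .NET" uses an enum and has no dates).
public class CustomerValidator : AbstractValidator<Customer>
{
    public CustomerValidator()
    {
        RuleFor(c => c.Email).EmailAddress();
        RuleFor(c => c.Plan).Must(p => new[] { "free", "pro", "enterprise" }.Contains(p));
        RuleFor(c => c).Custom((c, ctx) =>
        {
            if (c.EndDate <= c.StartDate)   // "after" means strictly later
                ctx.AddFailure("endDate", "must be after startDate");
        });
    }
}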

Retry-with-feedback chain

[1] Send prompt + schema
[2] Receive JSON
[3] Validate (schema + semantic)
   ├── valid ──► return
   └── invalid
   [4] retries < N?
        ├── no ──► throw / log / fail loudly
        └── yes
        [5] Send: "Your previous output was {bad}. Validation errors: {errors}. Schema: {schema}. Please return corrected JSON only."
        [2]

Critical: in step 5 send only the bad output + errors + schema, not the original conversation. The model needs the delta, not the history.
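The chain above can be sketched with plain local functions. This is a minimal sketch, assuming .NET 6+: FakeModel and Validate are hypothetical stand-ins (a "model" that only fixes its output once it sees validation feedback, and a validator that accepts only {"id":42}); the bracketed numbers in the comments refer to the steps of the diagram.

```csharp
using System;
using System.Collections.Generic;

int calls = 0;
var prompts = new List<string>();

// Hypothetical model: returns a bad id until the prompt carries validation errors.
string FakeModel(string prompt)
{
    calls++;
    prompts.Add(prompt);
    return prompt.Contains("Validation errors:") ? "{\"id\":42}" : "{\"id\":\"abc\"}";
}

// Hypothetical validator standing in for schema + semantic checks.
List<string> Validate(string json) =>
    json == "{\"id\":42}" ? new List<string>() : new List<string> { "/id: expected integer" };

string RetryWithFeedback(string prompt, string schema, int maxAttempts = 3)
{
    string request = $"{prompt}\nSchema: {schema}";           // [1] prompt + schema
    for (int attempt = 1; attempt <= maxAttempts; attempt++)  // [4] bounded budget
    {
        string output = FakeModel(request);                   // [2] receive JSON
        List<string> errors = Validate(output);               // [3] validate
        if (errors.Count == 0) return output;
        // [5] delta only: bad output + errors + schema, never the earlier conversation
        request = $"Your previous output was {output}. Validation errors: {string.Join("; ", errors)}. " +
                  $"Schema: {schema}. Please return corrected JSON only.";
    }
    throw new InvalidOperationException($"Structured output failed after {maxAttempts} attempts"); // fail loudly
}

string schemaText = "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"integer\"}}}";
string result = RetryWithFeedback("Extract the customer from: ...", schemaText);
Console.WriteLine($"{result} after {calls} calls");
```

The second prompt contains the failed output, the errors and the schema, but not the original extraction request.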

Schema generation in .NET

AIJsonUtilities.CreateJsonSchema(typeof(T)) walks the type and produces a JSON Schema. [Description], [Required], [Range], JsonPropertyName all participate.

public record Customer(
    [property: Description("Customer id, e.g. C-1234")] string Id,
    [property: Description("Email address")][property: EmailAddress] string Email,
    [property: Description("Plan tier")] PlanTier Plan,
    [property: Range(0, 100_000)] decimal MonthlyRevenue);

public enum PlanTier { Free, Pro, Enterprise }

→ generates schema with enum for PlanTier, format: email for Email, numeric bounds for MonthlyRevenue.

Grammar-constrained sampling — what it actually is

Standard LLM decoding: at each step the model produces a probability distribution over the vocabulary; sample one token. Grammar-constrained decoding inserts a mask between distribution and sample: any token that, if appended, would leave a prefix that can no longer be completed into a string the grammar accepts gets its probability zeroed. The model literally cannot say {"id": "abc"} if id is int: there is no path through the grammar. Implementations: llama.cpp GBNF grammars, Outlines (Python), OpenAI strict mode internally. Trade-off: sampling is slightly slower; some emergent behaviors disappear (the model can't ramble before answering).
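A toy version of the mask fits in a few lines. This is an illustrative sketch, not how OpenAI, llama.cpp or Outlines implement it, assuming .NET 6+: the "grammar" is a hand-written prefix check (IsValidPrefix) for the single shape {"id":<digits>}, the vocabulary has four tokens, and the "model" is a fixed, hypothetical score table with greedy selection in place of sampling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy grammar: the whole output must be {"id":<digits>} (id is an integer).
const string Head = "{\"id\":";

// True if s can still be completed into a string the grammar accepts.
bool IsValidPrefix(string s)
{
    if (s.Length <= Head.Length) return Head.StartsWith(s, StringComparison.Ordinal);
    if (!s.StartsWith(Head, StringComparison.Ordinal)) return false;
    string rest = s[Head.Length..];
    int i = 0;
    while (i < rest.Length && rest[i] is >= '0' and <= '9') i++;
    if (i == rest.Length) return true;                      // still inside the number
    return i > 0 && i == rest.Length - 1 && rest[i] == '}'; // number closed by '}'
}

bool IsComplete(string s) => s.EndsWith('}') && IsValidPrefix(s);

// Hypothetical "model": fixed preference scores per token (higher = preferred).
var scores = new Dictionary<string, double>
{
    ["{\"id\":"] = 1.0,
    ["\"abc\""] = 5.0, // left alone, the model "wants" a string id
    ["}"] = 3.0,
    ["42"] = 2.0,
};

string ConstrainedDecode(int maxSteps = 16)
{
    string output = "";
    for (int step = 0; step < maxSteps && !IsComplete(output); step++)
    {
        // The mask: keep only tokens after which the output can still become valid.
        var allowed = scores.Where(kv => IsValidPrefix(output + kv.Key)).ToList();
        if (allowed.Count == 0) throw new InvalidOperationException("Grammar dead end");
        output += allowed.MaxBy(kv => kv.Value).Key; // greedy pick among surviving tokens
    }
    return IsComplete(output) ? output : throw new InvalidOperationException("Step budget exhausted");
}

string unconstrainedFirstToken = scores.MaxBy(kv => kv.Value).Key;
string constrained = ConstrainedDecode();
Console.WriteLine($"unconstrained first token: {unconstrainedFirstToken}");
Console.WriteLine($"constrained output: {constrained}");
```

The highest-scoring token, "abc", is never emitted because no prefix containing it can reach a valid string. Real implementations track the parser state incrementally instead of re-checking the whole prefix for every candidate token, but the effect on the output is the same.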


How it works under the hood

Schema generation pipeline

[CLR Type]
    │  reflection
[JsonTypeInfo + attributes]
    │  AIJsonUtilities
[JsonSchema (JsonNode tree)]
    │  serialize
[JSON Schema document sent to provider]

OpenAI strict mode pipeline

[your schema] ──► [OpenAI compiles to CFG] ──► [token sampler masks invalid tokens]
                                              [output is grammatically correct]

Validation middleware in the IChatClient pipeline

ChatClient.GetResponseAsync(...)
[UseFunctionInvocation]
[StructuredOutputClient]   ← custom DelegatingChatClient: validate + retry
[OpenAI / Anthropic / Gemini client]

The middleware sits outside the provider call so it sees raw responses and can re-issue the call.


Code: correct vs wrong

✅ Correct — strict mode + validate + bounded retry

using System.Text.Json;
using FluentValidation;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;

public sealed class StructuredOutputClient(
    IChatClient inner,
    IValidator<Customer> validator,
    ILogger<StructuredOutputClient>? log = null) : DelegatingChatClient(inner)
{
    private const int MaxAttempts = 3;
    // Model output casing may not match the C# property names; Web defaults match case-insensitively.
    private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);
    private readonly ILogger _log = log ?? NullLogger<StructuredOutputClient>.Instance;

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        var schema = AIJsonUtilities.CreateJsonSchema(typeof(Customer));
        // Clone so the caller's ChatOptions instance is not mutated.
        options = options?.Clone() ?? new ChatOptions();
        options.ResponseFormat = ChatResponseFormat.ForJsonSchema(schema, "customer", strict: true);

        var convo = messages.ToList();
        for (int attempt = 1; attempt <= MaxAttempts; attempt++)
        {
            var resp = await base.GetResponseAsync(convo, options, ct);
            var raw = resp.Text ?? "";
            try
            {
                var c = JsonSerializer.Deserialize<Customer>(raw, JsonOptions)
                        ?? throw new JsonException("Model returned JSON null");
                var v = await validator.ValidateAsync(c, ct);
                if (v.IsValid) return resp;

                _log.LogWarning("Validation failed (attempt {N}): {Errors}", attempt, v.Errors);

                // Send ONLY the delta (bad output + errors + schema), not full history
                convo = new List<ChatMessage>
                {
                    new(ChatRole.System, "You are correcting prior invalid JSON output."),
                    new(ChatRole.User, $"Previous output:\n{raw}\n\nErrors:\n{string.Join('\n', v.Errors)}\n\nSchema:\n{schema}\n\nReturn corrected JSON only.")
                };
            }
            catch (JsonException ex)
            {
                _log.LogWarning(ex, "JSON parse failed (attempt {N})", attempt);
                convo = new List<ChatMessage>
                {
                    new(ChatRole.User, $"Previous output was not valid JSON: {raw}\n\nSchema:\n{schema}\n\nReturn JSON only.")
                };
            }
        }
        throw new InvalidOperationException($"Structured output failed after {MaxAttempts} attempts");
    }
}

❌ Wrong — silent swallow

try { var c = JsonSerializer.Deserialize<Customer>(resp.Text); return c; }
catch { return null; }   // production now has nulls floating around with no log

❌ Wrong — unbounded retry

while (true)             // tokens burn forever; one stuck prompt blows your budget
{
    var resp = await chat.GetResponseAsync(...);
    if (Validate(resp)) return resp;
}

❌ Wrong — full history on retry

// every retry re-sends the whole conversation plus the failed output; cost grows with each attempt
convo.Add(new ChatMessage(ChatRole.User, "Try again, your output was wrong"));

The model has no idea what was wrong. Tell it.
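How fast the full-history approach grows is easy to estimate. The token counts below are invented for illustration (assumptions, not measurements); what matters is the shape of the two formulas.

```csharp
using System;

// Hypothetical token counts for one extraction call (illustrative, not measured).
const int History = 6_000;   // system prompt + prior turns + the document being extracted
const int BadOutput = 300;   // the invalid JSON the model returned
const int Feedback = 200;    // validation errors + short instruction
const int Schema = 400;      // JSON Schema text

// Full-history retry: resend everything so far; each failed attempt appends
// another (bad output + feedback) pair to the conversation.
int FullHistoryInput(int retryNumber) => History + retryNumber * (BadOutput + Feedback);

// Delta-only retry: just the bad output, the errors, and the schema.
int DeltaInput() => BadOutput + Feedback + Schema;

int fullTotal = 0, deltaTotal = 0;
for (int retry = 1; retry <= 2; retry++)
{
    fullTotal += FullHistoryInput(retry);
    deltaTotal += DeltaInput();
}
Console.WriteLine($"2 retries: full history = {fullTotal} input tokens, delta-only = {deltaTotal}");
```

With these numbers two retries cost 13,500 input tokens with full history versus 1,800 delta-only, about 7.5×. The gap widens as the history grows, because the full-history cost scales with it and the delta cost does not.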

✅ Correct — registered as middleware

chat = chat.AsBuilder()
    .UseFunctionInvocation()
    .Use(inner => new StructuredOutputClient(inner, validator))
    .UseLogging(loggerFactory)
    .Build();

Design patterns for this topic

Pattern 1 — "Strict mode first, validate second"

  • Intent: belt + suspenders. Constrain at the decoder; validate after.
  • Why: strict mode prevents structural errors; validation catches semantic ones.

Pattern 2 — "Bounded retry with delta-only context"

  • Intent: give the model a chance to fix its mistake without burning tokens.
  • Cap: 2–3 attempts. Past that, escalate.

Pattern 3 — "Tool-as-schema" (Anthropic)

  • Intent: force structured output on providers without a strict mode.
  • Mechanism: define a tool that receives the structured object; require call.

Pattern 4 — "Schema as DTO"

  • Intent: one C# record is the source of truth — schema, validation, deserialization target.
  • Mechanism: annotate with [Description], [Range], EmailAddress; generate schema; deserialize.

Pattern 5 — "Validation telemetry"

  • Intent: every validation failure becomes a metric/log so you can spot prompt drift.
  • Mechanism: counter ai.structured_output.validation_failures tagged by model, schema_name, error_kind.
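System.Diagnostics.Metrics, part of the BCL since .NET 6, is enough to emit that counter. A minimal sketch: the meter name MyCompany.Agents, the RecordFailure helper and the tag values are hypothetical, and the MeterListener exists only so the example can show the counts without an exporter. In a real service an OpenTelemetry exporter or dotnet-counters would consume the counter instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("MyCompany.Agents");
Counter<long> failures = meter.CreateCounter<long>("ai.structured_output.validation_failures");

// Hypothetical helper called from the validation middleware on every failure.
void RecordFailure(string model, string schemaName, string errorKind) =>
    failures.Add(1,
        new KeyValuePair<string, object?>("model", model),
        new KeyValuePair<string, object?>("schema_name", schemaName),
        new KeyValuePair<string, object?>("error_kind", errorKind));

// In-process listener standing in for an exporter: sums failures per error_kind.
var totals = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter == meter) l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    foreach (KeyValuePair<string, object?> tag in tags)
        if (tag.Key == "error_kind" && tag.Value is string kind)
            totals[kind] = totals.GetValueOrDefault(kind) + value;
});
listener.Start();

RecordFailure("gpt-4o", "customer", "schema");
RecordFailure("gpt-4o", "customer", "schema");
RecordFailure("claude", "customer", "semantic");

foreach (var (kindName, count) in totals) Console.WriteLine($"{kindName}: {count}");
```

Tagging by model and schema_name is what lets a dashboard show that one model started failing one schema after a prompt change.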

Pattern 6 — "Layered validation"

  • Intent: schema first (cheap), semantic second (DB/external).
  • Why: fail fast on structural errors before doing expensive lookups.
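The ordering matters because the semantic layer is usually where the cost is. A minimal sketch assuming .NET 6+: CustomerExists is a hypothetical stand-in for a database query that counts how often it runs, and the hypothetical Validate returns null when a payload passes both layers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

int dbLookups = 0;
var knownIds = new HashSet<int> { 1, 2, 3 };

// Hypothetical "expensive" semantic rule: is this customer id known?
bool CustomerExists(int id) { dbLookups++; return knownIds.Contains(id); }

string? Validate(string rawJson)
{
    // Layer 2 (cheap): structure. Reject before touching the database.
    if (JsonNode.Parse(rawJson) is not JsonObject obj ||
        obj["id"] is not JsonValue idNode || !idNode.TryGetValue<int>(out int id))
        return "/id: expected integer";
    // Layer 3 (expensive): only structurally valid payloads get this far.
    return CustomerExists(id) ? null : $"/id: customer {id} does not exist";
}

string?[] results =
{
    Validate("{\"id\":\"abc\"}"),
    Validate("{\"id\":2}"),
    Validate("{\"id\":99}"),
};
Console.WriteLine($"{dbLookups} lookups for {results.Length} payloads");
```

The structurally invalid payload is rejected without a lookup, so only two of the three payloads reach the "database".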

Pros & cons / trade-offs

Strategy                     | Pros                       | Cons
Strict mode                  | Mathematically correct     | OpenAI-only at full strength; some schema features unsupported
Tool-as-schema               | Works on Claude            | One extra round trip's worth of tokens
Post-hoc validation          | Provider-agnostic          | Costs a retry on failure
FluentValidation             | Expressive business rules  | Not in schema; model can't see rules unless restated in prompt
Grammar-constrained sampling | Strongest guarantee        | Slightly slower; can degrade reasoning
Free-form + try/catch        | Trivial                    | Brittle; bad UX on failure

When to use / when to avoid

  • Use strict mode whenever the provider supports it. No reason not to.
  • Use validation middleware in every production agent pipeline.
  • Use retry-with-feedback for high-value calls where one extra round trip is cheap relative to a downstream failure.
  • Avoid retry chains for streaming responses where partial output is already consumed.
  • Avoid unconstrained free-form JSON parsing in production. Always.
  • Avoid schema features the provider doesn't support in strict mode (oneOf with discriminator, recursive refs sometimes); flatten the schema.
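Some of these problems can be caught before a schema is ever sent. A hedged sketch, assuming .NET 8 (raw string literals): StrictModeProblems is a hypothetical preflight helper that checks two documented OpenAI strict-mode rules (every object sets additionalProperties: false and lists all of its properties in required). It walks only properties and items; $defs, anyOf and recursive references are outside this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical preflight check; returns one message per violation, keyed by JSON Pointer-like path.
List<string> StrictModeProblems(JsonNode? schema, string path = "#")
{
    var problems = new List<string>();
    if (schema is not JsonObject obj) return problems;

    if (obj["properties"] is JsonObject props)
    {
        if (obj["additionalProperties"] is not JsonValue ap || !ap.TryGetValue<bool>(out bool allowed) || allowed)
            problems.Add($"{path}: additionalProperties must be false");

        var required = new HashSet<string>(
            (obj["required"] as JsonArray ?? new JsonArray()).Select(n => n?.GetValue<string>() ?? ""));

        foreach (var (name, child) in props)
        {
            if (!required.Contains(name))
                problems.Add($"{path}/properties/{name}: not listed in required");
            problems.AddRange(StrictModeProblems(child, $"{path}/properties/{name}"));
        }
    }
    if (obj["items"] is JsonNode items)
        problems.AddRange(StrictModeProblems(items, $"{path}/items"));
    return problems;
}

JsonNode? customerSchema = JsonNode.Parse("""
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "address": {
      "type": "object",
      "properties": { "city": { "type": "string" } },
      "required": ["city"]
    }
  },
  "required": ["id"],
  "additionalProperties": false
}
""");

List<string> issues = StrictModeProblems(customerSchema);
foreach (string issue in issues) Console.WriteLine(issue);
```

Running a check like this in a unit test over every generated schema turns a runtime "schema rejected" error into a build failure.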

Interview Q&A

Q1. What does OpenAI's strict JSON Schema mode actually do? Compiles the schema to a grammar; masks tokens during sampling so the model cannot emit invalid JSON. Structural correctness is mathematically guaranteed; semantic correctness is not.

Q2. Why validate even with strict mode? Schema cannot express every rule (cross-field, business constraints). Strict mode is provider-specific — you may switch. Drift detection — alerts on validation failures show prompt regressions.

Q3. Difference between schema validation and FluentValidation? Schema = structure (types, required, enums, ranges). FluentValidation = semantic rules (cross-field, DB lookups, business policy).

Q4. How do you do structured output on Claude? Define a tool whose parameter is the desired shape; force the call via ToolMode.RequireSpecific. The forced call steers the arguments toward the schema, but validate them anyway unless the provider enforces the schema during decoding.

Q5. Retry budget for invalid output? 2–3 attempts. Past that, fail loudly. Unbounded retry hides bugs and burns tokens.

Q6. What do you send on retry? Only the bad output + the validation errors + the schema. Not the full conversation. The model needs the delta.

Q7. How is JSON Schema generated from a C# type? AIJsonUtilities.CreateJsonSchema(typeof(T)) reflects the type, walks [Description], [Required], [Range], [EmailAddress], enums; emits JSON Schema.

Q8. What's grammar-constrained decoding? Each token's probability is masked against a grammar before sampling. Tokens that would break parse have probability zero. Used by llama.cpp GBNF and OpenAI strict mode internally.

Q9. Where does validation sit in the IChatClient pipeline? In a DelegatingChatClient — outside the provider call so it can re-invoke on failure.

Q10. Anti-pattern: catching JsonException and returning null. Silently hides failures. Log, increment metric, retry, then escalate.

Q11. How do you observe validation failures? Counter ai.structured_output.validation_failures tagged by model + schema + error type. Alert when rate exceeds threshold; dashboards by model.

Q12. Does strict mode work with additionalProperties? OpenAI strict requires additionalProperties: false. Closed schemas only. Open shapes need fallback to non-strict + validation.


Gotchas / common mistakes

  • ⚠️ Strict mode requires additionalProperties: false on every object, and every property must be listed in required (express optional fields as a union with null). Forgetting either rejects the schema.
  • ⚠️ Recursive types (a tree) don't always work in strict mode. Flatten or paginate.
  • ⚠️ JsonSerializer.Deserialize<T> is case-sensitive by default. Configure PropertyNameCaseInsensitive = true or align casing in schema.
  • ⚠️ Sending the full chat history on retry re-sends the entire conversation on every attempt, so retry cost grows with history length.
  • ⚠️ Asking the model to "try again" without telling it what failed → it produces the same garbage.
  • ⚠️ Mixing prose + JSON in the same response. Force JSON-only with response_format or strip with a regex (last resort).
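  • ⚠️ Number shapes don't always line up. System.Text.Json reads 1 into a double without complaint, but throws on 1.0 for an int property and on a quoted "1.5" for any numeric property. JsonNumberHandling.AllowReadingFromString covers the quoted case; for the rest, make integer vs number in the schema match the C# type.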
  • ⚠️ Streaming + structured output don't mix cleanly: you can't validate until the stream completes.
  • ⚠️ Provider-specific schema dialect drift. JSON Schema draft 2020-12 and draft-07 differ in some keywords (e.g. $defs vs definitions).
  • ⚠️ Treating validation as a development concern instead of a production telemetry signal. Log every failure.
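The number-handling gotcha is quick to confirm. A short sketch assuming .NET 6+; Throws is a hypothetical helper that reports whether deserialization raised a JsonException.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical helper: true if the action throws a JsonException.
bool Throws(Action action)
{
    try { action(); return false; }
    catch (JsonException) { return true; }
}

// An integer literal is fine for a floating-point target...
double fromInt = JsonSerializer.Deserialize<double>("1");

// ...but a fractional literal is rejected for an integer target.
bool intFromFractionThrows = Throws(() => JsonSerializer.Deserialize<int>("1.0"));

// A quoted number is rejected by default...
bool quotedThrows = Throws(() => JsonSerializer.Deserialize<double>("\"1.5\""));

// ...and accepted once reading numbers from strings is allowed.
var lenient = new JsonSerializerOptions { NumberHandling = JsonNumberHandling.AllowReadingFromString };
double fromString = JsonSerializer.Deserialize<double>("\"1.5\"", lenient);

Console.WriteLine($"{fromInt} {intFromFractionThrows} {quotedThrows} {fromString}");
```

Setting NumberHandling on the options applies it to every numeric member; the [JsonNumberHandling] attribute narrows it to a single property when only one field needs the leniency.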

Further reading