What is structured output in LLM APIs?

Structured output is a feature where the LLM provider uses constrained decoding to guarantee that the model's response matches a JSON Schema you provide. Unlike prompting the model to return JSON (which can fail), structured output enforces the schema at the token generation level. The model literally cannot produce tokens that would violate your schema. Every major provider now supports some form of this, but they all use different parameter names and API shapes.

What is the difference between JSON mode and structured output?

JSON mode (response_format type json_object) guarantees syntactically valid JSON but does not enforce a schema. The model might return any valid JSON object with any fields and types. Structured output (response_format type json_schema) goes further by enforcing a specific JSON Schema, guaranteeing correct field names, types, required properties, and nesting. If you are shipping to production, you almost always want structured output over JSON mode.

Which LLM providers support structured outputs in 2026?

All major providers support structured outputs: OpenAI, Anthropic, Google, Azure, Bedrock, Vertex, Mistral, xAI, Fireworks, Groq, and more. We tested 244 models across 23 providers. Some smaller providers like DeepSeek support JSON mode but not full json_schema enforcement. The full compatibility results are in this post.

Which SDKs work with structured outputs through Requesty?

We tested 10 popular SDKs: Instructor, LangChain, LlamaIndex, LiteLLM, PydanticAI, DSPy, Haystack, Agno, Mirascope, and Vercel AI SDK. All of them work with structured outputs through Requesty. Pass rates range from 75% to 91% across all models, with most failures isolated to a few providers that do not support json_schema at the API level.

How does Requesty handle structured output across providers?

Requesty normalizes structured output parameters at the gateway level. You send a single OpenAI-compatible request with response_format, and Requesty translates it to whatever parameter the downstream provider expects, whether that is output_config for Anthropic, responseSchema for Google, or the native format for any other provider. When you switch models or fail over between providers, your structured output configuration stays the same.

Structured Outputs Across LLM Providers: 244 Models Tested (2026)

You want your LLM to return a JSON object with three fields. Name, email, score. Not prose. Not markdown. Just the JSON.

Every major provider now supports this. They all call it structured output. They all use constrained decoding under the hood. And they all implement it with completely different API parameters, different schema restrictions, and different edge cases that show up when you switch models.

We decided to test this properly. We ran structured output requests against 244 models across 23 providers, using 3 API endpoints and 10 popular framework SDKs. Over 2,400 individual tests. Here is everything we found.

JSON mode vs structured output

Before the results, one important distinction. These are two different features that often get confused.

JSON mode (response_format: {type: "json_object"}): The model returns valid JSON, but there is no schema enforcement. It might return {"answer": 4} or {"result": {"value": 4, "unit": "count"}} or something else entirely. Your code has to handle whatever shape comes back. Almost every model supports this.

Structured output (response_format: {type: "json_schema", json_schema: {...}}): The model is constrained to match your exact schema. If you define {championships: int, summary: string}, that is exactly what you get. Every field, every type, every required property. This uses constrained decoding at the token generation level. Fewer providers support this because it requires deeper infrastructure changes.

Everything in this post is about structured output. The strict json_schema mode. That is what matters for production applications where you need guaranteed schema conformance.

How we tested

Every test sends the same structured output request: a JSON Schema requiring an integer field and a string field, with strict: true and additionalProperties: false. The model must return a response that matches the schema exactly. We parse and validate every response.
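The parse-and-validate step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the actual test harness: the field names `championships` and `summary` are borrowed from the earlier example and are an assumption about the test schema, and the helper name `conforms` is hypothetical.

```python
import json

# Assumed test schema: one integer field and one string field.
# The field names are illustrative, not confirmed from the test suite.
TEST_SCHEMA = {
    "type": "object",
    "properties": {
        "championships": {"type": "integer"},
        "summary": {"type": "string"},
    },
    "required": ["championships", "summary"],
    "additionalProperties": False,
}

_PY_TYPES = {"integer": int, "string": str}


def conforms(raw_text, schema=TEST_SCHEMA):
    """Return True if raw_text is JSON that matches a flat object schema."""
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    # Every required key must be present.
    if any(key not in data for key in schema["required"]):
        return False
    # additionalProperties: false means no unknown keys.
    if schema.get("additionalProperties") is False and set(data) - set(props):
        return False
    for key, value in data.items():
        if key not in props:
            continue
        expected = _PY_TYPES[props[key]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True
```

A response from JSON mode such as `{"answer": 4}` parses fine but fails this check, which is the gap structured output is meant to close.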

We tested across three axes:

Three API endpoints: OpenAI Chat Completions (/v1/chat/completions), OpenAI Responses (/v1/responses), and Anthropic Messages (/v1/messages)
Ten framework SDKs: Instructor, LangChain, LlamaIndex, LiteLLM, PydanticAI, DSPy, Haystack, Agno, Mirascope, and Vercel AI SDK
244 models across 23 providers: OpenAI, Anthropic, Google, Azure, Bedrock, Vertex, Mistral, xAI, DeepSeek, DeepInfra, Fireworks, Groq, Alibaba (Qwen), MiniMaxi, Moonshot, Nebius, Novita, Parasail, Perplexity, Together, Inceptron, zai, and more

All tests ran through Requesty's gateway at https://router.requesty.ai/v1. Every request used the same API key, the same schema, the same endpoint format. The only variable was the model.

The API fragmentation problem

Before looking at results, here is why testing this matters. Every provider implements structured output with a different parameter name and a different request shape.

OpenAI Chat Completions

Python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the user info"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)

OpenAI Responses API

Python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    input="Extract the user info",
    text={
        "format": {
            "type": "json_schema",
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)

Same provider, different API. The parameter moved from response_format to text.format. The nesting structure changed completely.

Anthropic Messages

Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract the user info"}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)

No response_format. The parameter is output_config.format. No name field on the schema wrapper. Completely different shape.

Google Gemini

Python

from google import genai

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Extract the user info",
    config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "score": {"type": "integer"}
            },
            "required": ["name", "email", "score"]
        }
    }
)

Yet another parameter name: response_schema plus response_mime_type. Four providers, four different shapes for the exact same feature.
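To make the fragmentation concrete, the four request shapes above can be generated from one schema. The sketch below only rearranges dictionaries; the function name and the `target` labels are hypothetical, and the Google branch mirrors the Gemini example by dropping the top-level `additionalProperties` key.

```python
def structured_output_params(schema, name, target):
    """Return the structured-output request fragment for one target API.

    `target` is a hypothetical label: "openai_chat", "openai_responses",
    "anthropic", or "google".
    """
    if target == "openai_chat":
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": name, "strict": True, "schema": schema},
            }
        }
    if target == "openai_responses":
        return {
            "text": {
                "format": {
                    "type": "json_schema",
                    "name": name,
                    "strict": True,
                    "schema": schema,
                }
            }
        }
    if target == "anthropic":
        # No name field on the wrapper.
        return {"output_config": {"format": {"type": "json_schema", "schema": schema}}}
    if target == "google":
        # Mirrors the Gemini example, which omits additionalProperties.
        # Only the top level is stripped; nested objects are left alone.
        trimmed = {k: v for k, v in schema.items() if k != "additionalProperties"}
        return {
            "config": {
                "response_mime_type": "application/json",
                "response_schema": trimmed,
            }
        }
    raise ValueError(f"unknown target: {target}")


USER_INFO = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "score": {"type": "integer"},
    },
    "required": ["name", "email", "score"],
    "additionalProperties": False,
}

# One schema, four different keyword-argument shapes.
for target in ("openai_chat", "openai_responses", "anthropic", "google"):
    print(target, list(structured_output_params(USER_INFO, "user_info", target)))
```

Maintaining a function like this per provider is exactly the translation work a gateway takes over.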

The compatibility matrix

Feature	OpenAI (Chat)	OpenAI (Responses)	Anthropic	Google
Parameter name	`response_format`	`text.format`	`output_config.format`	`response_schema`
Schema type	`json_schema`	`json_schema`	`json_schema`	OpenAPI subset / JSON Schema
Strict mode	`strict: true`	`strict: true`	Always strict	Always strict
Requires name	Yes	Yes	No	No
additionalProperties	Must be `false`	Must be `false`	Must be `false`	Not required
Recursive schemas	Supported	Supported	Not supported	Supported
anyOf	Supported	Supported	Supported (limited)	Supported
JSON mode (no schema)	`json_object`	`json_object`	Not supported	`application/json` only

Results: native API endpoints

Here is how every provider performed when tested directly with the OpenAI and Anthropic SDKs.

Provider	Models	Chat Completions	Responses	Messages
Alibaba (Qwen)	7	7/7	7/7	7/7
Anthropic	9	9/9	9/9	9/9
Azure	8	6/6	8/8	8/8
Bedrock	7	7/7	7/7	7/7
DeepInfra	20	15/20	15/20	16/20
DeepSeek	4	0/4	0/4	4/4
Fireworks	7	7/7	7/7	7/7
Google	9	7/9	7/9	7/9
Groq	2	2/2	2/2	2/2
Inceptron	3	3/3	3/3	3/3
MiniMaxi	5	5/5	5/5	5/5
Mistral	13	13/13	13/13	13/13
Moonshot	7	7/7	7/7	7/7
Nebius	7	7/7	7/7	7/7
Novita	35	11/35	11/35	28/35
OpenAI	28	24/28	24/28	25/28
OpenAI Responses	23	n/a	23/23	23/23
Parasail	5	4/5	5/5	5/5
Perplexity	3	3/3	3/3	3/3
Together	4	3/4	3/4	3/4
Vertex	23	20/23	20/23	22/23
xAI	10	10/10	10/10	10/10
zai	5	4/5	5/5	5/5

Summary by endpoint

Endpoint	Pass	Tested	Rate
Anthropic Messages	226	244	93%
OpenAI Responses	201	244	82%
OpenAI Chat Completions	174	219	79%

The Anthropic Messages endpoint has the highest pass rate because Requesty translates json_schema into Anthropic's native constrained decoding format, and that translation is clean. For OpenAI-compatible endpoints, Requesty passes json_schema through to the upstream provider. Providers that support it natively pass. Providers that have not implemented it yet return an error.

199 models pass all native endpoints

Out of 244 models tested, 199 passed structured output on every endpoint where they were tested. That is 82% of all models with full native endpoint coverage. These models return schema-conforming JSON regardless of whether you call them via Chat Completions, Responses, or Messages.
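The "every endpoint where they were tested" rule matters because some models are not available on every endpoint. A minimal sketch of that aggregation, assuming each model's results are stored as a mapping from endpoint to `True`, `False`, or `None` (not tested); the data layout, function names, and model names are all hypothetical.

```python
def passes_everywhere(results):
    """True if the model passed every endpoint it was actually tested on."""
    tested = [ok for ok in results.values() if ok is not None]
    return bool(tested) and all(tested)


def coverage(all_results):
    """Return (passed, total, percent) across all models."""
    passed = sum(passes_everywhere(r) for r in all_results.values())
    total = len(all_results)
    return passed, total, round(100 * passed / total)


# Illustrative data only, not real test results.
sample = {
    "model-a": {"chat": True, "responses": True, "messages": True},
    "model-b": {"chat": None, "responses": True, "messages": True},  # chat n/a
    "model-c": {"chat": False, "responses": False, "messages": True},
}
print(coverage(sample))
```

Applied to the reported totals, `round(100 * 199 / 244)` gives the 82% figure above.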

Providers with 100% pass rate

14 out of 23 providers passed structured output on every model and every endpoint:

Alibaba (Qwen), Anthropic, Azure, Bedrock, Fireworks, Groq, Inceptron, MiniMaxi, Mistral, Moonshot, Nebius, OpenAI Responses, Perplexity, xAI.

Results: SDK compatibility

We tested 10 popular framework SDKs against 159 models (recent models released in the last 6 months). Every SDK sends structured output requests through Requesty's OpenAI-compatible API.

SDK	Pass	Tested	Rate
DSPy	144	159	91%
Agno (Anthropic mode)	144	159	91%
LiteLLM	143	159	90%
LangChain	142	159	89%
Haystack	142	159	89%
LlamaIndex	139	159	87%
PydanticAI	137	159	86%
Instructor	128	159	81%
Mirascope	127	159	80%
Vercel AI SDK	120	159	75%

All 10 SDKs work with structured outputs through Requesty. The pass rate differences come from how each SDK formats the request and parses the response.

DSPy, LiteLLM, LangChain, and Haystack are among the highest pass rates (89-91%) because they send clean OpenAI-compatible requests and parse responses without extra transformation.

Instructor and Mirascope are slightly lower because they add retry/validation logic that can trip on certain provider error formats.

Vercel AI SDK has the lowest pass rate at 75% because some interactions with the Responses endpoint require specific response fields that not all providers return.

SDK compatibility by provider

Here is how each provider performed across all 10 framework SDKs. This shows which providers have the broadest SDK compatibility.

Provider	Instructor	LangChain	LlamaIndex	LiteLLM	PydanticAI	DSPy	Haystack	Mirascope	Agno	Vercel
Alibaba	100%	100%	100%	100%	100%	100%	100%	100%	100%	-
Anthropic	100%	100%	100%	100%	100%	100%	100%	100%	100%	88%
Azure	100%	100%	100%	100%	100%	50%	100%	100%	100%	100%
Bedrock	100%	100%	100%	100%	100%	100%	100%	100%	71%	85%
DeepInfra	85%	85%	85%	100%	100%	100%	85%	85%	100%	57%
Fireworks	100%	100%	100%	100%	100%	100%	100%	100%	85%	100%
Google	50%	75%	75%	75%	75%	87%	75%	62%	75%	75%
Groq	100%	100%	100%	100%	50%	100%	100%	50%	100%	100%
MiniMaxi	100%	100%	100%	100%	100%	100%	100%	100%	100%	-
Mistral	83%	100%	100%	100%	100%	100%	100%	83%	100%	83%
Moonshot	71%	100%	71%	100%	71%	100%	100%	71%	100%	100%
Nebius	71%	100%	100%	100%	71%	100%	100%	71%	100%	71%
OpenAI	89%	85%	85%	85%	89%	64%	85%	89%	89%	85%
Parasail	80%	100%	100%	100%	80%	100%	100%	80%	100%	100%
Perplexity	-	100%	100%	100%	-	100%	100%	-	100%	100%
Together	50%	75%	75%	75%	75%	75%	75%	50%	50%	50%
Vertex	71%	85%	80%	85%	85%	100%	85%	66%	85%	85%
xAI	100%	100%	100%	100%	100%	100%	100%	100%	88%	100%
zai	60%	100%	100%	100%	100%	100%	100%	60%	100%	-

Key takeaway: Alibaba, Anthropic, MiniMaxi, Fireworks, and xAI have near-perfect SDK compatibility across the board. If you are building with multiple SDKs and want maximum compatibility, these providers are the safest choices.

The 43 models that pass everything

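These models returned correct structured output across all 3 endpoints and all 10 framework SDKs. Zero failures out of 14 tests per model: 3 endpoint tests plus 11 SDK tests, because Agno is tested in both Anthropic mode and OpenAI Chat mode. If you want maximum compatibility, start here.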

Provider	Models
OpenAI	gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3, gpt-5.2, gpt-5.1, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini
Anthropic	claude-opus-4-6, claude-opus-4-5, claude-opus-4-1, claude-opus-4, claude-sonnet-4-6, claude-sonnet-4-5, claude-sonnet-4, claude-haiku-4-5
Vertex	claude-opus-4-6, claude-opus-4-5, gemini-3.5-flash, gemini-2.5-flash-lite
xAI	grok-4, grok-4-1-fast-non-reasoning, grok-3-mini
Azure	gpt-4.1, gpt-4.1-nano
Bedrock	claude-sonnet-4-6, claude-sonnet-4-5
Fireworks	minimax-m2.5, qwen3.6-plus
Parasail	gemma-4-26B, qwen25-vl-72b, qwen3-235b
Google	gemini-2.5-flash
Groq	gpt-oss-20b
Together	Llama-3.3-70B-Instruct-Turbo

42 more models at 13/14

An additional 42 models pass 13 out of 14 tests. The only failure is Agno in OpenAI Chat mode, which is a known SDK-level issue (Agno sends a developer role that some providers reject). These models include:

Mistral: mistral-large-latest, mistral-medium-latest, mistral-small-latest, mistral-small-2603

Moonshot: kimi-k2.5, kimi-k2.6, kimi-k2-thinking, kimi-k2-thinking-turbo, kimi-k2-turbo-preview

Nebius: DeepSeek-V3.2, Llama-3.3-70B, nemotron-3-nano-omni, gpt-oss-120b, glm-5.1

DeepInfra: Qwen3-235B, Qwen3-Coder-480B, DeepSeek-V3, DeepSeek-V3.1

Vertex: claude-haiku-4-5, claude-sonnet-4-5, claude-sonnet-4-6, gemini-2.5-flash, gemini-2.5-pro

xAI: grok-3, grok-4-fast, grok-4.2-beta, grok-4.3

That brings the total to 85 models with near-perfect structured output compatibility across every SDK.

Where structured output does not work (and why)

Not every model supports json_schema. Here is a breakdown of what fails and why.

Provider does not support json_schema

Some providers have not implemented constrained decoding for json_schema at the API level. These providers typically support json_object (JSON mode) but not schema enforcement.

Provider	Models affected	Status
DeepSeek	deepseek-chat, deepseek-v4-flash, deepseek-v4-pro, deepseek-reasoner	JSON mode works. `json_schema` returns "This response_format type is unavailable now"
Novita (most models)	24 of 35 models	Provider returns Bad Request for `json_schema`. A few larger models (deepseek-v3.2, gemma-4, llama-3-70b) do work

For DeepSeek specifically: if you need structured output, route through the Anthropic Messages endpoint via Requesty. Requesty translates the schema into a format DeepSeek can handle, and all 4 DeepSeek models pass on the Messages endpoint.

Model returned invalid JSON

Some models accept the json_schema parameter but return JSON that does not match the schema, or return empty/malformed responses.

Provider	Models affected	Details
DeepInfra	Qwen2.5-Coder-32B, Qwen3-32B, DeepSeek-R1, DeepSeek-R1-Distill-Llama-70B	Smaller/distilled models that struggle with strict schema adherence
Novita	deepseek-r1-distill-* variants	Same issue with distilled models

Image generation models

Image generation models (vertex/gemini-2.5-flash-image, vertex/gemini-3-pro-image-preview, google/gemini-3.1-flash-image-preview) do not support structured text output. This is expected. These models are designed for image generation, not text extraction.

Responses-only models

OpenAI's newest reasoning models (gpt-5.4-pro, gpt-5.5-pro) only work on the /v1/responses endpoint. They are not available on Chat Completions. This is an OpenAI platform constraint, not a structured output issue. Through Requesty, these models work perfectly on the Responses endpoint.

The subtle breaks to watch for

Beyond provider-level support, there are schema-level differences that cause silent failures when switching providers.

Recursive schemas

If you are extracting nested data (a file tree, a comment thread, a recursive category structure), OpenAI and Google handle recursive $ref in schemas. Anthropic does not. Your schema compiles fine on OpenAI. You switch to Claude. You get a schema compilation error at request time, not at schema validation time.
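One way to catch this before a request goes out is to scan the schema for `$ref` cycles locally. The sketch below is an assumption-laden illustration, not a provider's own check: it only follows local references (those starting with `#`), ignores JSON Pointer escaping, and all function names are hypothetical.

```python
def _refs(node):
    """Yield every $ref string found anywhere inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from _refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _refs(item)


def _resolve(root, ref):
    """Resolve a local pointer such as '#/$defs/Node' against root."""
    node = root
    for part in ref.lstrip("#").strip("/").split("/"):
        if part:
            node = node[part]
    return node


def has_recursive_ref(root):
    """Return True if any local $ref can reach itself."""

    def visit(ref, stack):
        if ref in stack:
            return True
        if not ref.startswith("#"):
            return False  # remote refs are out of scope for this sketch
        try:
            target = _resolve(root, ref)
        except (KeyError, TypeError):
            return False  # dangling ref; a real validator should report it
        return any(visit(child, stack | {ref}) for child in _refs(target))

    return any(visit(ref, frozenset()) for ref in _refs(root))


# A comment thread whose replies are themselves comments.
THREAD = {
    "type": "object",
    "properties": {"root": {"$ref": "#/$defs/Comment"}},
    "$defs": {
        "Comment": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "replies": {"type": "array", "items": {"$ref": "#/$defs/Comment"}},
            },
        }
    },
}
print(has_recursive_ref(THREAD))
```

Running a check like this in CI turns a request-time failure on one provider into a local test failure.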

The name field

OpenAI requires a name field on the schema wrapper (both in Chat Completions and Responses, though in different positions). Anthropic does not accept one. Google does not need one. If you have a shared schema definition that includes name, it works on OpenAI and breaks on Anthropic.

Strict mode semantics

On OpenAI, you opt into strict enforcement with strict: true. Without it, the model will try to follow your schema but is not guaranteed to. On Anthropic, all structured output is strict by default. There is no non-strict mode. On Google, it is also always strict.

additionalProperties

OpenAI and Anthropic require additionalProperties: false for strict schemas; Google does not. They also behave differently when you forget it. OpenAI returns a clear error. Anthropic returns a schema compilation error. Google silently allows extra fields in some SDK versions but blocks them in others.

The Anthropic beta migration

Anthropic shipped structured outputs in November 2025 as a beta with output_format and a required header. The GA release in 2026 moved it to output_config.format with no header required. Both work today. But if you followed a tutorial from six months ago, you are using the beta shape. When the transition period ends, your code breaks.
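If you have request-building code written against the beta shape, the move can be sketched as a small dictionary rewrite. This assumes the inner format object (`type` plus `schema`) is the same in both shapes, which is an assumption rather than something confirmed here; the helper name is hypothetical, and the beta header is not handled because it has to be removed wherever your client sets it.

```python
def migrate_to_ga(request_kwargs):
    """Return a copy with beta `output_format` moved to `output_config.format`."""
    kwargs = dict(request_kwargs)
    beta_format = kwargs.pop("output_format", None)
    if beta_format is not None:
        output_config = dict(kwargs.get("output_config", {}))
        # Keep an explicit GA-style format if the caller already set one.
        output_config.setdefault("format", beta_format)
        kwargs["output_config"] = output_config
    return kwargs


beta_request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "output_format": {"type": "json_schema", "schema": {"type": "object"}},
}
print(migrate_to_ga(beta_request))
```

The original dictionary is left untouched, so the helper can be dropped in front of an existing call site while both shapes are still accepted.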

How Requesty handles all of this

When you route through Requesty, you send a single OpenAI-compatible request. One parameter format. One schema shape. Requesty handles the translation.

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your_requesty_key"
)
 
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",  # or any of 244 models
    messages=[{"role": "user", "content": "Extract the user info"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)

What Requesty does under the hood:

OpenAI models: response_format passed through as-is
Anthropic models: translated to output_config.format, name field stripped, schema validated against Anthropic's restrictions
Google/Vertex models: translated to response_schema with response_mime_type set
All other providers: translated to whatever format the provider expects

When you switch model from openai/gpt-5.1 to anthropic/claude-sonnet-4-5 to vertex/gemini-2.5-flash, your structured output configuration does not change. When your fallback policy routes a failed request from one provider to another, the schema translation happens automatically.
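Requesty applies fallback policies at the gateway, so you do not need to write this yourself. The client-side sketch below only illustrates the property described above: the model name changes between attempts while `response_format` is reused untouched. `send` is a hypothetical callable, for example `client.chat.completions.create` on a client pointed at the gateway.

```python
def complete_with_fallback(send, models, messages, response_format):
    """Try each model in order with the same response_format.

    Returns (model_used, response). Raises RuntimeError if every model fails.
    """
    errors = []
    for model in models:
        try:
            response = send(
                model=model,
                messages=messages,
                response_format=response_format,
            )
            return model, response
        except Exception as exc:  # broad on purpose: any upstream failure falls through
            errors.append((model, repr(exc)))
    raise RuntimeError(f"all models failed: {errors}")
```

The schema is defined once and never branches on the provider, which is the whole point of normalizing at the gateway.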

Every SDK works through one base URL

All 10 SDKs we tested point at https://router.requesty.ai/v1 as the base URL. No SDK-specific configuration. No per-provider adapters.

Python

# Instructor
client = instructor.from_openai(OpenAI(base_url="https://router.requesty.ai/v1"))
result = client.chat.completions.create(model="anthropic/claude-sonnet-4-5", response_model=UserInfo, ...)
 
# LangChain
llm = ChatOpenAI(model="vertex/gemini-2.5-flash", base_url="https://router.requesty.ai/v1")
chain = llm.with_structured_output(UserInfo)
 
# PydanticAI
model = OpenAIModel("openai/gpt-5.1", provider=OpenAIProvider(base_url="https://router.requesty.ai/v1"))
agent = Agent(model, output_type=UserInfo)
 
# LiteLLM
response = await litellm.acompletion(model="xai/grok-4", base_url="https://router.requesty.ai/v1", ...)
 
# DSPy
lm = dspy.LM("openai/mistral/mistral-large-latest", api_base="https://router.requesty.ai/v1")
 
# Haystack
generator = OpenAIChatGenerator(model="fireworks/minimax-m2.5", api_base_url="https://router.requesty.ai/v1")
 
# Vercel AI SDK (TypeScript)
const provider = createOpenAI({ baseURL: "https://router.requesty.ai/v1" })
const result = await generateObject({ model: provider("anthropic/claude-sonnet-4-5"), schema: z.object({...}) })

Same pattern for LlamaIndex, Mirascope, and Agno. One integration, 244 models, 23 providers.

Practical recommendations

Always set additionalProperties: false. OpenAI and Anthropic require it for strict schemas, and Google's behavior without it varies by SDK version. Make it a default in your schema definitions.
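One way to make it a default is to post-process schemas before sending them. A minimal sketch, assuming object schemas are marked with a plain `"type": "object"` string (type lists such as `["object", "null"]` are not handled); the helper name is hypothetical.

```python
import copy


def close_objects(schema):
    """Return a deep copy with additionalProperties: False on every object
    schema that does not already set it."""
    closed = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                node.setdefault("additionalProperties", False)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(closed)
    return closed


ORDER = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "customer": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
    "required": ["id", "customer"],
}
print(close_objects(ORDER)["properties"]["customer"]["additionalProperties"])
```

The walk is deliberately naive: it visits every nested dictionary, so a schema that embeds object-shaped literals in `enum` or `default` values would need a more careful traversal.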

Avoid recursive schemas if you need provider portability. Anthropic does not support them. If your data is naturally recursive, flatten it to a fixed depth in the schema and handle deeper nesting in your application code.
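Flattening to a fixed depth can be done by generating the nested schema instead of referencing it. The sketch below builds a comment-thread schema with no `$ref` at all; the field names `text` and `replies` and the function name are illustrative assumptions.

```python
import json


def comment_schema(depth):
    """Build a comment schema that allows replies nested `depth` levels deep."""
    properties = {"text": {"type": "string"}}
    required = ["text"]
    if depth > 0:
        # Each level inlines a full copy of the next level, so no $ref is needed.
        properties["replies"] = {
            "type": "array",
            "items": comment_schema(depth - 1),
        }
        required.append("replies")
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


schema = comment_schema(3)
print(len(json.dumps(schema)))
```

Schema size grows with depth, so keep the depth as small as your data allows; threads deeper than the limit have to be truncated or stitched together in application code.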

Do not rely on name being present or absent. OpenAI requires it, Anthropic rejects it, Google ignores it. If you need to support multiple providers without a gateway, conditionally add or strip the name field based on the target.

Test schema compilation, not just schema correctness. A schema can be valid JSON Schema but fail compilation on a specific provider due to unsupported keywords, nesting depth, or complexity limits. Anthropic has a 180-second compilation timeout and a limit of 24 optional parameters across all schemas in a request.
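Limits like these can be checked locally before a request is sent. The sketch below counts properties that are not listed in `required`; how Anthropic actually counts optional parameters is an assumption here, the traversal covers only `properties`, `items`, `anyOf`, and `$defs`, and the function names are hypothetical.

```python
def count_optional(node):
    """Count properties missing from `required`, including nested schemas."""
    if isinstance(node, list):
        return sum(count_optional(item) for item in node)
    if not isinstance(node, dict):
        return 0
    total = 0
    props = node.get("properties")
    if isinstance(props, dict):
        required = set(node.get("required", []))
        total += sum(1 for name in props if name not in required)
        total += sum(count_optional(sub) for sub in props.values())
    total += count_optional(node.get("items"))
    total += count_optional(node.get("anyOf"))
    defs = node.get("$defs")
    if isinstance(defs, dict):
        total += sum(count_optional(sub) for sub in defs.values())
    return total


def within_optional_limit(schemas, limit=24):
    """True if the schemas in one request stay within the optional-parameter limit."""
    return sum(count_optional(s) for s in schemas) <= limit


PROFILE = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nickname": {"type": "string"},  # optional
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "zip": {"type": "string"},  # optional
            },
            "required": ["city"],
        },
    },
    "required": ["name", "address"],
}
print(count_optional(PROFILE), within_optional_limit([PROFILE]))
```

A local count is only a guardrail; the provider's compiler remains the source of truth, so keep an integration test that sends the real schema.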

Pick models from the 100% compatibility list. If your application depends on structured output working reliably across SDKs and endpoints, start with the 43 models that pass every test. That list covers all the major model families: GPT-5.x, Claude 4.x, Gemini 2.5+, Grok 4, and several open-source models.

Use a gateway. If you are calling more than one provider, or might need to in the future, normalizing the structured output parameter at the gateway level saves you from maintaining translation code for every provider. Requesty gives you $10 free to test this with your existing schemas across all 244 models.

Methodology and updates

All tests were run in May 2026. We will keep running these tests as providers ship updates. The test code is open and runs as part of our CI/CD pipeline, so compatibility data stays current.

Your JSON schema should describe your data, not your provider.
