You want your LLM to return a JSON object with three fields. Name, email, score. Not prose. Not markdown. Just the JSON.
Every major provider now supports this. They all call it structured output. They all use constrained decoding under the hood. And they all implement it with completely different API parameters, different schema restrictions, and different edge cases that show up when you switch models.
We decided to test this properly. We ran structured output requests against 244 models across 23 providers, using 3 API endpoints and 10 popular framework SDKs. Over 2,400 individual tests. Here is everything we found.
JSON mode vs structured output
Before the results, one important distinction. These are two different features that often get confused.
JSON mode (response_format: {type: "json_object"}): The model returns valid JSON, but there is no schema enforcement. It might return {"answer": 4} or {"result": {"value": 4, "unit": "count"}} or something else entirely. Your code has to handle whatever shape comes back. Almost every model supports this.
Structured output (response_format: {type: "json_schema", json_schema: {...}}): The model is constrained to match your exact schema. If you define {championships: int, summary: string}, that is exactly what you get. Every field, every type, every required property. This uses constrained decoding at the token generation level. Fewer providers support this because it requires deeper infrastructure changes.
Everything in this post is about structured output. The strict json_schema mode. That is what matters for production applications where you need guaranteed schema conformance.
How we tested
Every test sends the same structured output request: a JSON Schema requiring an integer field and a string field, with strict: true and additionalProperties: false. The model must return a response that matches the schema exactly. We parse and validate every response.
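The "parse and validate" step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the actual test harness: the field names `championships` and `summary` are borrowed from the example earlier in this post, and `TEST_SCHEMA` and `conforms` are hypothetical names.

```python
import json

# Hypothetical test schema: one integer field and one string field.
# The field names are assumptions borrowed from the earlier example.
TEST_SCHEMA = {
    "type": "object",
    "properties": {
        "championships": {"type": "integer"},
        "summary": {"type": "string"},
    },
    "required": ["championships", "summary"],
    "additionalProperties": False,
}

# Map the two JSON Schema primitive types used here to Python types.
_PY_TYPES = {"integer": int, "string": str}


def conforms(raw: str, schema: dict = TEST_SCHEMA) -> bool:
    """Return True if raw parses as JSON and matches the flat schema exactly."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    # Every required key must be present.
    if set(schema["required"]) - data.keys():
        return False
    # With additionalProperties: false, no unknown keys are allowed.
    if schema.get("additionalProperties") is False and data.keys() - props.keys():
        return False
    for key, spec in props.items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
            return False
    return True
```

A check like this only covers flat schemas with integer and string fields. Anything richer is better handled by a full JSON Schema validator.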
We tested across three axes:
- Three API endpoints: OpenAI Chat Completions (/v1/chat/completions), OpenAI Responses (/v1/responses), and Anthropic Messages (/v1/messages)
- Ten framework SDKs: Instructor, LangChain, LlamaIndex, LiteLLM, PydanticAI, DSPy, Haystack, Agno, Mirascope, and Vercel AI SDK
- 244 models across 23 providers: OpenAI, Anthropic, Google, Azure, Bedrock, Vertex, Mistral, xAI, DeepSeek, DeepInfra, Fireworks, Groq, Alibaba (Qwen), MiniMaxi, Moonshot, Nebius, Novita, Parasail, Perplexity, Together, Inceptron, zai, and more
All tests ran through Requesty's gateway at https://router.requesty.ai/v1. Every request used the same API key, the same schema, the same endpoint format. The only variable was the model.
The API fragmentation problem
Before looking at results, here is why testing this matters. Every provider implements structured output with a different parameter name and a different request shape.
OpenAI Chat Completions
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the user info"}],
    # The schema sits under response_format.json_schema.schema
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)
```

OpenAI Responses API
```python
response = client.responses.create(
    model="gpt-4o",
    input="Extract the user info",
    # The schema sits under text.format.schema, with name and strict beside it
    text={
        "format": {
            "type": "json_schema",
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)
```

Same provider, different API. The parameter moved from response_format to text.format. The nesting structure changed completely.
Anthropic Messages
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract the user info"}],
    # The schema sits under output_config.format.schema, with no name field
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)
```

No response_format. The parameter is output_config.format. No name field on the schema wrapper. Completely different shape.
Google Gemini
```python
from google import genai

client = genai.Client()  # reads the Gemini API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Extract the user info",
    # The schema is passed as response_schema alongside a JSON MIME type
    config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "score": {"type": "integer"}
            },
            "required": ["name", "email", "score"]
        }
    }
)
```

Yet another parameter name: response_schema plus response_mime_type. Four providers, four different shapes for the exact same feature.
The compatibility matrix
| Feature | OpenAI (Chat) | OpenAI (Responses) | Anthropic | Google |
|---|---|---|---|---|
| Parameter name | response_format | text.format | output_config.format | response_schema |
| Schema type | json_schema | json_schema | json_schema | OpenAPI subset / JSON Schema |
| Strict mode | strict: true | strict: true | Always strict | Always strict |
| Requires name | Yes | Yes | No | No |
| additionalProperties | Must be false | Must be false | Must be false | Not required |
| Recursive schemas | Supported | Supported | Not supported | Supported |
| anyOf | Supported | Supported | Supported (limited) | Supported |
| JSON mode (no schema) | json_object | json_object | Not supported | application/json only |
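The matrix can be read as a small translation function: one schema in, four request shapes out. The sketch below is an illustration built only from the four examples shown earlier. It is not Requesty's implementation, and the `target` labels and the function name `to_provider_kwargs` are hypothetical.

```python
from typing import Any


def to_provider_kwargs(target: str, name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build the structured-output keyword arguments for one API shape.

    target is a hypothetical label: "openai_chat", "openai_responses",
    "anthropic", or "google".
    """
    if target == "openai_chat":
        # name and strict live inside the json_schema wrapper.
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": name, "strict": True, "schema": schema},
            }
        }
    if target == "openai_responses":
        # Same fields, but flattened under text.format.
        return {
            "text": {
                "format": {
                    "type": "json_schema",
                    "name": name,
                    "strict": True,
                    "schema": schema,
                }
            }
        }
    if target == "anthropic":
        # No name field on the wrapper.
        return {"output_config": {"format": {"type": "json_schema", "schema": schema}}}
    if target == "google":
        # The Gemini example above omits additionalProperties, so this sketch
        # drops it at the top level only (an assumption, not a requirement).
        trimmed = {k: v for k, v in schema.items() if k != "additionalProperties"}
        return {
            "config": {
                "response_mime_type": "application/json",
                "response_schema": trimmed,
            }
        }
    raise ValueError(f"unknown target: {target}")
```

The returned dict can be splatted into the matching SDK call, for example `client.chat.completions.create(model=..., messages=..., **to_provider_kwargs("openai_chat", "user_info", schema))`.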
Results: native API endpoints
Here is how every provider performed when tested directly with the OpenAI and Anthropic SDKs.
| Provider | Models | Chat Completions | Responses | Messages |
|---|---|---|---|---|
| Alibaba (Qwen) | 7 | 7/7 | 7/7 | 7/7 |
| Anthropic | 9 | 9/9 | 9/9 | 9/9 |
| Azure | 8 | 6/6 | 8/8 | 8/8 |
| Bedrock | 7 | 7/7 | 7/7 | 7/7 |
| DeepInfra | 20 | 15/20 | 15/20 | 16/20 |
| DeepSeek | 4 | 0/4 | 0/4 | 4/4 |
| Fireworks | 7 | 7/7 | 7/7 | 7/7 |
| Google | 9 | 7/9 | 7/9 | 7/9 |
| Groq | 2 | 2/2 | 2/2 | 2/2 |
| Inceptron | 3 | 3/3 | 3/3 | 3/3 |
| MiniMaxi | 5 | 5/5 | 5/5 | 5/5 |
| Mistral | 13 | 13/13 | 13/13 | 13/13 |
| Moonshot | 7 | 7/7 | 7/7 | 7/7 |
| Nebius | 7 | 7/7 | 7/7 | 7/7 |
| Novita | 35 | 11/35 | 11/35 | 28/35 |
| OpenAI | 28 | 24/28 | 24/28 | 25/28 |
| OpenAI Responses | 23 | n/a | 23/23 | 23/23 |
| Parasail | 5 | 4/5 | 5/5 | 5/5 |
| Perplexity | 3 | 3/3 | 3/3 | 3/3 |
| Together | 4 | 3/4 | 3/4 | 3/4 |
| Vertex | 23 | 20/23 | 20/23 | 22/23 |
| xAI | 10 | 10/10 | 10/10 | 10/10 |
| zai | 5 | 4/5 | 5/5 | 5/5 |
Summary by endpoint
| Endpoint | Pass | Tested | Rate |
|---|---|---|---|
| Anthropic Messages | 226 | 244 | 93% |
| OpenAI Responses | 201 | 244 | 82% |
| OpenAI Chat Completions | 174 | 219 | 80% |
The Anthropic Messages endpoint has the highest pass rate because Requesty translates json_schema into Anthropic's native constrained decoding format, and that translation is clean. For OpenAI-compatible endpoints, Requesty passes json_schema through to the upstream provider. Providers that support it natively pass. Providers that have not implemented it yet return an error.
199 models pass all native endpoints
Out of 244 models tested, 199 passed structured output on every endpoint where they were tested. That is 82% of all models with full native endpoint coverage. These models return schema-conforming JSON regardless of whether you call them via Chat Completions, Responses, or Messages.
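The percentages in this section are plain pass/tested ratios. A quick way to recompute them from the published counts, shown to one decimal place (the variable and function names are illustrative):

```python
# (passed, tested) pairs copied from the summary table above.
ENDPOINT_RESULTS = {
    "Anthropic Messages": (226, 244),
    "OpenAI Responses": (201, 244),
    "OpenAI Chat Completions": (174, 219),
}


def pass_rate(passed: int, tested: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / tested, 1)


for endpoint, (passed, tested) in ENDPOINT_RESULTS.items():
    print(f"{endpoint}: {passed}/{tested} = {pass_rate(passed, tested)}%")

# Models that passed on every native endpoint where they were tested.
print(f"All native endpoints: 199/244 = {pass_rate(199, 244)}%")
```

Note that Chat Completions has a smaller denominator (219) because not every model was tested on that endpoint.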
Providers with 100% pass rate
14 out of 23 providers passed structured output on every model and every endpoint:
Alibaba (Qwen), Anthropic, Azure, Bedrock, Fireworks, Groq, Inceptron, MiniMaxi, Mistral, Moonshot, Nebius, OpenAI Responses, Perplexity, xAI.
Results: SDK compatibility
We tested 10 popular framework SDKs against 159 models (recent models released in the last 6 months). Every SDK sends structured output requests through Requesty's OpenAI-compatible API.
| SDK | Pass | Tested | Rate |
|---|---|---|---|
| DSPy | 144 | 159 | 91% |
| Agno (Anthropic mode) | 144 | 159 | 91% |
| LiteLLM | 143 | 159 | 90% |
| LangChain | 142 | 159 | 89% |
| Haystack | 142 | 159 | 89% |
| LlamaIndex | 139 | 159 | 87% |
| PydanticAI | 137 | 159 | 86% |
| Instructor | 128 | 159 | 81% |
| Mirascope | 127 | 159 | 80% |
| Vercel AI SDK | 120 | 159 | 75% |
All 10 SDKs work with structured outputs through Requesty. The pass rate differences come from how each SDK formats the request and parses the response.
DSPy, LiteLLM, LangChain, and Haystack have the highest pass rates (89-91%) because they send clean OpenAI-compatible requests and parse responses without extra transformation.
Instructor and Mirascope are slightly lower because they add retry/validation logic that can trip on certain provider error formats.
Vercel AI SDK has the lowest pass rate at 75% because some interactions with the Responses endpoint require specific response fields that not all providers return.
SDK compatibility by provider
Here is how each provider performed across all 10 framework SDKs. This shows which providers have the broadest SDK compatibility.
| Provider | Instructor | LangChain | LlamaIndex | LiteLLM | PydanticAI | DSPy | Haystack | Mirascope | Agno | Vercel |
|---|---|---|---|---|---|---|---|---|---|---|
| Alibaba | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - |
| Anthropic | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 88% |
| Azure | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| Bedrock | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 71% | 85% |
| DeepInfra | 85% | 85% | 85% | 100% | 100% | 100% | 85% | 85% | 100% | 57% |
| Fireworks | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 85% | 100% |
| Google | 50% | 75% | 75% | 75% | 75% | 87% | 75% | 62% | 75% | 75% |
| Groq | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 50% | 100% | 100% |
| MiniMaxi | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - |
| Mistral | 83% | 100% | 100% | 100% | 100% | 100% | 100% | 83% | 100% | 83% |
| Moonshot | 71% | 100% | 71% | 100% | 71% | 100% | 100% | 71% | 100% | 100% |
| Nebius | 71% | 100% | 100% | 100% | 71% | 100% | 100% | 71% | 100% | 71% |
| OpenAI | 89% | 85% | 85% | 85% | 89% | 64% | 85% | 89% | 89% | 85% |
| Parasail | 80% | 100% | 100% | 100% | 80% | 100% | 100% | 80% | 100% | 100% |
| Perplexity | - | 100% | 100% | 100% | - | 100% | 100% | - | 100% | 100% |
| Together | 50% | 75% | 75% | 75% | 75% | 75% | 75% | 50% | 50% | 50% |
| Vertex | 71% | 85% | 80% | 85% | 85% | 100% | 85% | 66% | 85% | 85% |
| xAI | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 88% | 100% |
| zai | 60% | 100% | 100% | 100% | 100% | 100% | 100% | 60% | 100% | - |
Key takeaway: Alibaba, Anthropic, MiniMaxi, Fireworks, and xAI have near-perfect SDK compatibility across the board. If you are building with multiple SDKs and want maximum compatibility, these providers are the safest choices.
The 43 models that pass everything
These models returned correct structured output across all 3 endpoints and all 10 framework SDKs. Zero failures out of 14 tests per model. If you want maximum compatibility, start here.
| Provider | Models |
|---|---|
| OpenAI | gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3, gpt-5.2, gpt-5.1, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini |
| Anthropic | claude-opus-4-6, claude-opus-4-5, claude-opus-4-1, claude-opus-4, claude-sonnet-4-6, claude-sonnet-4-5, claude-sonnet-4, claude-haiku-4-5 |
| Vertex | claude-opus-4-6, claude-opus-4-5, gemini-3.5-flash, gemini-2.5-flash-lite |
| xAI | grok-4, grok-4-1-fast-non-reasoning, grok-3-mini |
| Azure | gpt-4.1, gpt-4.1-nano |
| Bedrock | claude-sonnet-4-6, claude-sonnet-4-5 |
| Fireworks | minimax-m2.5, qwen3.6-plus |
| Parasail | gemma-4-26B, qwen25-vl-72b, qwen3-235b |
| Google | gemini-2.5-flash |
| Groq | gpt-oss-20b |
| Together | Llama-3.3-70B-Instruct-Turbo |
42 more models at 13/14
An additional 42 models pass 13 of the 14 tests. The only failure is Agno in OpenAI Chat mode, a known SDK-level issue: Agno sends a developer role that some providers reject. These models include:
Mistral: mistral-large-latest, mistral-medium-latest, mistral-small-latest, mistral-small-2603
Moonshot: kimi-k2.5, kimi-k2.6, kimi-k2-thinking, kimi-k2-thinking-turbo, kimi-k2-turbo-preview
Nebius: DeepSeek-V3.2, Llama-3.3-70B, nemotron-3-nano-omni, gpt-oss-120b, glm-5.1
DeepInfra: Qwen3-235B, Qwen3-Coder-480B, DeepSeek-V3, DeepSeek-V3.1
Vertex: claude-haiku-4-5, claude-sonnet-4-5, claude-sonnet-4-6, gemini-2.5-flash, gemini-2.5-pro
xAI: grok-3, grok-4-fast, grok-4.2-beta, grok-4.3
That brings the total to 85 models with near-perfect structured output compatibility across every SDK.
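The Agno failure mentioned above comes down to one message role. If you control the message list before it is sent, one possible workaround is to rewrite developer messages as system messages. This is a sketch under the assumption that the target provider accepts a system role. It is not a fix shipped by Agno or Requesty, and `downgrade_developer_role` is a hypothetical helper.

```python
def downgrade_developer_role(messages: list[dict]) -> list[dict]:
    """Return a copy of messages with any "developer" role rewritten to "system"."""
    return [
        {**message, "role": "system"} if message.get("role") == "developer" else dict(message)
        for message in messages
    ]
```

The input list is left untouched, so the original messages can still be sent to providers that do accept the developer role.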
Where structured output does not work (and why)
Not every model supports json_schema. Here is a breakdown of what fails and why.
Provider does not support json_schema
Some providers have not implemented constrained decoding for json_schema at the API level. These providers typically support json_object (JSON mode) but not schema enforcement.
| Provider | Models affected | Status |
|---|---|---|
| DeepSeek | deepseek-chat, deepseek-v4-flash, deepseek-v4-pro, deepseek-reasoner | JSON mode works. json_schema returns "This response_format type is unavailable now" |
| Novita (most models) | 24 of 35 models | Provider returns Bad Request for json_schema. A few larger models (deepseek-v3.2, gemma-4, llama-3-70b) do work |
For DeepSeek specifically: if you need structured output, route through the Anthropic Messages endpoint via Requesty. Requesty translates the schema into a format DeepSeek can handle, and all 4 DeepSeek models pass on the Messages endpoint.
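If you call such a provider directly instead, a different option is to fall back to JSON mode and check the result yourself. The sketch below assumes a caller-supplied `send` function that takes a response_format dict and returns the raw response text. `send`, `SchemaUnsupported`, and `request_structured` are all hypothetical names, and the check covers only required top-level fields.

```python
import json
from typing import Callable


class SchemaUnsupported(Exception):
    """Hypothetical error that send() raises when json_schema is rejected."""


def request_structured(
    send: Callable[[dict], str],
    schema_format: dict,
    required: list[str],
) -> dict:
    """Try json_schema first, then fall back to json_object and validate locally."""
    try:
        raw = send(schema_format)
    except SchemaUnsupported:
        # JSON mode gives valid JSON but no schema guarantee.
        raw = send({"type": "json_object"})
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return data
```

Because JSON mode does not enforce the schema, the fallback path can still fail validation, so the caller should be ready to retry or surface the error.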
Model returned invalid JSON
Some models accept the json_schema parameter but return JSON that does not match the schema, or return empty/malformed responses.
| Provider | Models affected | Details |
|---|---|---|
| DeepInfra | Qwen2.5-Coder-32B, Qwen3-32B, DeepSeek-R1, DeepSeek-R1-Distill-Llama-70B | Smaller/distilled models that struggle with strict schema adherence |
| Novita | deepseek-r1-distill-* variants | Same issue with distilled models |
Image generation models
Image generation models (vertex/gemini-2.5-flash-image, vertex/gemini-3-pro-image-preview, google/gemini-3.1-flash-image-preview) do not support structured text output. This is expected. These models are designed for image generation, not text extraction.
Responses-only models
OpenAI's newest reasoning models (gpt-5.4-pro, gpt-5.5-pro) only work on the /v1/responses endpoint. They are not available on Chat Completions. This is an OpenAI platform constraint, not a structured output issue. Through Requesty, these models work perfectly on the Responses endpoint.
The subtle breaks to watch for
Beyond provider-level support, there are schema-level differences that cause silent failures when switching providers.
Recursive schemas
If you are extracting nested data (a file tree, a comment thread, a recursive category structure), OpenAI and Google handle recursive $ref in schemas. Anthropic does not. Your schema compiles fine on OpenAI. You switch to Claude. You get a schema compilation error at request time, not at schema validation time.
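Because the failure only shows up at request time, it can help to detect recursion before sending anything. The following is a minimal sketch that follows local `$ref` pointers (those starting with `#`) and reports whether any of them loops back on itself. The function names are illustrative, and remote references are ignored.

```python
def resolve_pointer(root, ref: str):
    """Resolve a local JSON pointer such as "#/$defs/node" against root."""
    node = root
    for part in ref[1:].split("/"):
        if not part:
            continue
        # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def has_recursive_ref(schema: dict) -> bool:
    """Return True if following local $ref pointers ever revisits a pointer."""

    def visit(node, active: frozenset) -> bool:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#"):
                if ref in active:
                    return True  # this pointer is already being expanded
                if visit(resolve_pointer(schema, ref), active | {ref}):
                    return True
            return any(visit(value, active) for value in node.values())
        if isinstance(node, list):
            return any(visit(item, active) for item in node)
        return False

    return visit(schema, frozenset())
```

Running this check in CI against every schema you ship is one way to catch a portability problem before a provider switch does.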
The name field
OpenAI requires a name field on the schema wrapper (both in Chat Completions and Responses, though in different positions). Anthropic does not accept one. Google does not need one. If you have a shared schema definition that includes name, it works on OpenAI and breaks on Anthropic.
Strict mode semantics
On OpenAI, you opt into strict enforcement with strict: true. Without it, the model will try to follow your schema but is not guaranteed to. On Anthropic, all structured output is strict by default. There is no non-strict mode. On Google, it is also always strict.
additionalProperties
OpenAI and Anthropic require additionalProperties: false for strict schemas, and Google does not, as the compatibility matrix shows. They also fail differently when you forget it. OpenAI returns a clear error. Anthropic returns a schema compilation error. Google silently allows extra fields in some SDK versions but blocks them in others.
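One way to avoid forgetting the keyword is to apply it mechanically. This sketch walks a schema and sets additionalProperties to false on every object-typed node that does not already declare it. `close_objects` is a hypothetical helper name, and the walk treats any dict with `"type": "object"` as an object schema, which is a simplification.

```python
import copy


def close_objects(schema: dict) -> dict:
    """Return a deep copy with additionalProperties: false on object schemas.

    Nodes that already set additionalProperties are left unchanged.
    """
    result = copy.deepcopy(schema)

    def visit(node) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object":
                node.setdefault("additionalProperties", False)
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    return result
```

Working on a copy keeps the original definition untouched, which matters if the same schema object is also used for local validation.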
The Anthropic beta migration
Anthropic shipped structured outputs in November 2025 as a beta with output_format and a required header. The GA release in 2026 moved it to output_config.format with no header required. Both work today. But if you followed a tutorial from six months ago, you are using the beta shape. When the transition period ends, your code breaks.
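If your code still builds the beta shape, the move can be sketched as a small keyword-argument rewrite. This assumes the inner format payload is the same in both shapes, which you should confirm against Anthropic's documentation. It does not touch the beta header, and `migrate_output_format` is a hypothetical helper.

```python
def migrate_output_format(kwargs: dict) -> dict:
    """Move a beta-style output_format argument to output_config.format.

    Returns a new dict and leaves the input untouched. Assumes the inner
    format payload is identical in the beta and GA shapes.
    """
    migrated = dict(kwargs)
    if "output_format" in migrated:
        fmt = migrated.pop("output_format")
        config = dict(migrated.get("output_config") or {})
        # Keep an existing GA-style format if one is already present.
        config.setdefault("format", fmt)
        migrated["output_config"] = config
    return migrated
```

Calls that already use output_config pass through unchanged, so the helper can be applied to every request during a gradual migration.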
How Requesty handles all of this
When you route through Requesty, you send a single OpenAI-compatible request. One parameter format. One schema shape. Requesty handles the translation.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your_requesty_key"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",  # or any of 244 models
    messages=[{"role": "user", "content": "Extract the user info"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["name", "email", "score"],
                "additionalProperties": False
            }
        }
    }
)
```

What Requesty does under the hood:
- OpenAI models: response_format passed through as-is
- Anthropic models: translated to output_config.format, name field stripped, schema validated against Anthropic's restrictions
- Google/Vertex models: translated to response_schema with response_mime_type set
- All other providers: translated to whatever format the provider expects
When you switch model from openai/gpt-5.1 to anthropic/claude-sonnet-4-5 to vertex/gemini-2.5-flash, your structured output configuration does not change. When your fallback policy routes a failed request from one provider to another, the schema translation happens automatically.
Every SDK works through one base URL
All 10 SDKs we tested point at https://router.requesty.ai/v1 as the base URL. No SDK-specific configuration. No per-provider adapters.
```python
# Instructor
client = instructor.from_openai(OpenAI(base_url="https://router.requesty.ai/v1"))
result = client.chat.completions.create(model="anthropic/claude-sonnet-4-5", response_model=UserInfo, ...)

# LangChain
llm = ChatOpenAI(model="vertex/gemini-2.5-flash", base_url="https://router.requesty.ai/v1")
chain = llm.with_structured_output(UserInfo)

# PydanticAI
model = OpenAIModel("openai/gpt-5.1", provider=OpenAIProvider(base_url="https://router.requesty.ai/v1"))
agent = Agent(model, output_type=UserInfo)

# LiteLLM
response = await litellm.acompletion(model="xai/grok-4", base_url="https://router.requesty.ai/v1", ...)

# DSPy
lm = dspy.LM("openai/mistral/mistral-large-latest", api_base="https://router.requesty.ai/v1")

# Haystack
generator = OpenAIChatGenerator(model="fireworks/minimax-m2.5", api_base_url="https://router.requesty.ai/v1")
```

```typescript
// Vercel AI SDK (TypeScript)
const provider = createOpenAI({ baseURL: "https://router.requesty.ai/v1" })
const result = await generateObject({ model: provider("anthropic/claude-sonnet-4-5"), schema: z.object({...}) })
```

Same pattern for LlamaIndex, Mirascope, and Agno. One integration, 244 models, 23 providers.
Practical recommendations
Always set additionalProperties: false. OpenAI and Anthropic require it for strict schemas. Make it a default in your schema definitions.
Avoid recursive schemas if you need provider portability. Anthropic does not support them. If your data is naturally recursive, flatten it to a fixed depth in the schema and handle deeper nesting in your application code.
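Flattening to a fixed depth can be done mechanically. The sketch below takes a node schema whose child field refers back to the node itself and expands it into a fixed number of nested levels with no `$ref`. It assumes the recursive field is an array of child nodes, and `unroll` and `child_key` are illustrative names.

```python
import copy


def unroll(node_schema: dict, child_key: str, depth: int) -> dict:
    """Expand a self-referencing node schema into depth nested levels.

    Assumes child_key holds an array of child nodes. The deepest level
    drops child_key entirely, so deeper nesting must be handled in code.
    """
    level = copy.deepcopy(node_schema)
    if depth <= 1:
        level["properties"].pop(child_key, None)
        if "required" in level:
            level["required"] = [key for key in level["required"] if key != child_key]
        return level
    level["properties"][child_key] = {
        "type": "array",
        "items": unroll(node_schema, child_key, depth - 1),
    }
    return level
```

For a comment thread, `unroll(comment_schema, "replies", 3)` would yield a schema that allows comments, replies, and replies to replies, with anything deeper left to application code.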
Do not rely on name being present or absent. OpenAI requires it, Anthropic rejects it, Google ignores it. If you need to support multiple providers without a gateway, conditionally add or strip the name field based on the target.
Test schema compilation, not just schema correctness. A schema can be valid JSON Schema but fail compilation on a specific provider due to unsupported keywords, nesting depth, or complexity limits. Anthropic has a 180-second compilation timeout and a limit of 24 optional parameters across all schemas in a request.
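A rough pre-flight check for the optional-parameter limit might look like the following. It counts properties that are not listed in `required`, across nested objects, array items, `anyOf` branches, and `$defs`. That counting rule is an assumption about how the limit is applied, not Anthropic's documented algorithm, and the function names are hypothetical.

```python
def count_optional(schema) -> int:
    """Count properties not listed in required, including nested schemas."""
    if not isinstance(schema, dict):
        return 0
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    total = sum(1 for name in props if name not in required)
    # Recurse into the places where nested schemas commonly live.
    children = list(props.values())
    children += list(schema.get("$defs", {}).values())
    children += list(schema.get("anyOf", []))
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return total + sum(count_optional(child) for child in children)


def within_optional_limit(schemas: list[dict], limit: int = 24) -> bool:
    """Check the combined optional count of all schemas in one request."""
    return sum(count_optional(schema) for schema in schemas) <= limit
```

A check like this is only a heuristic. The authoritative answer is still a real request against the provider, so treat a pass here as a hint rather than a guarantee.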
Pick models from the 100% compatibility list. If your application depends on structured output working reliably across SDKs and endpoints, start with the 43 models that pass every test. That list covers all the major model families: GPT-5.x, Claude 4.x, Gemini 2.5+, Grok 4, and several open-source models.
Use a gateway. If you are calling more than one provider, or might need to in the future, normalizing the structured output parameter at the gateway level saves you from maintaining translation code for every provider. Requesty gives you $10 free to test this with your existing schemas across all 244 models.
Methodology and updates
All tests were run in May 2026. We will keep running these tests as providers ship updates. The test code is open and runs as part of our CI/CD pipeline, so compatibility data stays current.
Your JSON schema should describe your data, not your provider.
Frequently asked questions
- What is structured output in LLM APIs?
- Structured output is a feature where the LLM provider uses constrained decoding to guarantee that the model's response matches a JSON Schema you provide. Unlike prompting the model to return JSON (which can fail), structured output enforces the schema at the token generation level. The model literally cannot produce tokens that would violate your schema. Every major provider now supports some form of this, but they all use different parameter names and API shapes.
- What is the difference between JSON mode and structured output?
- JSON mode (response_format type json_object) guarantees syntactically valid JSON but does not enforce a schema. The model might return any valid JSON object with any fields and types. Structured output (response_format type json_schema) goes further by enforcing a specific JSON Schema, guaranteeing correct field names, types, required properties, and nesting. If you are shipping to production, you almost always want structured output over JSON mode.
- Which LLM providers support structured outputs in 2026?
- All major providers support structured outputs: OpenAI, Anthropic, Google, Azure, Bedrock, Vertex, Mistral, xAI, Fireworks, Groq, and more. We tested 244 models across 23 providers. Some smaller providers like DeepSeek support JSON mode but not full json_schema enforcement. The full compatibility results are in this post.
- Which SDKs work with structured outputs through Requesty?
- We tested 10 popular SDKs: Instructor, LangChain, LlamaIndex, LiteLLM, PydanticAI, DSPy, Haystack, Agno, Mirascope, and Vercel AI SDK. All of them work with structured outputs through Requesty. Pass rates range from 75% to 91% across all models, with most failures isolated to a few providers that do not support json_schema at the API level.
- How does Requesty handle structured output across providers?
- Requesty normalizes structured output parameters at the gateway level. You send a single OpenAI-compatible request with response_format, and Requesty translates it to whatever parameter the downstream provider expects, whether that is output_config for Anthropic, response_schema for Google, or the native format for any other provider. When you switch models or fail over between providers, your structured output configuration stays the same.