Microsoft Azure AI

Microsoft's enterprise AI services on Azure cloud platform.

📍 🇺🇸 US / 🇪🇺 EU models available

28
Available Models
$0.95
Avg Input Price/M
$0.10
Cheapest Model
azure/gpt-4.1-nano@westus3
$2.00
Most Expensive
azure/gpt-4.1@uksouth
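All prices in this listing are quoted in USD per one million tokens, billed separately for input and output. A minimal sketch of that arithmetic (the function is illustrative, not a Requesty or Azure API):

```python
# Sketch of how per-million-token pricing translates into request cost.
# Prices are taken from the listing below (USD per 1M tokens).

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# gpt-4.1 at $2.00 input / $8.00 output per 1M tokens,
# for a 10K-token prompt and a 2K-token completion:
cost = request_cost(10_000, 2_000, 2.00, 8.00)
print(f"${cost:.4f}")  # → $0.0360
```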

Features Overview

18
Vision Support
10
Advanced Reasoning
28
Caching Support
0
Computer Use

Privacy & Data Policy

Data Retention

No data retention

Location

🇺🇸 US / 🇪🇺 EU

All Microsoft Azure AI Models

Microsoft Azure AI

o4-mini (westus3)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.
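OpenAI-style reasoning models typically bill hidden reasoning tokens at the output rate, so the effective cost per request can be well above what the visible answer alone suggests. A rough estimate under that assumption, using the o4-mini rates listed above:

```python
# Rough cost estimate for a reasoning model such as o4-mini, under the
# common billing rule that hidden reasoning tokens are charged at the
# output rate. Prices from the listing: $1.10 in / $4.40 out per 1M tokens.

def reasoning_request_cost(input_tokens: int, visible_output_tokens: int,
                           reasoning_tokens: int,
                           in_price: float = 1.10,
                           out_price: float = 4.40) -> float:
    """Return estimated USD cost, counting reasoning tokens as output."""
    billable_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * in_price + billable_output * out_price) / 1_000_000

# 5K-token prompt, 1K-token visible answer, 8K reasoning tokens:
print(round(reasoning_request_cost(5_000, 1_000, 8_000), 4))  # → 0.0451
```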

Microsoft Azure AI

o4-mini (francecentral)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.

Microsoft Azure AI

gpt-4.1 (uksouth)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
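A 1 million token window fits very large documents, but it is worth sanity-checking inputs before sending them. A quick fit check, assuming the rough English heuristic of ~4 characters per token (use a real tokenizer for anything billing-sensitive):

```python
# Quick check of whether a document plausibly fits gpt-4.1's 1M-token
# context window. The ~4-characters-per-token ratio is a rough English
# heuristic, not an exact tokenizer.

CONTEXT_WINDOW = 1_000_000  # tokens, per the listing

def fits_in_context(text: str, reserved_output_tokens: int = 33_000) -> bool:
    """Estimate whether `text` plus reserved output fits the window."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("x" * 2_000_000))  # ~500K estimated tokens → True
```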

Microsoft Azure AI

gpt-4.1-nano (westus3)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.10/M tokens
Output
$0.40/M tokens

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.

Microsoft Azure AI

gpt-4.1-mini (westus3)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.40/M tokens
Output
$1.60/M tokens

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.

Microsoft Azure AI

gpt-4.1 (francecentral)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

Microsoft Azure AI

gpt-4.1-nano (uksouth)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.10/M tokens
Output
$0.40/M tokens

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.

Microsoft Azure AI

gpt-4.1 (eastus2)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

Microsoft Azure AI

gpt-5 (eastus2)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.25/M tokens
Output
$10.00/M tokens

Microsoft Azure AI

gpt-4.1-mini

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.40/M tokens
Output
$1.60/M tokens

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.

Microsoft Azure AI

o4-mini (eastus2)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.

Microsoft Azure AI

gpt-4.1

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

Microsoft Azure AI

gpt-4.1-nano (eastus2)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.10/M tokens
Output
$0.40/M tokens

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.

Microsoft Azure AI

gpt-5

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.25/M tokens
Output
$10.00/M tokens

Microsoft Azure AI

gpt-5 (uksouth)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.25/M tokens
Output
$10.00/M tokens

Microsoft Azure AI

gpt-4.1-mini (eastus2)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.40/M tokens
Output
$1.60/M tokens

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.

Microsoft Azure AI

gpt-4.1-mini (uksouth)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.40/M tokens
Output
$1.60/M tokens

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.

Microsoft Azure AI

gpt-4.1 (westus3)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

Microsoft Azure AI

o4-mini (swedencentral)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.

Microsoft Azure AI

o4-mini (uksouth)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.

Microsoft Azure AI

gpt-4.1 (swedencentral)

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$2.00/M tokens
Output
$8.00/M tokens

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

Microsoft Azure AI

gpt-4.1-nano

Vision
Caching
Context Window
1.0M tokens
Max Output
33K tokens
Input
$0.10/M tokens
Output
$0.40/M tokens

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.

Microsoft Azure AI

gpt-5 (swedencentral)

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.25/M tokens
Output
$10.00/M tokens

Microsoft Azure AI

o4-mini

Caching
Reasoning
Context Window
200K tokens
Max Output
100K tokens
Input
$1.10/M tokens
Output
$4.40/M tokens

o4-mini is OpenAI's latest small reasoning model, delivering high intelligence at cost and latency targets similar to o3-mini. It supports key developer features such as Structured Outputs, function calling, and the Batch API. Like other models in the o-series, it is designed to excel at science, math, and coding tasks.

Ready to use Microsoft Azure AI models?

Access all Microsoft Azure AI models through Requesty's unified API with intelligent routing, caching, and cost optimization.
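Model identifiers in this listing follow a provider/model@region pattern (e.g. azure/gpt-4.1-nano@westus3). A small parser for that pattern, inferred from the IDs shown here (other providers' IDs may differ):

```python
# Parse catalog model identifiers of the form provider/model@region,
# e.g. "azure/gpt-4.1-nano@westus3". The format is inferred from the
# IDs shown in this listing.

def parse_model_id(model_id: str) -> dict:
    """Split a provider/model@region identifier into its parts."""
    provider, _, rest = model_id.partition("/")
    model, _, region = rest.partition("@")
    return {"provider": provider, "model": model, "region": region or None}

print(parse_model_id("azure/gpt-4.1@uksouth"))
# → {'provider': 'azure', 'model': 'gpt-4.1', 'region': 'uksouth'}
```

The region suffix is optional: an ID like azure/gpt-5 yields `region=None`.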

Microsoft Azure AI Models - Pricing & Features | Requesty