Together AI Inc.

Platform for running and fine-tuning open-source models.

📍 🇺🇸 US • 15 models available
Available Models: 15
Avg. Input Price: $0.80/M tokens
Cheapest Model: together/meta-llama/Llama-3.2-3B-Instruct-Turbo ($0.06/M tokens)
Most Expensive Model: together/deepseek-ai/DeepSeek-R1 ($3.00/M tokens)
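
Per-million-token pricing maps to per-request cost as (tokens ÷ 1,000,000) × price. A minimal sketch of that arithmetic in Python, using the cheapest and most expensive prices listed above (the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one request from per-million-token prices."""
    return (
        (input_tokens / 1_000_000) * input_price_per_m
        + (output_tokens / 1_000_000) * output_price_per_m
    )

# Illustrative request: 4,000 input tokens, 1,000 output tokens.
print(request_cost(4_000, 1_000, 0.06, 0.06))  # Llama-3.2-3B-Instruct-Turbo: $0.0003
print(request_cost(4_000, 1_000, 3.00, 7.00))  # DeepSeek-R1: $0.012 + $0.007 = $0.019
```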

Features Overview

Vision Support: 0 models
Advanced Reasoning: 0 models
Caching Support: 0 models
Computer Use: 0 models

Privacy & Data Policy

Data Retention: None
Location: 🇺🇸 US

All Together AI Inc. Models

Context Window: 8K tokens
Max Output: Unlimited
Input: $0.88/M tokens
Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 33K tokens
Max Output: Unlimited
Input: $0.80/M tokens
Output: $0.80/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
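
The thinking/non-thinking switch described above is exposed through the model's chat template. Below is a minimal sketch using Hugging Face transformers; the enable_thinking flag and the model identifier follow the public Qwen3 model card, so verify both against current documentation before relying on them:

```python
from transformers import AutoTokenizer

# Qwen3-30B-A3B is the MoE variant described above; other Qwen3 checkpoints use the same template.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]

# enable_thinking=True formats the prompt for the reasoning ("thinking") mode;
# set it to False for plain, low-latency dialogue.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
print(prompt)
```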

deepseek-ai/DeepSeek-V3

Context Window: 131K tokens
Max Output: Unlimited
Input: $1.25/M tokens
Output: $1.25/M tokens

DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient architecture based on Qwen2.5-Math-7B. The model performs strongly on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving accuracy competitive with larger models at a lower inference cost.

Context Window: 33K tokens
Max Output: Unlimited
Input: $0.30/M tokens
Output: $0.30/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window: 8K tokens
Max Output: Unlimited
Input: $0.20/M tokens
Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 8K tokens
Max Output: Unlimited
Input: $0.20/M tokens
Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens
Max Output: Unlimited
Input: $0.18/M tokens
Output: $0.18/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 8K tokens
Max Output: Unlimited
Input: $0.10/M tokens
Output: $0.10/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens
Max Output: Unlimited
Input: $0.88/M tokens
Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens
Max Output: Unlimited
Input: $0.06/M tokens
Output: $0.06/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

deepseek-ai/DeepSeek-R1

Context Window: 64K tokens
Max Output: 8K tokens
Input: $3.00/M tokens
Output: $7.00/M tokens

DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient architecture based on Qwen2.5-Math-7B. The model performs strongly on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving accuracy competitive with larger models at a lower inference cost.

Context Window: 33K tokens
Max Output: Unlimited
Input: $1.20/M tokens
Output: $1.20/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window: 131K tokens
Max Output: Unlimited
Input: $0.88/M tokens
Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Qwen/QwQ-32B-Preview

Context Window: 33K tokens
Max Output: Unlimited
Input: $1.20/M tokens
Output: $1.20/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window: 131K tokens
Max Output: Unlimited
Input: $0.88/M tokens
Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Ready to use Together AI Inc. models?

Access all Together AI Inc. models through Requesty's unified API with intelligent routing, caching, and cost optimization.
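
As a rough sketch of what that looks like in practice, the snippet below sends a chat request through an OpenAI-compatible client; the base URL and API key placeholder are assumptions to verify against Requesty's documentation, and the model string follows the together/&lt;org&gt;/&lt;model&gt; pattern shown on this page:

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; confirm the base URL in Requesty's docs.
client = OpenAI(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.requesty.ai/v1",
)

response = client.chat.completions.create(
    model="together/meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Summarize the benefits of a unified LLM API."}],
)
print(response.choices[0].message.content)
```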
