Together AI Inc.
A platform for running and fine-tuning open-source models.
All Together AI Inc. Models
meta-llama/Meta-Llama-3-70B-Instruct-Turbo
Meta's Llama 3 70B instruction-tuned model, served in an optimized Turbo configuration for faster inference at lower cost.
Qwen/Qwen2.5-Coder-32B-Instruct
The code-specialized flagship of the Qwen2.5-Coder series: a 32-billion-parameter instruction-tuned model trained on large volumes of code, with strong code generation, reasoning, and repair capabilities alongside solid general-purpose and math performance.
deepseek-ai/DeepSeek-V3
DeepSeek-V3 is a 671-billion-parameter Mixture-of-Experts (MoE) language model that activates 37 billion parameters per token. It combines Multi-head Latent Attention (MLA) with an auxiliary-loss-free load-balancing strategy and was pre-trained on 14.8 trillion tokens, achieving performance competitive with leading closed-source models at substantially lower inference cost.
Qwen/Qwen2.5-7B-Instruct-Turbo
A 7-billion-parameter instruction-tuned model from the Qwen2.5 series, offering strong coding, math, and multilingual capabilities in a compact, low-latency package.
meta-llama/Meta-Llama-Guard-3-8B
Llama Guard 3 is an 8B safety-classification model fine-tuned from Llama 3.1. It classifies both prompts and model responses against a content-safety taxonomy, making it suitable for moderating the inputs and outputs of LLM applications.
meta-llama/LlamaGuard-2-8b
Llama Guard 2 is an 8B content-safety classifier built on Llama 3, designed to flag unsafe prompts and responses according to the MLCommons hazards taxonomy.
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
Meta's Llama 3.1 8B instruction-tuned model with a 128K-token context window and multilingual support, served in an optimized Turbo configuration for fast, low-cost inference.
meta-llama/Meta-Llama-3-8B-Instruct-Lite
A cost-optimized serving configuration of Meta's Llama 3 8B Instruct, aimed at high-volume workloads where price matters more than peak accuracy.
meta-llama/Llama-3.3-70B-Instruct-Turbo
Meta's Llama 3.3 70B instruction-tuned model in an optimized Turbo configuration, delivering quality comparable to much larger Llama models at a fraction of the cost and latency.
meta-llama/Llama-3.2-3B-Instruct-Turbo
A lightweight 3B instruction-tuned model from the Llama 3.2 family, well suited to low-latency and edge use cases while retaining solid multilingual text-generation quality.
deepseek-ai/DeepSeek-R1
DeepSeek-R1 is a reasoning-focused model trained with large-scale reinforcement learning on top of the DeepSeek-V3 base (671B total parameters, 37B activated per token). It produces an explicit chain of thought before answering and achieves performance on math, code, and reasoning benchmarks comparable to leading proprietary reasoning models.
Qwen/Qwen2.5-72B-Instruct-Turbo
The 72-billion-parameter flagship instruction-tuned model of the Qwen2.5 series, with strong coding, mathematics, long-context, and multilingual capabilities, served in an optimized Turbo configuration.
meta-llama/Llama-3-70b-chat-hf
Meta's Llama 3 70B instruction-tuned chat model in its Hugging Face reference format.
Qwen/QwQ-32B-Preview
QwQ-32B-Preview is an experimental reasoning model from the Qwen team. It works through problems step by step, showing strong performance on math and coding benchmarks, though as a preview release it can exhibit quirks such as language mixing and overly long reasoning chains.
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
Meta's Llama 3.1 70B instruction-tuned model with a 128K-token context window, served in an optimized Turbo configuration for high-throughput inference.
Ready to use Together AI Inc. models?
Access all Together AI Inc. models through Requesty's unified API with intelligent routing, caching, and cost optimization.
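Unified routers of this kind typically follow the familiar OpenAI-style chat-completions shape, where a request names the model and carries a list of messages. The sketch below builds such a payload using only the standard library; the endpoint URL and API-key placeholder are assumptions for illustration, not confirmed Requesty values — consult Requesty's documentation for the actual base URL and authentication details.

```python
import json

# Hypothetical endpoint and key -- check Requesty's docs for the real values.
BASE_URL = "https://example.invalid/v1/chat/completions"  # assumption, not a real endpoint
API_KEY = "YOUR_REQUESTY_API_KEY"  # placeholder

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a Together-hosted model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "Summarize the difference between dense and MoE architectures.",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the router with an `Authorization: Bearer <key>` header; because the model field carries the full provider-prefixed identifier, switching between any of the models listed above is a one-line change.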