Together AI Inc.
Platform for running and fine-tuning open source models.
Features Overview
Privacy & Data Policy
All Together AI Inc. Models
View All Providers →meta-llama/Llama-3-70b-chat-hf
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
deepseek-ai/DeepSeek-V3
DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.
Qwen/Qwen2.5-7B-Instruct-Turbo
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
meta-llama/Meta-Llama-3-70B-Instruct-Lite
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
Qwen/Qwen2.5-72B-Instruct-Turbo
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
meta-llama/Meta-Llama-3-8B-Instruct-Lite
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Llama-2-13b-chat-hf
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
deepseek-ai/DeepSeek-R1
DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.
Qwen/QwQ-32B-Preview
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.
meta-llama/Meta-Llama-Guard-3-8B
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Llama-2-7b-chat-hf
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications. DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode—generating extended chains of thought wrapped in `<think></think>` tags before delivering a final answer. System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
meta-llama/Llama-3-8b-chat-hf
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Llama-3.3-70B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Meta-Llama-3-70B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/LlamaGuard-2-8b
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Llama-2-70b-hf
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
upstage/SOLAR-10.7B-Instruct-v1.0
Qwen/Qwen2.5-Coder-32B-Instruct
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
Qwen/Qwen2-72B-Instruct
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
deepseek-llm-67b-chat
DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.
meta-llama/Llama-3.2-3B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
Ready to use Together AI Inc. models?
Access all Together AI Inc. models through Requesty's unified API with intelligent routing, caching, and cost optimization.