Together AI Inc.

A platform for running and fine-tuning open-source models.

📍 🇺🇸 US • 26 models available
Available Models: 26
Avg Input Price: $0.81/M
Cheapest Model: $0.06/M (together/meta-llama/Llama-3.2-3B-Instruct-Turbo)
Most Expensive: $3.50/M (together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo)
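All prices on this page are quoted per million tokens. A minimal sketch of turning those rates into a per-request cost estimate (the helper is illustrative, not part of any Requesty or Together API):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in USD from per-million-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# Example: 2,000 input + 500 output tokens at $0.88/M each way
cost = estimate_cost(2_000, 500, 0.88, 0.88)
print(f"${cost:.6f}")  # → $0.002200
```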

Features Overview

Vision Support: 0
Advanced Reasoning: 0
Caching Support: 0
Computer Use: 0

Privacy & Data Policy

Data Retention

No data retention

Location

🇺🇸 US

All Together AI Inc. Models

Context Window: 131K tokens • Max Output: Unlimited • Input: $0.88/M tokens • Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Together AI Inc.

deepseek-ai/DeepSeek-V3

Context Window: 131K tokens • Max Output: Unlimited • Input: $1.25/M tokens • Output: $1.25/M tokens

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, was pre-trained on 14.8 trillion tokens, and delivers performance competitive with leading closed-source models across knowledge, math, and coding benchmarks.

Context Window: 33K tokens • Max Output: Unlimited • Input: $0.30/M tokens • Output: $0.30/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
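Per the Qwen3 model card, the thinking/non-thinking switch can be controlled per turn with the `/think` and `/no_think` soft switches appended to a message. A minimal sketch of building a chat request either way (the helper name is ours, not an official API):

```python
def qwen3_messages(prompt: str, thinking: bool) -> list[dict]:
    """Build a chat message list using Qwen3's soft switch: appending
    '/think' or '/no_think' to the user turn toggles reasoning mode."""
    switch = "/think" if thinking else "/no_think"
    return [{"role": "user", "content": f"{prompt} {switch}"}]

fast = qwen3_messages("Summarize this paragraph.", thinking=False)
deep = qwen3_messages("Prove there are infinitely many primes.", thinking=True)
```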

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.54/M tokens • Output: $0.54/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 33K tokens • Max Output: Unlimited • Input: $1.20/M tokens • Output: $1.20/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.10/M tokens • Output: $0.10/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 4K tokens • Max Output: Unlimited • Input: $0.22/M tokens • Output: $0.22/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Together AI Inc.

deepseek-ai/DeepSeek-R1

Context Window: 64K tokens • Max Output: 8K tokens • Input: $3.00/M tokens • Output: $7.00/M tokens

DeepSeek-R1 is DeepSeek's first-generation reasoning model, trained with large-scale reinforcement learning to produce long chains of thought before answering. It achieves performance comparable to OpenAI's o1 across math, code, and general reasoning benchmarks, and its reasoning traces were used to distill a family of smaller dense models.

Together AI Inc.

Qwen/QwQ-32B-Preview

Context Window: 33K tokens • Max Output: Unlimited • Input: $1.20/M tokens • Output: $1.20/M tokens

QwQ-32B-Preview is an experimental research model from the Qwen team focused on advancing AI reasoning. The 32-billion-parameter model works through problems step by step and posts strong results on math and coding benchmarks, though as a preview release it has known limitations such as language mixing and occasional recursive reasoning loops.

Context Window: 33K tokens • Max Output: Unlimited • Input: $0.88/M tokens • Output: $0.88/M tokens

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.
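The note above says reasoning is enabled via the `detailed thinking on` system prompt (the model card also documents `detailed thinking off` to disable it). A sketch of building an OpenAI-style chat payload either way; the model ID string is illustrative and may differ on a given provider:

```python
def nemotron_payload(prompt: str, reasoning: bool) -> dict:
    """Chat payload for Llama-3.3-Nemotron-Super-49B. Reasoning is toggled
    by the documented 'detailed thinking on/off' system prompt."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": "nvidia/Llama-3.3-Nemotron-Super-49B-v1",  # assumed model ID
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }

payload = nemotron_payload("Plan a RAG pipeline for PDF search.", reasoning=True)
```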

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.20/M tokens • Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens • Max Output: Unlimited • Input: $0.18/M tokens • Output: $0.18/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 4K tokens • Max Output: Unlimited • Input: $0.20/M tokens • Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 33K tokens • Max Output: Unlimited • Input: $0.60/M tokens • Output: $0.60/M tokens

DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured "deep reasoning" mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications.

DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode, generating extended chains of thought wrapped in `<think></think>` tags before delivering a final answer.

System prompt:

> You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside `<think> </think>` tags, and then provide your solution or response to the problem.
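In practice you send that system prompt verbatim and then strip the `<think>...</think>` trace from the completion before showing the answer. A small self-contained sketch (the helper is ours, not part of any DeepHermes SDK):

```python
# The "deep thinking" system prompt quoted above, verbatim.
DEEP_THINKING_SYSTEM_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a DeepHermes-style completion."""
    if "</think>" in text:
        thought, _, answer = text.partition("</think>")
        return thought.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

thought, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
```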

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.20/M tokens • Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens • Max Output: Unlimited • Input: $0.88/M tokens • Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.88/M tokens • Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 8K tokens • Max Output: Unlimited • Input: $0.20/M tokens • Output: $0.20/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 131K tokens • Max Output: Unlimited • Input: $3.50/M tokens • Output: $3.50/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 4K tokens • Max Output: Unlimited • Input: $0.90/M tokens • Output: $0.90/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window: 4K tokens • Max Output: Unlimited • Input: $0.30/M tokens • Output: $0.30/M tokens

Context Window: 33K tokens • Max Output: Unlimited • Input: $0.80/M tokens • Output: $0.80/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window: 131K tokens • Max Output: Unlimited • Input: $0.88/M tokens • Output: $0.88/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Together AI Inc.

Qwen/Qwen2-72B-Instruct

Context Window: 33K tokens • Max Output: Unlimited • Input: $0.90/M tokens • Output: $0.90/M tokens

Qwen2-72B-Instruct is the 72-billion-parameter instruction-tuned model from the Qwen2 series. It delivers strong performance in language understanding, generation, coding, and mathematics, supports dozens of languages, and handles long contexts of up to 128K tokens.

Together AI Inc.

deepseek-llm-67b-chat

Context Window: 4K tokens • Max Output: Unlimited • Input: $0.90/M tokens • Output: $0.90/M tokens

DeepSeek LLM 67B Chat is the instruction-tuned version of DeepSeek's 67-billion-parameter dense language model, trained from scratch on 2 trillion tokens of English and Chinese data. It performs strongly in coding, mathematics, and open-ended conversation relative to models of similar scale.

Context Window: 131K tokens • Max Output: Unlimited • Input: $0.06/M tokens • Output: $0.06/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Ready to use Together AI Inc. models?

Access all Together AI Inc. models through Requesty's unified API with intelligent routing, caching, and cost optimization.
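A unified API of this kind is typically OpenAI-compatible. The sketch below assembles such a request without sending it; the endpoint URL and the `provider/model` ID format are assumptions, so confirm both against Requesty's documentation before use:

```python
import json

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble an OpenAI-compatible chat completion request. The endpoint
    URL and model-ID format are assumptions; check Requesty's docs."""
    return {
        "url": "https://router.requesty.ai/v1/chat/completions",  # assumed endpoint
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request(
    "together/meta-llama/Llama-3.2-3B-Instruct-Turbo", "Hello!", "YOUR_API_KEY")
```

Any HTTP client (or the `openai` SDK pointed at the same base URL) can then send the assembled request.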
