Together AI Inc.

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.

Pricing

$0.88

Input tokens per million

$0.88

Output tokens per million

Technical Specifications

Context Window

33K tokens

Max Output Tokens

Unlimited

Global Availability

Last Updated

N/A

Provider

Together AI Inc.

Location

🇺🇸 US

Visit Website →

Privacy & Data

Data Retention

Used for Training

Together AI Privacy Policy →

Get Started

Try with Requesty Browse All Together AI Inc. Models →