Together AI Inc.

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.

Pricing

$0.88
Input tokens per million
$0.88
Output tokens per million

Technical Specifications

Context Window
33K tokens
Max Output Tokens
Unlimited
Global Availability
Last Updated
N/A

Provider

Together AI Inc.
Location
πŸ‡ΊπŸ‡Έ US
Visit Website β†’

Privacy & Data

Data Retention
No
Used for Training
No
Together AI Privacy Policy β†’
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF - AI Model Details | Requesty | Requesty