Requesty

NVIDIA

NVIDIA's hosted inference for its Nemotron family of open models. Retains API data and may use it to train and improve models. Requesty routes to 3 NVIDIA models with context windows up to 1.0M tokens. One API key, OpenAI-compatible SDK, no markup.

Intelligence Index
24.3
Coding Index
19.0
GPQA Diamond
75.7%
Terminal-Bench Hard
13.6%

All NVIDIA models

ModelContextMax OutputInput/1MOutput/1MCapabilitiesCoding
nemotron-3-nano-30b-a3b
262KFreeFree
🧠🔧
19
nemotron-3-super-120b-a12b
1.0M66KFreeFree
🧠🔧
31
nemotron-3-ultra-550b-a55b
1.0M66KFreeFree
🧠🔧
38

About NVIDIA on Requesty

How many NVIDIA models are available through Requesty?
Requesty routes to 3 NVIDIA models including regional variants, with pricing synced in real time to the upstream provider.
What is the cheapest NVIDIA model?
NVIDIA has free tiers available — look for the models marked "Free" in the pricing column.
Does Requesty add markup on NVIDIA pricing?
No. Requesty passes through exactly what NVIDIA charges. You pay the same per-token rates as going direct — plus you get smart routing, caching, analytics, and one unified API for 400+ models.
Is my data used to train NVIDIA models?
NVIDIA's default terms may include data use for training. Check their privacy policy and Requesty's enterprise options for opt-out controls.
Where are NVIDIA models hosted?
NVIDIA models are hosted in 🇺🇸 US. Some models are available in additional regions through AWS Bedrock, Azure, or Google Vertex AI — filter by region on the NVIDIA rows in the models explorer.