Join our Discord

Novita AI

AI-powered creative tools and model hosting.

📍 🇺🇸 US•30 models available•Visit Website →

30

Available Models

$0.56

Avg Input Price/M

$0.02

Cheapest Model

novita/meta-llama/llama-3.2-1b-instruct

$4.00

Most Expensive

novita/deepseek/deepseek-r1

Features Overview

0

Vision Support

0

Advanced Reasoning

0

Caching Support

0

Computer Use

Privacy & Data Policy

Data Retention

Yes

Location

🇺🇸 US

Privacy Policy

Novita AI Privacy Policy →

All Novita AI Models

View All Providers →

Novita AI

deepseek/deepseek-prover-v2-671b

Context Window

160K tokens

Max Output

Unlimited

Input

$0.7/M tokens

Output

$2.50/M tokens

DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.

View Details →

Novita AI

nousresearch/hermes-2-pro-llama-3-8b

Context Window

8K tokens

Max Output

Unlimited

Input

$0.14/M tokens

Output

$0.14/M tokens

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

View Details →

Novita AI

meta-llama/llama-3.3-70b-instruct

Context Window

131K tokens

Max Output

Unlimited

Input

$0.39/M tokens

Output

$0.39/M tokens

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

View Details →

Novita AI

gryphe/mythomax-l2-13b

Context Window

4K tokens

Max Output

Unlimited

Input

$0.09/M tokens

Output

$0.09/M tokens

The idea behind this merge is that each layer is composed of several tensors, which are in turn responsible for specific functions. Using MythoLogic-L2's robust understanding as its input and Huginn's extensive writing capability as its output seems to have resulted in a model that exceeds at both, confirming my theory. (More details to be released at a later time).

View Details →

Novita AI

deepseek/deepseek-r1-distill-llama-70b

Context Window

32K tokens

Max Output

Unlimited

Input

$0.8/M tokens

Output

$0.8/M tokens

DeepSeek R1 Distill LLama 70B

View Details →

Novita AI

sao10k/l3-70b-euryale-v2.1

Context Window

16K tokens

Max Output

Unlimited

Input

$1.48/M tokens

Output

$1.48/M tokens

The uncensored llama3 model is a powerhouse of creativity, excelling in both roleplay and story writing. It offers a liberating experience during roleplays, free from any restrictions. This model stands out for its immense creativity, boasting a vast array of unique ideas and plots, truly a treasure trove for those seeking originality. Its unrestricted nature during roleplays allows for the full breadth of imagination to unfold, akin to an enhanced, big-brained version of Stheno. Perfect for creative minds seeking a boundless platform for their imaginative expressions, the uncensored llama3 model is an ideal choice

View Details →

Novita AI

qwen/qwen3-235b-a22b-fp8

Context Window

128K tokens

Max Output

Unlimited

Input

$0.2/M tokens

Output

$0.8/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

View Details →

Novita AI

meta-llama/llama-3-70b-instruct

Context Window

8K tokens

Max Output

Unlimited

Input

$0.51/M tokens

Output

$0.74/M tokens

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

View Details →

Novita AI

deepseek/deepseek-r1-turbo

Context Window

64K tokens

Max Output

Unlimited

Input

$0.7/M tokens

Output

$2.50/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

View Details →

Novita AI

meta-llama/llama-3.2-3b-instruct

Context Window

33K tokens

Max Output

Unlimited

Input

$0.03/M tokens

Output

$0.05/M tokens

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out)

View Details →

Novita AI

moonshotai/kimi-k2-instruct

Context Window

131K tokens

Max Output

Unlimited

Input

$0.57/M tokens

Output

$2.30/M tokens

View Details →

Novita AI

mistralai/mistral-nemo

Context Window

131K tokens

Max Output

Unlimited

Input

$0.17/M tokens

Output

$0.17/M tokens

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.

View Details →

Novita AI

deepseek/deepseek-r1-distill-qwen-14b

Context Window

128K tokens

Max Output

Unlimited

Input

$0.15/M tokens

Output

$0.15/M tokens

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024 pass@1: 69.7 MATH-500 pass@1: 93.9 CodeForces Rating: 1481 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

View Details →

Novita AI

meta-llama/llama-4-maverick-17b-128e-instruct-fp8

Context Window

1.0M tokens

Max Output

1.0M tokens

Input

$0.2/M tokens

Output

$0.85/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

View Details →

Novita AI

qwen/qwen2.5-vl-72b-instruct

Context Window

96K tokens

Max Output

Unlimited

Input

$0.8/M tokens

Output

$0.8/M tokens

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

View Details →

Novita AI

meta-llama/llama-3.2-1b-instruct

Context Window

131K tokens

Max Output

Unlimited

Input

$0.02/M tokens

Output

$0.02/M tokens

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

View Details →

Novita AI

Sao10K/L3-8B-Stheno-v3.2

Context Window

8K tokens

Max Output

Unlimited

Input

$0.05/M tokens

Output

$0.05/M tokens

Sao10K/L3-8B-Stheno-v3.2 is a highly skilled actor that excels at fully immersing itself in any role assigned.

View Details →

Novita AI

sao10k/l31-70b-euryale-v2.2

Context Window

16K tokens

Max Output

Unlimited

Input

$1.48/M tokens

Output

$1.48/M tokens

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.

View Details →

Novita AI

deepseek/deepseek-r1-distill-qwen-32b

Context Window

13K tokens

Max Output

Unlimited

Input

$0.3/M tokens

Output

$0.3/M tokens

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024 pass@1: 72.6 MATH-500 pass@1: 94.3 CodeForces Rating: 1691 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

View Details →

Novita AI

deepseek/deepseek_v3

Context Window

64K tokens

Max Output

Unlimited

Input

$0.89/M tokens

Output

$0.89/M tokens

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

View Details →

Novita AI

microsoft/wizardlm-2-8x22b

Context Window

66K tokens

Max Output

Unlimited

Input

$0.62/M tokens

Output

$0.62/M tokens

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models.

View Details →

Novita AI

deepseek/deepseek-r1

Context Window

64K tokens

Max Output

Unlimited

Input

$4.00/M tokens

Output

$4.00/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

View Details →

Novita AI

meta-llama/llama-3-8b-instruct

Context Window

8K tokens

Max Output

Unlimited

Input

$0.04/M tokens

Output

$0.04/M tokens

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

View Details →

Novita AI

zai-org/glm-4.5

Context Window

131K tokens

Max Output

Unlimited

Input

$0.6/M tokens

Output

$2.20/M tokens

View Details →

Novita AI

sao10k/l3-8b-lunaris

Context Window

8K tokens

Max Output

Unlimited

Input

$0.05/M tokens

Output

$0.05/M tokens

A generalist / roleplaying model merge based on Llama 3.

View Details →

Novita AI

deepseek/deepseek-v3-turbo

Context Window

128K tokens

Max Output

Unlimited

Input

$0.4/M tokens

Output

$1.30/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

View Details →

Novita AI

deepseek/deepseek-v3-0324

Context Window

128K tokens

Max Output

Unlimited

Input

$0.4/M tokens

Output

$1.30/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

View Details →

Novita AI

zai-org/glm-4.6

Context Window

205K tokens

Max Output

131K tokens

Input

$0.6/M tokens

Output

$2.20/M tokens

GLM-4.6 is Z AI’s latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. GLM-4.6 demonstrates refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5). Evaluations show clear gains over GLM-4.5 across reasoning, agents, and coding, reaching near parity with Claude Sonnet 4 in practical tasks while outperforming other open-source baselines. GLM-4.6 is available through the Z.ai API platform, OpenRouter, coding agents (Claude Code, Roo Code, Cline, Kilo Code), and soon as downloadable weights on HuggingFace and ModelScope.

View Details →

Novita AI

qwen/qwen-2.5-72b-instruct

Context Window

32K tokens

Max Output

Unlimited

Input

$0.38/M tokens

Output

$0.4/M tokens

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

View Details →

Novita AI

meta-llama/llama-3.1-8b-instruct

Context Window

16K tokens

Max Output

Unlimited

Input

$0.05/M tokens

Output

$0.05/M tokens

Meta's latest class of models, Llama 3.1, launched with a variety of sizes and configurations. The 8B instruct-tuned version is particularly fast and efficient. It has demonstrated strong performance in human evaluations, outperforming several leading closed-source models.

View Details →

Ready to use Novita AI models?

Access all Novita AI models through Requesty's unified API with intelligent routing, caching, and cost optimization.

Get Started Free View Pricing