Novita AI

AI-powered creative tools and model hosting.

🇺🇸 US • 42 models available
Available Models: 42
Avg Input Price/M: $0.46
Cheapest Model: novita/meta-llama/llama-3.2-1b-instruct ($0.02)
Most Expensive: novita/deepseek/deepseek-r1 ($4.00)
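
Prices on this page are quoted per million tokens (the "/M" suffix), with separate input and output rates. As a rough illustration of how those rates turn into a per-request cost, here is a minimal sketch. The function name `estimate_cost_usd` and the token counts are hypothetical and are not part of any Novita AI or Requesty API; only the $4.00/M rate is taken from the listing.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from per-million-token prices.

    Hypothetical helper for illustration only; real billing may round
    or count tokens differently.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Assumed workload: 10,000 input tokens and 2,000 output tokens,
# priced at the most expensive listed rate ($4.00/M in, $4.00/M out).
cost = estimate_cost_usd(10_000, 2_000, 4.00, 4.00)
print(f"${cost:.4f}")  # prints "$0.0480"
```

The same call with a different input and output rate covers models that price the two directions separately.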

Features Overview

Vision Support: 0
Advanced Reasoning: 0
Caching Support: 0
Computer Use: 0

Privacy & Data Policy

Data Retention: Yes
Location: 🇺🇸 US

All Novita AI Models

Context Window
33K tokens
Max Output
Unlimited
Input
$0.45/M tokens
Output
$0.45/M tokens

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:
SoTA understanding of images of various resolutions and ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA.
Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, and content creation.
Agent that can operate mobile phones, robots, and other devices: with its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with such devices for automatic operation based on the visual environment and text instructions.
Multilingual support: besides English and Chinese, Qwen2-VL supports understanding text in other languages inside images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

Context Window
16K tokens
Max Output
Unlimited
Input
$0.05/M tokens
Output
$0.05/M tokens

Meta's latest class of models, Llama 3.1, launched with a variety of sizes and configurations. The 8B instruct-tuned version is particularly fast and efficient. It has demonstrated strong performance in human evaluations, outperforming several leading closed-source models.

Context Window
131K tokens
Max Output
Unlimited
Input
$0.17/M tokens
Output
$0.17/M tokens

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.

Context Window
33K tokens
Max Output
Unlimited
Input
$0.34/M tokens
Output
$0.39/M tokens

Meta's latest class of models, Llama 3.1, has launched with a variety of sizes and configurations. The 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong performance in human evaluations compared to leading closed-source models.

Context Window
16K tokens
Max Output
Unlimited
Input
$0.05/M tokens
Output
$0.05/M tokens

Meta's latest class of models, Llama 3.1, launched with a variety of sizes and configurations. The 8B instruct-tuned version is particularly fast and efficient. It has demonstrated strong performance in human evaluations, outperforming several leading closed-source models.

Context Window
33K tokens
Max Output
Unlimited
Input
$0.06/M tokens
Output
$0.06/M tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

Context Window
32K tokens
Max Output
Unlimited
Input
$0.38/M tokens
Output
$0.4/M tokens

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

Context Window
128K tokens
Max Output
Unlimited
Input
$0.2/M tokens
Output
$0.8/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.5/M tokens
Output
$0.5/M tokens

This is a fine-tuned Llama-2 model designed to support longer and more detailed writing prompts, as well as next-chapter generation. It also includes an experimental role-playing instruction set with multi-round dialogues, character interactions, and varying numbers of participants.

Context Window
66K tokens
Max Output
Unlimited
Input
$0.62/M tokens
Output
$0.62/M tokens

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models.

Context Window
1.0M tokens
Max Output
1.0M tokens
Input
$0.2/M tokens
Output
$0.85/M tokens

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.06/M tokens
Output
$0.06/M tokens

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

Context Window
8K tokens
Max Output
Unlimited
Input
$0.04/M tokens
Output
$0.04/M tokens

Meta's latest class of models (Llama 3) launched with a variety of sizes and flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Context Window
8K tokens
Max Output
Unlimited
Input
$0.08/M tokens
Output
$0.08/M tokens

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

Context Window
13K tokens
Max Output
Unlimited
Input
$0.3/M tokens
Output
$0.3/M tokens

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024 pass@1: 72.6; MATH-500 pass@1: 94.3; CodeForces Rating: 1691. The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context Window
131K tokens
Max Output
Unlimited
Input
$0.6/M tokens
Output
$2.20/M tokens

Context Window
33K tokens
Max Output
Unlimited
Input
$0.03/M tokens
Output
$0.05/M tokens

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

Context Window
64K tokens
Max Output
Unlimited
Input
$4.00/M tokens
Output
$4.00/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.17/M tokens
Output
$0.17/M tokens

OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, trained on additional code datasets.

Context Window
96K tokens
Max Output
Unlimited
Input
$0.8/M tokens
Output
$0.8/M tokens

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:
SoTA understanding of images of various resolutions and ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA.
Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, and content creation.
Agent that can operate mobile phones, robots, and other devices: with its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with such devices for automatic operation based on the visual environment and text instructions.
Multilingual support: besides English and Chinese, Qwen2-VL supports understanding text in other languages inside images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

Context Window
131K tokens
Max Output
Unlimited
Input
$0.39/M tokens
Output
$0.39/M tokens

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Context Window
160K tokens
Max Output
Unlimited
Input
$0.7/M tokens
Output
$2.50/M tokens

DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.

Context Window
128K tokens
Max Output
Unlimited
Input
$0.15/M tokens
Output
$0.15/M tokens

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024 pass@1: 69.7; MATH-500 pass@1: 93.9; CodeForces Rating: 1481. The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context Window
33K tokens
Max Output
Unlimited
Input
$0.05/M tokens
Output
$0.05/M tokens

Qwen2 is the newest series in the Qwen large language model family. Qwen2 7B is a transformer-based model that demonstrates exceptional performance in language understanding, multilingual capabilities, programming, mathematics, and reasoning.

Context Window
8K tokens
Max Output
Unlimited
Input
$0.14/M tokens
Output
$0.14/M tokens

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.09/M tokens
Output
$0.09/M tokens

The idea behind this merge is that each layer is composed of several tensors, which are in turn responsible for specific functions. Using MythoLogic-L2's robust understanding as its input and Huginn's extensive writing capability as its output seems to have resulted in a model that excels at both, confirming my theory. (More details to be released at a later time.)

Context Window
16K tokens
Max Output
Unlimited
Input
$1.48/M tokens
Output
$1.48/M tokens

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.

Context Window
128K tokens
Max Output
Unlimited
Input
$0.4/M tokens
Output
$1.30/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

Context Window
8K tokens
Max Output
Unlimited
Input
$0.05/M tokens
Output
$0.05/M tokens

A generalist / roleplaying model merge based on Llama 3.

Context Window
64K tokens
Max Output
Unlimited
Input
$0.89/M tokens
Output
$0.89/M tokens

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

Novita AI

qwen/qwq-32b

Context Window
33K tokens
Max Output
Unlimited
Input
$0.18/M tokens
Output
$0.2/M tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

Context Window
32K tokens
Max Output
Unlimited
Input
$0.8/M tokens
Output
$0.8/M tokens

DeepSeek R1 Distill Llama 70B

Context Window
8K tokens
Max Output
Unlimited
Input
$0.05/M tokens
Output
$0.05/M tokens

Sao10K/L3-8B-Stheno-v3.2 is a highly skilled actor that excels at fully immersing itself in any role assigned.

Context Window
128K tokens
Max Output
Unlimited
Input
$0.4/M tokens
Output
$1.30/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

Context Window
16K tokens
Max Output
Unlimited
Input
$1.48/M tokens
Output
$1.48/M tokens

The uncensored llama3 model is a powerhouse of creativity, excelling in both roleplay and story writing. It offers a liberating experience during roleplays, free from any restrictions. This model stands out for its immense creativity, boasting a vast array of unique ideas and plots, truly a treasure trove for those seeking originality. Its unrestricted nature during roleplays allows for the full breadth of imagination to unfold, akin to an enhanced, big-brained version of Stheno. Perfect for creative minds seeking a boundless platform for their imaginative expressions, the uncensored llama3 model is an ideal choice.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.17/M tokens
Output
$0.17/M tokens

Nous-Hermes-Llama2-13b is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Emozilla leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

Context Window
4K tokens
Max Output
Unlimited
Input
$0.8/M tokens
Output
$0.8/M tokens

A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It wants to produce lengthy output by default and is the best creative writing merge produced so far by sophosympatheia.

Context Window
64K tokens
Max Output
Unlimited
Input
$0.7/M tokens
Output
$2.50/M tokens

DeepSeek R1 is the latest open-source model released by the DeepSeek team, featuring impressive reasoning capabilities, particularly achieving performance comparable to OpenAI's o1 model in mathematics, coding, and reasoning tasks.

Context Window
131K tokens
Max Output
Unlimited
Input
$0.02/M tokens
Output
$0.02/M tokens

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

Context Window
33K tokens
Max Output
Unlimited
Input
$0.06/M tokens
Output
$0.06/M tokens

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

Context Window
131K tokens
Max Output
Unlimited
Input
$0.57/M tokens
Output
$2.30/M tokens

Context Window
8K tokens
Max Output
Unlimited
Input
$0.51/M tokens
Output
$0.74/M tokens

Meta's latest class of models (Llama 3) launched with a variety of sizes and flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Ready to use Novita AI models?

Access all Novita AI models through Requesty's unified API with intelligent routing, caching, and cost optimization.

Novita AI Models - Pricing & Features | Requesty