Novita AI

qwen/qwen-2-vl-72b-instruct

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

Pricing

$0.45

Input tokens per million

$0.45

Output tokens per million

Technical Specifications

Context Window

33K tokens

Max Output Tokens

Unlimited

Global Availability

Last Updated

N/A

Provider

Novita AI

Location

🇺🇸 US

Visit Website →

Privacy & Data

Data Retention

Used for Training

Novita AI Privacy Policy →

Get Started

Try with Requesty Browse All Novita AI Models →