Smarter-Than-Human Model Picking: Introducing Requesty Smart Routing

https://www.youtube.com/watch?v=fx3gX7ZSC9c

TL;DR – Stop guessing which LLM is “best” for every prompt. Requesty Smart Routing automatically classifies each task (code, chat, SQL, creative writing, etc.) in ~50 ms and forwards it to the optimal model (GPT-4o, Claude-3, Gemini-Flash, DeepSeek, Mistral-Large, you name it). One API key, zero context switching, up to 80% cost savings, and consistent latency.


1. Why We Built Smart Routing

Even power users struggle to juggle the expanding LLM zoo:

| Task | “Best” model today | Tokens / $1 | Latency |
|------|--------------------|-------------|---------|
| Short chit-chat | Gemini-Flash-2.5 | ~3,000 | ⚡ Fast |
| Mid-sized coding | Claude-4 Sonnet | ~1,100 | 🟡 Medium |
| Long-form blog | GPT-4o | ~240 | 🔴 Slow |

Tomorrow the table changes again.

Developers either (a) hard-code a single premium model and overpay, or (b) expose end-users to an intimidating “Pick your engine” drop-down. Both hurt UX and margins.

Smart Routing removes that decision entirely.


2. How It Works (Under the Hood)

  1. Task Classifier – A compact, in-house transformer (≈65 M params, distilled from 50 k annotated examples) inspects the system + user prompt and predicts a task label in 20–100 ms. Example labels: chat_small, code_medium, sql, creative_long, image_insight.

  2. Policy Engine – A YAML/JSON policy maps each label to the settings below (a sketch of how such a policy might be evaluated follows this list):

    • Preferred model(s)

    • Budget ceiling

    • Max latency SLA

    • Fallback chain

```yaml
code_medium:
  primary: "anthropic/claude-4-sonnet"
  fallback: ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"]
  max_usd: 0.005          # per request
  max_latency_ms: 20000
```

  3. Router Gateway – The same endpoint you’re already using: https://router.requesty.ai/v1/chat/completions. Simply set model: "smart-task" (or any alias you choose) and pass your prompt (a client-side sketch follows this list). The gateway:

    • Calls the classifier

    • Consults policy

    • Forwards to the chosen provider

    • Logs everything in Live Logs & Analytics

  4. Observability Loop – Every response is tagged with chosen_model, tokens, latency_ms, and cost_usd. These metrics feed back into policy tuning and your dashboards.
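
To make the policy semantics from step 2 concrete, here is a conceptual Python sketch that walks the primary model and fallback chain while enforcing the budget ceiling and latency SLA. This is an illustration only, not Requesty’s implementation: POLICIES mirrors the YAML above, while call_provider, ProviderError, and route are hypothetical names.

```python
# Conceptual sketch of evaluating a Smart Routing policy entry. NOT
# Requesty's implementation: call_provider()/ProviderError are hypothetical
# stand-ins for real provider calls.
import time

POLICIES = {
    "code_medium": {
        "primary": "anthropic/claude-4-sonnet",
        "fallback": ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"],
        "max_usd": 0.005,         # budget ceiling per request
        "max_latency_ms": 20000,  # latency SLA
    },
}

class ProviderError(Exception):
    """Upstream failure (outage, rate limit) from the hypothetical provider call."""

def call_provider(model: str, messages: list) -> tuple[str, float]:
    """Hypothetical stub; a real router would return (reply_text, cost_usd)."""
    raise ProviderError(f"{model} is not wired up in this sketch")

def route(label: str, messages: list) -> tuple[str, str]:
    policy = POLICIES[label]
    # Try the primary model first, then walk the fallback chain in order.
    for model in [policy["primary"], *policy.get("fallback", [])]:
        start = time.monotonic()
        try:
            reply, cost_usd = call_provider(model, messages)
        except ProviderError:
            continue  # provider down or rate-limited: try the next model
        latency_ms = (time.monotonic() - start) * 1000
        # Keep the answer only if it met the budget ceiling and latency SLA.
        if cost_usd <= policy["max_usd"] and latency_ms <= policy["max_latency_ms"]:
            return model, reply
    raise RuntimeError(f"no model satisfied the policy for {label!r}")
```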
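
And the client side from step 3: assuming the gateway speaks the OpenAI chat-completions dialect (its /v1/chat/completions path and the OpenWebUI demo below suggest it does), a minimal call with the openai Python SDK might look like this. The SDK choice and the REQUESTY_API_KEY environment variable are assumptions for the sketch, not requirements from the post.

```python
# Minimal sketch of calling Smart Routing from Python, assuming an
# OpenAI-compatible gateway. Requires the `openai` SDK (v1+) and a
# REQUESTY_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key=os.environ["REQUESTY_API_KEY"],
)

response = client.chat.completions.create(
    model="smart-task",  # routing alias; the concrete model is chosen server-side
    messages=[{"role": "user", "content": "Code a Snake game in Python."}],
)
print(response.choices[0].message.content)
```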


3. Live Demo Recap

In the launch video we:

  1. Connected OpenWebUI to router.requesty.ai/v1 with the alias smart-task.

  2. Asked “Who built you?” → Router picked Gemini-Flash in 2.8 s for <$0.001.

  3. Asked “Code a Snake game in Python.” → Router switched to Claude-4 Sonnet.

  4. Follow-up “Write a blog post about this Snake game.” → Router used Perplexity Sonar-Pro (fast, cheap long-form).

Three requests, three providers, zero manual switches.

Latency tax: ~65 ms average over 1,000 runs – imperceptible to humans.


4. Key Benefits

| 💡 | What you get | Why it matters |
|----|--------------|----------------|
| One API, all models | No more env-var gymnastics or vendor-specific SDKs. | Faster prototyping, simpler back-end. |
| Automatic cost trimming | Cheaper models handle lightweight tasks. | Teams report 40–80% savings. |
| Consistent UX | Users never face “model selector anxiety”. | Higher retention, fewer support tickets. |
| Live analytics | Per-task spend, latency, error rates. | Data to renegotiate budgets or tweak prompts. |
| No vendor lock-in | Swap vendors via config, not code. | Future-proof as new models drop weekly. |


5. Smart Routing vs. DIY Prompt Engineering

|  | Manual approach | Requesty Smart Routing |
|--|-----------------|------------------------|
| Effort | Build & host a classifier, maintain policies, integrate N SDKs. | Plug & play. |
| Coverage | Depends on your data set. | 50 k-prompt corpus, updated monthly. |
| Edge cases | You chase a moving target alone. | We ship global fixes once for everyone. |
| Observability | Stitch together logs from each provider. | Unified Live Logs & Analytics. |


6. Quick Start

```bash
curl https://router.requesty.ai/v1/chat/completions \
  -H "Authorization: Bearer $REQUESTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "smart-task",
        "messages": [
          {"role": "user", "content": "Generate a SQL query to find the 5 most active users last month"}
        ]
      }'
```

That’s literally it. 💫
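
Want to verify what the router picked without opening the dashboard? Assuming the gateway echoes the chosen provider/model in the response’s standard model field (the post only promises it in Live Logs & Analytics, so treat this as an assumption), a quick Python check could look like:

```python
# Quick check of which model Smart Routing selected. Assumption: the gateway
# echoes the chosen provider/model in the standard "model" response field;
# the post above only guarantees it in Live Logs & Analytics.
import os
import requests

resp = requests.post(
    "https://router.requesty.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['REQUESTY_API_KEY']}"},
    json={
        "model": "smart-task",
        "messages": [{"role": "user",
                      "content": "Generate a SQL query to find the 5 most active users last month"}],
    },
)
body = resp.json()
print(body.get("model"))                         # which model actually ran
print(body["choices"][0]["message"]["content"])  # the routed answer
```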


7. Roadmap Sneak Peek

  • Reinforcement Learning loop – We’ll let your app vote 👍/👎 so the router learns your domain-specific preferences.

  • Fine-grained policy UI – Non-dev teammates can tweak cost limits and fallbacks without touching YAML.

  • Hybrid local-+-cloud routing – Seamlessly blend on-prem models with cloud giants.


8. Try It Today

  • 🆓 $6 in credits for every new workspace → requesty.ai

  • 📺 Watch the 2-min demo in the launch post

  • 🐙 Star us on GitHub (open-sourcing the policy spec soon)

  • 💬 Join our Discord to suggest new routing rules or models

Stop debating “which model should I use?” — let Requesty decide in real time and focus on building products your users love.