https://www.youtube.com/watch?v=fx3gX7ZSC9c
TL;DR – Stop guessing which LLM is “best” for every prompt. Requesty Smart Routing automatically classifies each task (code, chat, SQL, creative writing, etc.) in ~50 ms and forwards it to the optimal model (GPT-4o, Claude-3, Gemini-Flash, DeepSeek, Mistral-Large, you name it). One API key, zero context-switching, up to 80% cost savings, and consistent latency.
1. Why We Built Smart Routing
Even power users struggle to juggle the expanding LLM zoo:
| Task | “Best” model today | Tokens / $1 | Latency |
| --- | --- | --- | --- |
| Short chit-chat | Gemini-Flash-2.5 | ~3,000 | ⚡ Fast |
| Mid-sized coding | Claude-4 Sonnet | ~1,100 | 🟡 Medium |
| Long-form blog | GPT-4o | ~240 | 🔴 Slow |
Tomorrow the table changes again.
Developers either (a) hard-code a single premium model and overpay, or (b) expose end-users to an intimidating “Pick your engine” drop-down. Both hurt UX and margins.
Smart Routing removes that decision entirely.
2. How It Works (Under the Hood)
**Task Classifier** – A compact, in-house transformer (≈65 M params, distilled from 50 k annotated examples) inspects the system + user prompt and predicts a task label in 20–100 ms. Example labels: `chat_small`, `code_medium`, `sql`, `creative_long`, `image_insight`.
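The real classifier is a distilled transformer, but the contract it fulfills is easy to picture. Here is a toy stand-in, purely illustrative: keyword rules instead of a trained model, using the label names above.

```python
# Toy stand-in for the task classifier: maps a prompt to a routing label.
# The production classifier is a ~65M-param transformer; this keyword
# version only illustrates the input/output contract, not the method.
def classify_task(prompt: str) -> str:
    p = prompt.lower()
    if any(k in p for k in ("select ", "sql", "query")):
        return "sql"
    if any(k in p for k in ("def ", "function", "code", "bug")):
        return "code_medium"
    if len(p.split()) > 200:        # long prompts → long-form writing
        return "creative_long"
    return "chat_small"
```

Whatever the implementation, the output is just a label — everything downstream keys off that string.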
**Policy Engine** – A YAML/JSON policy maps each label to:
- Preferred model(s)
- Budget ceiling
- Max latency SLA
- Fallback chain
```yaml
code_medium:
  primary: "anthropic/claude-4-sonnet"
  fallback: ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"]
  max_usd: 0.005        # per request
  max_latency_ms: 20000
```
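In essence, the policy lookup plus fallback chain boils down to: try the primary, then walk the fallbacks until one succeeds. A minimal sketch (hypothetical helper names; the real engine also enforces the budget and latency SLAs):

```python
# Minimal sketch of the policy engine's fallback logic. `call_provider`
# is a stand-in for the actual gateway call to a model provider.
POLICY = {
    "code_medium": {
        "primary": "anthropic/claude-4-sonnet",
        "fallback": ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"],
    },
}

def route(label: str, prompt: str, call_provider):
    policy = POLICY[label]
    for model in [policy["primary"], *policy["fallback"]]:
        try:
            return model, call_provider(model, prompt)
        except RuntimeError:        # provider down or over SLA
            continue
    raise RuntimeError("all models in the fallback chain failed")
```

Because the chain lives in config, swapping a vendor is a one-line YAML change — no redeploy.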
**Router Gateway** – The same endpoint you’re already using:

`https://router.requesty.ai/v1/chat/completions`

Simply set `model: "smart-task"` (or any alias you choose) and pass your prompt. The gateway:
- Calls the classifier
- Consults the policy
- Forwards the request to the chosen provider
- Logs everything in Live Logs & Analytics
**Observability Loop** – Every response is tagged with `chosen_model`, `tokens`, `latency_ms`, and `cost_usd`. These metrics feed back into policy tuning and your dashboards.
3. Live Demo Recap
In the launch video we:
1. Connected OpenWebUI to `router.requesty.ai/v1` with the alias `smart-task`.
2. Asked “Who built you?” → the router picked Gemini-Flash in 2.8 s for <$0.001.
3. Asked “Code a Snake game in Python.” → the router switched to Claude-4 Sonnet.
4. Followed up with “Write a blog post about this Snake game.” → the router used Perplexity Sonar-Pro (fast, cheap long-form).

Three requests, three providers, zero manual switches.
Latency tax: ~65 ms average over 1 000 runs – imperceptible to humans.
4. Key Benefits
| Benefit | What you get | Why it matters |
| --- | --- | --- |
| One API, all models | No more env-var gymnastics or vendor-specific SDKs. | Faster prototyping, simpler back-end. |
| Automatic cost trimming | Cheaper models handle lightweight tasks. | Teams report 40–80% savings. |
| Consistent UX | Users never face “model selector anxiety”. | Higher retention, fewer support tickets. |
| Live analytics | Per-task spend, latency, error rates. | Data to renegotiate budgets or tweak prompts. |
| No vendor lock-in | Swap vendors via config, not code. | Future-proof as new models drop weekly. |
5. Smart Routing vs. DIY Prompt Engineering
| | Manual approach | Requesty Smart Routing |
| --- | --- | --- |
| Effort | Build & host a classifier, maintain policies, integrate N SDKs. | Plug & play. |
| Coverage | Depends on your data set. | 50 k-prompt corpus, updated monthly. |
| Edge cases | You chase a moving target alone. | We ship global fixes once, for everyone. |
| Observability | Stitch together logs from each provider. | Unified Live Logs & Analytics. |
6. Quick Start
```bash
curl https://router.requesty.ai/v1/chat/completions \
  -H "Authorization: Bearer $REQUESTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-task",
    "messages": [
      {"role": "user", "content": "Generate a SQL query to find the 5 most active users last month"}
    ]
  }'
```
That’s literally it. 💫
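The same request works from Python, since the gateway speaks the familiar chat-completions format. A sketch using only the standard library (the payload mirrors the curl example above; `build_request` is an illustrative helper, not an SDK):

```python
import json
import os
import urllib.request

# Build the same Smart Routing request as the curl example.
def build_request(prompt: str):
    payload = {
        "model": "smart-task",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        "https://router.requesty.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('REQUESTY_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
    return req, payload

# To send it: urllib.request.urlopen(build_request("your prompt here")[0])
```

Any OpenAI-compatible client should work the same way — just point its base URL at the router and use your alias as the model name.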
7. Road-Map Sneak Peek
Reinforcement Learning loop – We’ll let your app vote 👍/👎 so the router learns your domain-specific preferences.
Fine-grained policy UI – Non-dev teammates can tweak cost limits and fallbacks without touching YAML.
Hybrid local-+-cloud routing – Seamlessly blend on-prem models with cloud giants.
8. Try It Today
🆓 $6 in credits for every new workspace → requesty.ai
📺 Watch the 2-min demo in the launch post
🐙 Star us on GitHub (open-sourcing the policy spec soon)
💬 Join our Discord to suggest new routing rules or models
Stop debating “which model should I use?” — let Requesty decide in real time and focus on building products your users love.