Case Study | Enterprise AI Gateway

How NotarBot Runs Real-Time Voice Assistants on EU Infrastructure with Requesty

NotarBot · Legal tech · AI voice agents

~50% · Of tokens served from cache

Faster TTFT · Lower latency on cached calls

100% · Of LLM traffic with EU data residency

No more 429s · Rate limits stopped reaching production

“Requesty was the only OpenRouter-style gateway with European data residency. That alone made the decision. The caching that fixed our rate-limit pain and our latency came as a bonus.”
Ignacio Ruisánchez
Founder & CEO, NotarBot

About NotarBot

NotarBot builds AI voice assistants for notary offices in Spain. Its agents pick up the phone, resolve routine enquiries, book appointments and transfer calls to the right clerk, and they are already in production at notary offices across the country.

At NotarBot, AI is not a productivity tool. It is the product. Every inbound call is a live LLM conversation with sub-second latency targets. That makes the reliability, latency and data handling of the model layer a direct, audible part of the customer experience.

The Challenge

NotarBot originally called its model provider directly from its voice pipeline. As deployments grew, that setup started to crack:

Rate limits at peak hours. Morning call spikes pushed past provider quotas, and 429 responses stalled conversations on live phone calls. The usual workaround is to rotate requests across several regions of the same provider to stay under the limits, but each region keeps its own prompt cache, so rotation fragmented the cache and hits were rare.
No European gateway alternative. Multi-provider aggregators like OpenRouter solved the routing problem but processed traffic outside Europe, a non-starter for notarial data under GDPR.
A latency budget measured in milliseconds. A voice agent has roughly a second to start speaking; every extra hop or slow first token in the LLM path is audible to the caller as dead air.
Single-provider dependency. One provider incident meant every notaría's assistant degraded at once, with no automatic fallback to another provider.
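
The cache fragmentation described in the first point can be illustrated with a toy model. This is a minimal sketch, not a description of any provider's real cache: it assumes each region holds its own cache entry for the system prompt, that an entry expires `ttl_s` seconds after it was last used, and that one conversation sends a turn every `gap_s` seconds. The region names and all numbers are hypothetical.

```python
def hit_rate(regions, turns, gap_s, ttl_s):
    """Fraction of turns that find a still-warm cache entry in their region."""
    last_used = {}  # region -> time its cache entry was last written or read
    hits = 0
    for i in range(turns):
        now = i * gap_s
        region = regions[i % len(regions)]  # round-robin rotation across regions
        seen = last_used.get(region)
        if seen is not None and now - seen <= ttl_s:
            hits += 1
        last_used[region] = now  # a miss writes the entry, a hit refreshes it
    return hits / turns

# Hypothetical numbers: a turn every 10 s, entries expire after 30 s idle.
stable = hit_rate(["eu-1"], turns=20, gap_s=10, ttl_s=30)
rotating = hit_rate(["eu-1", "eu-2", "eu-3", "eu-4"], turns=20, gap_s=10, ttl_s=30)
print(f"stable endpoint: {stable:.0%}, rotating across 4 regions: {rotating:.0%}")
```

Under these assumptions the stable endpoint misses only on the first turn, while four-way rotation revisits each region every 40 seconds, after its entry has already expired, so it never hits.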

Why Requesty

The evaluation came down to one hard requirement: a multi-provider gateway that keeps data in Europe. Requesty was the only one NotarBot found that met it, and it also delivered on caching, routing and cost tracking.

EU data residency

The only OpenRouter-style gateway NotarBot found with European processing, keeping notarial call data inside the EU and the GDPR story clean.

Prompt caching that just works

Each notaría's assistant carries a long, static system prompt; Requesty serves it from cache at a fraction of the cost and latency. Because traffic no longer rotates across regions, it stays on a stable endpoint and hits that cache on nearly every turn, with no changes to NotarBot's pipeline.

Intelligent routing and fallbacks

Requesty absorbs rate limits by routing to model deployments with far higher limits and failing over across providers automatically. NotarBot no longer has to rotate across regions to stay under quota, and peak-hour spikes no longer reach a live call.
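
As a rough illustration of what automatic failover means, the sketch below tries providers in order and moves on when one is rate limited. It is not Requesty's implementation, which runs inside the gateway and is not described in this case study. `RateLimited`, `complete_with_fallback` and the two provider functions are hypothetical stand-ins for real API calls.

```python
class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from a provider."""

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; skip any that is rate limited."""
    skipped = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited:
            skipped.append(name)  # remember it and try the next provider
    raise RuntimeError(f"all providers rate limited: {skipped}")

def busy_provider(prompt):
    # Hypothetical provider that is over quota during a call spike.
    raise RateLimited("429 Too Many Requests")

def healthy_provider(prompt):
    # Hypothetical provider with spare capacity.
    return f"reply to: {prompt}"

served_by, reply = complete_with_fallback(
    "Book an appointment",
    [("primary", busy_provider), ("secondary", healthy_provider)],
)
print(served_by, "->", reply)
```

Doing this in a gateway rather than in the voice pipeline keeps the retry logic out of the latency-sensitive call path that the application itself has to maintain.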

Cost tracking out of the box

Per-key spend visibility across the whole fleet of assistants without building in-house metering.

The Results

~50% · Of tokens served from cache

With traffic no longer rotating across regions to dodge rate limits, the long per-assistant system prompts hit Requesty's prompt cache on nearly every conversation turn, cutting LLM spend across every deployed assistant.

Faster TTFT · Lower latency on cached calls

Cached prompts come back with a faster time-to-first-token. In a phone conversation that difference is audible: responses land inside the latency budget instead of as dead air.
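
Time-to-first-token can be measured on any streaming response by timing the wait for the first chunk. The sketch below assumes only that the response is a Python iterable of text chunks. `fake_stream` is a hypothetical stand-in for a real streaming call, and its delay is an arbitrary number rather than a measured result.

```python
import time

def time_to_first_token(stream):
    """Return the first chunk and the seconds spent waiting for it."""
    start = time.perf_counter()
    first = next(iter(stream))  # blocks until the first chunk arrives
    return first, time.perf_counter() - start

def fake_stream(delay_s):
    # Hypothetical model stream: waits, then yields tokens.
    time.sleep(delay_s)  # simulated delay before the first token
    yield "Buenos"
    yield " días"

first, ttft = time_to_first_token(fake_stream(0.05))
print(f"first chunk {first!r} after {ttft * 1000:.0f} ms")
```

In a voice pipeline this is the interval the caller hears as silence, so it is the number to compare against the latency budget.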

100% · Of LLM traffic with EU data residency

Every model call is processed in Europe, which keeps GDPR compliance straightforward when notary offices ask where their call data goes.

No more 429s · Rate limits stopped reaching production

Requesty's routing and far higher provider limits absorb peak-hour spikes, so the rate-limit errors that used to stall live calls disappeared from the call path.

In Their Words

“Requesty sits in the critical path of every phone call we answer, and we simply stopped thinking about it. Calls are faster, costs are lower and the data stays in Europe, which is exactly what an infrastructure layer should be.”
Ignacio Ruisánchez
Founder & CEO, NotarBot