About NotarBot
NotarBot builds AI voice assistants for notary offices in Spain. Its agents pick up the phone, resolve routine enquiries, book appointments and transfer calls to the right clerk. The assistants run in production at notary offices across the country.
At NotarBot, AI is not a productivity tool. It is the product. Every inbound call is a live LLM conversation with sub-second latency targets. That makes the reliability, latency and data handling of the model layer a direct, audible part of the customer experience.
The Challenge
NotarBot originally called its model provider directly from its voice pipeline. As deployments grew, that setup started to crack:
- Rate limits at peak hours. Morning call spikes pushed past provider quotas, and 429 responses meant stalled conversations on live phone calls. The usual workaround is to rotate requests across several regions of the same provider to stay under the limits, but each region keeps its own prompt cache, so rotation fragmented the cache and hits were rare.
- No European gateway alternative. Multi-provider aggregators like OpenRouter solved the routing problem but processed traffic outside Europe, a non-starter for notarial data under GDPR.
- A latency budget measured in milliseconds. A voice agent has roughly a second to start speaking; every extra hop or slow first token in the LLM path is audible to the caller as dead air.
- Single-provider dependency. One provider incident meant every notary office's assistant degraded at once, with no automatic fallback to another provider.
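
To make the first and last points concrete, here is a minimal sketch of the kind of automatic fallback the direct-to-provider setup lacked. It is an illustration under stated assumptions, not NotarBot's implementation: `Provider`, `RateLimited`, `ProviderDown` and `complete_with_fallback` are hypothetical names, and the provider calls are stand-in functions rather than a real SDK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 from a provider."""


class ProviderDown(Exception):
    """Hypothetical error standing in for a provider incident."""


@dataclass
class Provider:
    name: str                       # illustrative label, not a real endpoint
    complete: Callable[[str], str]  # hypothetical call: prompt -> reply text


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider name, reply) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except (RateLimited, ProviderDown) as exc:
            # Record the failure and move on instead of stalling the call.
            errors.append(f"{provider.name}: {type(exc).__name__}")
    raise RuntimeError("all providers failed: " + ", ".join(errors))


def _rate_limited(prompt: str) -> str:
    # Simulates a provider that is over quota during a call spike.
    raise RateLimited("429 Too Many Requests")


def _healthy(prompt: str) -> str:
    # Simulates a second provider that answers normally.
    return f"reply to: {prompt}"


primary = Provider("primary", _rate_limited)
secondary = Provider("secondary", _healthy)
used, reply = complete_with_fallback("Book an appointment", [primary, secondary])
```

With a single provider in the list, the first 429 surfaces directly to the caller as a failed turn; with a second one, the request is answered by `secondary`. A real voice pipeline would presumably also need a per-attempt timeout so that a fallback still fits inside the latency budget described above.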

