Requesty

5 AI Agent Techniques That Just Dropped This Week (May 2026)

Thibault Jaigu
CEO & Co-Founder

The week of May 19 to 23, 2026 delivered a wave of new research and product announcements that change how production AI agents are built and run. Google declared the "agentic Gemini era" at I/O, DeepMind extended AlphaEvolve to more fields, and arXiv saw a cluster of papers tackling problems that have blocked real-world agent deployments for months.

Here are the five techniques that matter most for teams building AI agents right now.


1. Self-evolving agents that rewrite their own code

The most striking paper this week is MOSS (Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems). The core idea: an agent identifies weaknesses in its own logic, rewrites specific modules of its source code, validates the changes through automated tests, and deploys the improved version of itself.

This is not prompt tuning or weight updates. The agent literally modifies its own Python or TypeScript source files, creating a feedback loop where each task execution can improve the system for future tasks.

A companion paper, Ratchet, provides "minimal hygiene recipes" for keeping self-evolving agents stable. It introduces non-divergence analysis that prevents an agent from rewriting itself into a broken state. Think of it as guardrails for self-modification: the agent can improve, but cannot degrade below its previous benchmark score.
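
The non-degradation idea can be pictured as a simple acceptance gate. The sketch below illustrates only that idea and is not the Ratchet paper's method; `AgentVersion`, `ratchet_accept`, and the scores are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentVersion:
    """A snapshot of the agent's source plus its benchmark score (hypothetical type)."""
    source: str
    score: float


def ratchet_accept(current: AgentVersion, candidate: AgentVersion,
                   tests_passed: bool) -> AgentVersion:
    """Keep the candidate only if its tests pass and its score does not regress."""
    if tests_passed and candidate.score >= current.score:
        return candidate
    return current  # reject the rewrite and keep the last known-good version


current = AgentVersion(source="v1", score=0.72)
worse = AgentVersion(source="v2", score=0.65)
better = AgentVersion(source="v3", score=0.80)

current = ratchet_accept(current, worse, tests_passed=True)   # stays on v1
current = ratchet_accept(current, better, tests_passed=True)  # moves to v3
print(current.source)  # v3
```

The gate only ever moves forward: a rewrite that fails its tests or scores lower is discarded, so the deployed version never drops below the best score seen so far.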

Why this matters for production. Today, improving an agent means a human reviews logs, identifies failure patterns, and manually updates prompts or code. Self-evolving agents automate this loop. A coding agent that fails on a specific pattern of TypeScript refactoring can patch its own tool-calling logic to handle that pattern next time, without waiting for a developer to ship an update.

How to use it today. You can implement a lightweight version of this pattern with any agent SDK. After each task, run an evaluation step. If the score drops below a threshold, trigger a "self-improvement" agent that analyzes the failure and proposes a patch to the system prompt or tool definitions. Route the improvement agent through Requesty's routing policies to use a high-reasoning model (like Claude Opus or o3) for the self-analysis step, while keeping the execution agent on a faster, cheaper model.
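
A minimal sketch of that evaluate-then-patch loop follows. It assumes the patch targets the system prompt only, and every name in it (`run_with_self_improvement`, `SCORE_THRESHOLD`, the `fake_*` stubs) is hypothetical. In a real setup the three callables would wrap model calls made through your agent SDK or gateway.

```python
from typing import Callable, Tuple

SCORE_THRESHOLD = 0.7  # assumed threshold; tune per workload


def run_with_self_improvement(
    task: str,
    system_prompt: str,
    execute: Callable[[str, str], str],
    evaluate: Callable[[str, str], float],
    propose_patch: Callable[[str, str, str], str],
) -> Tuple[str, str]:
    """Run a task, score it, and request a patched prompt when the score is low.

    Returns the task output and the system prompt to use for the next task.
    """
    output = execute(system_prompt, task)
    score = evaluate(task, output)
    if score >= SCORE_THRESHOLD:
        return output, system_prompt
    # Low score: the improver (a high-reasoning model in practice) proposes a patch.
    return output, propose_patch(system_prompt, task, output)


# Stub callables standing in for real model calls.
def fake_execute(prompt: str, task: str) -> str:
    return "ok" if "handle enums" in prompt else "failed"


def fake_evaluate(task: str, output: str) -> float:
    return 1.0 if output == "ok" else 0.0


def fake_propose(prompt: str, task: str, output: str) -> str:
    return prompt + " Always handle enums."


prompt = "You refactor TypeScript."
out1, prompt = run_with_self_improvement(
    "refactor enum", prompt, fake_execute, fake_evaluate, fake_propose)
out2, prompt = run_with_self_improvement(
    "refactor enum", prompt, fake_execute, fake_evaluate, fake_propose)
print(out1, out2)  # failed ok
```

The first run fails and triggers a patch; the second run uses the patched prompt and passes. A production version would likely put a human review or a regression test between the proposal and its adoption.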


2. Managed agents at the API level

Google's I/O 2026 keynote introduced Managed Agents in the Gemini API. This shifts agent orchestration from client-side to server-side.

Previously, building a proactive agent meant running your own infrastructure: a server that keeps the agent loop alive, persists state between sessions, and triggers actions based on schedules or events. Managed Agents handle all of this at the API level.

You define the agent's tools, instructions, and triggers. Google runs the agent loop server-side, 24/7. The agent can take proactive actions on your behalf without an active client connection. State persists across sessions automatically.

Anthropic's parallel moves tell the same story. Its agents for financial services announcement (May 5) and its new enterprise AI services company with Blackstone and Goldman Sachs (May 4) signal that hosted, always-on agent infrastructure is becoming the default deployment model for enterprise AI.

Why this matters for production. Managed agents eliminate the "agent hosting" problem. No more keeping WebSocket connections alive, no more custom state serialization, no more building your own scheduler. The provider handles uptime, and you handle the business logic.

The tradeoff. Provider lock-in increases when your agent's state lives on their servers. The mitigation: route your agent's LLM calls through a gateway so you maintain portability on the intelligence layer, even if the orchestration layer is managed. Define your tools via MCP so they work with any host.


3. Compiling agentic workflows into model weights

The paper Compiling Agentic Workflows into LLM Weights demonstrates something remarkable: you can distill a multi-step agent pipeline (planner, researcher, writer, reviewer) into a single fine-tuned model that produces equivalent output in one forward pass.

The technique works by running your agent workflow thousands of times, collecting the input-output pairs, and fine-tuning a smaller model on those pairs. The result: near-frontier quality at two orders of magnitude less cost.

A multi-agent research workflow that costs $0.50 per execution (four LLM calls, tool use, iteration) becomes a single inference call at $0.005. Latency drops from 30 seconds to 2 seconds.

Why this matters for production. Not every agent task requires full autonomy. Many agent workflows stabilize into predictable patterns after a few weeks of operation. Once you know the pattern, you can compile it. Keep the full agent pipeline for novel situations and edge cases, and route the 80% of requests that follow the common pattern to the compiled model.
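
A quick back-of-the-envelope calculation shows what an 80/20 split does to the average cost per request, using the per-execution figures quoted earlier. The function name is illustrative, and the split is taken at face value.

```python
FULL_PIPELINE_COST = 0.50  # dollars per execution, from the example above
COMPILED_COST = 0.005      # dollars per execution, from the example above
COMPILED_SHARE = 0.80      # share of requests that follow the common pattern


def blended_cost(share_compiled: float, compiled_cost: float, full_cost: float) -> float:
    """Average cost per request when only part of the traffic is compiled."""
    return share_compiled * compiled_cost + (1 - share_compiled) * full_cost


cost = blended_cost(COMPILED_SHARE, COMPILED_COST, FULL_PIPELINE_COST)
savings = 1 - cost / FULL_PIPELINE_COST
print(f"blended cost per request: ${cost:.3f}")     # $0.104
print(f"savings vs. full pipeline: {savings:.1%}")  # 79.2%
```

Under these assumptions the average cost falls by roughly 79% rather than by the full 100x, because the remaining 20% of requests still run the full pipeline.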

How to implement. Collect your agent's traces over time using Requesty's live logs. Identify the most common input-output patterns. Fine-tune a smaller model on those traces. Set up a routing policy that sends simple requests to the compiled model and complex requests to the full agent pipeline. Costs drop sharply on the requests the compiled model handles, while the full pipeline preserves quality on everything else.
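
The routing step can be sketched in a few lines. This is a sketch under stated assumptions, not a Requesty routing policy: `make_router`, `looks_common`, and the two handlers are hypothetical stand-ins, and a real classifier would be derived from your collected traces.

```python
from typing import Callable

Handler = Callable[[str], str]


def make_router(is_common_pattern: Callable[[str], bool],
                compiled_model: Handler,
                full_pipeline: Handler) -> Handler:
    """Build a router that sends common requests to the compiled model."""
    def route(request: str) -> str:
        if is_common_pattern(request):
            return compiled_model(request)
        return full_pipeline(request)  # novel or complex requests keep full autonomy
    return route


# Stubs standing in for a trace-derived classifier and two real backends.
def looks_common(request: str) -> bool:
    return request.lower().startswith("summarize")


def compiled(request: str) -> str:
    return f"[compiled] {request}"


def pipeline(request: str) -> str:
    return f"[full pipeline] {request}"


route = make_router(looks_common, compiled, pipeline)
print(route("Summarize this changelog"))    # [compiled] Summarize this changelog
print(route("Investigate a novel outage"))  # [full pipeline] Investigate a novel outage
```

Keeping the classifier as a separate callable makes it easy to tighten or loosen over time as more traces accumulate.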


4. Speculative planning during idle time

IdleSpec (Exploiting Idle Time via Speculative Planning for LLM Agents) solves one of the most frustrating problems in agent UX: the wait time between steps.

When an agent calls a tool (API request, file read, database query), there is dead time while it waits for the response. IdleSpec uses this dead time productively. While waiting for tool results, the agent speculatively generates multiple candidate next-actions for each likely tool outcome.

When the actual result arrives, the agent matches it against its pre-computed candidates and immediately continues execution. If the speculation was correct (and the paper shows it is 60-80% of the time), the user perceives zero latency between steps.
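
The mechanism can be sketched with `asyncio`. This is an illustration of the general idea rather than the IdleSpec implementation; `plan_next_action`, `step_with_speculation`, and the stub tools are hypothetical, and the sleeps stand in for real model and tool latency.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple


async def plan_next_action(tool_result: str) -> str:
    """Stand-in for a planning-model call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"action-after-{tool_result}"


async def step_with_speculation(
    call_tool: Callable[[], Awaitable[str]],
    likely_results: Sequence[str],
) -> Tuple[str, bool]:
    """Plan for likely tool results while the tool call is still in flight.

    Returns the next action and whether the speculation was a hit.
    """
    tool_task = asyncio.create_task(call_tool())  # the tool starts running now
    # Use the idle time: plan one candidate action per likely result.
    plans = await asyncio.gather(*(plan_next_action(r) for r in likely_results))
    candidates = dict(zip(likely_results, plans))
    actual = await tool_task
    if actual in candidates:
        return candidates[actual], True        # hit: the plan is already computed
    return await plan_next_action(actual), False  # miss: plan the normal way


async def tool_ok() -> str:
    await asyncio.sleep(0.05)
    return "200"


async def tool_error() -> str:
    await asyncio.sleep(0.05)
    return "500"


print(asyncio.run(step_with_speculation(tool_ok, ["200", "404"])))
# ('action-after-200', True)
print(asyncio.run(step_with_speculation(tool_error, ["200", "404"])))
# ('action-after-500', False)
```

On a hit, the planning cost is hidden inside the tool wait. On a miss, the agent pays the usual planning cost plus the wasted speculative calls, which is why a cheap model is a natural fit for the speculative branch.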

Why this matters for production. Agent latency is the number one complaint from users. A five-step agent workflow with 2-second tool calls takes 10+ seconds even if the LLM responds instantly. IdleSpec cuts perceived latency by more than half for multi-step workflows.

How to implement. This technique works best when you can predict likely tool outcomes. For coding agents, file reads almost always succeed. For API agents, most calls return 200. Start speculative planning for your highest-frequency tool calls. Route the speculative planning calls through a fast, cheap model (Haiku or GPT-4o-mini via Requesty) while keeping the main agent on a reasoning model.


5. Latent communication guards for multi-agent safety

As multi-agent systems move to production, a new class of safety problems emerges. When agents share context through KV-cache sharing or message passing, one compromised or hallucinating agent can corrupt the entire system.

LCGuard (Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems) introduces a validation layer that sits between agents in a multi-agent pipeline. Before one agent's output enters another agent's context, LCGuard checks it against safety constraints without adding latency to the critical path.

A companion paper, ExComm (Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling), tackles the same problem from a different angle: improving how agents communicate during exploration phases so errors in one agent's reasoning do not cascade to others.

Why this matters for production. If you run a coding agent that delegates sub-tasks to specialist agents (a security reviewer, a test writer, a documentation agent), a hallucination in one specialist can propagate bad information to the others. LCGuard-style validation prevents this cascade without slowing down the happy path.

How to implement. Add a lightweight validation step between agent handoffs in your multi-agent workflows. Use a fast model to check that the output of one agent is internally consistent before passing it to the next. With Requesty, you can route these validation calls to Haiku-class models at minimal cost while keeping your primary agents on frontier models.


Putting it all together

These five techniques are not isolated advances. They compose into a new generation of agent architectures:

  1. Start with a full agent pipeline using your preferred SDK (Claude Agent SDK, OpenAI Agents SDK, Google ADK).
  2. Add self-evolution so the agent improves from its own failures without manual intervention.
  3. Use speculative planning to cut perceived latency in half.
  4. Deploy LCGuard-style validation between agent handoffs to prevent error cascades.
  5. Compile stable workflows into lightweight models for the common cases, keeping the full pipeline for edge cases only.

The common thread: a gateway that lets you route different parts of the agent workflow to different models and providers based on the task. Self-improvement steps need high reasoning. Speculative planning needs speed and low cost. Validation needs reliability. Production execution needs the best balance of quality and latency.
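
One way to picture that routing is a small role-to-model table. The sketch below is illustrative only: the role names and model labels are placeholders, not real model identifiers or a Requesty configuration format.

```python
# Placeholder labels; substitute the model identifiers your gateway exposes.
ROUTES = {
    "self_improvement": "high-reasoning-model",
    "speculative_planning": "fast-cheap-model",
    "validation": "fast-reliable-model",
    "execution": "balanced-model",
}


def pick_model(role: str) -> str:
    """Return the model label for a workflow role, failing loudly on unknown roles."""
    try:
        return ROUTES[role]
    except KeyError:
        raise ValueError(f"no route configured for role {role!r}") from None


print(pick_model("self_improvement"))      # high-reasoning-model
print(pick_model("speculative_planning"))  # fast-cheap-model
```

Failing on unknown roles, rather than silently defaulting to an expensive model, keeps routing mistakes visible in cost tracking.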

Requesty gives you one API key to orchestrate all of this, with unified cost tracking across 300+ models, automatic failover when a provider goes down, and live logs that capture every step for debugging and trace collection.

The agentic era is not coming. It shipped this week.

Frequently asked questions

What are self-evolving AI agents?
Self-evolving agents are systems that modify their own source code to improve performance over time. MOSS (May 2026) introduced source-level rewriting where an agent identifies weaknesses in its own logic and rewrites specific modules, then validates the changes through automated testing before deploying the improved version.

What are Google's Managed Agents in the Gemini API?
Managed Agents are a new infrastructure-level feature in the Gemini API announced at Google I/O 2026. Instead of orchestrating agents client-side, Google handles agent lifecycle management, state persistence, and proactive execution server-side. Agents can run 24/7 without a client connection and take actions on your behalf.

How does workflow compilation reduce AI agent costs?
Workflow compilation distills a multi-step agent pipeline (which normally requires multiple LLM calls) into fine-tuned model weights that produce the same output in a single forward pass. Research from May 2026 shows this achieves near-frontier quality at two orders of magnitude less cost.

What is speculative planning for AI agents?
Speculative planning (IdleSpec) is a technique where agents use the idle time between tool calls to speculatively pre-plan their next steps. While waiting for an API response or file operation, the agent generates multiple candidate next-actions, so it can respond instantly when the tool result arrives.

How does an AI gateway help with these new agent techniques?
A gateway like Requesty lets you combine these techniques across providers. Use self-evolving agents on Claude for code tasks, managed agents on Gemini for background monitoring, and compiled workflows for high-volume repetitive tasks. One API key gives you access to all of them with unified cost tracking and failover.