Requesty
JUN '26 | AGENTS / BEST PRACTICES | 7 MIN READ

Loop Engineering: How to Build AI Agent Loops That Run Themselves

Thibault Jaigu
CEO & Co-Founder

The biggest shift in AI agent development in June 2026 is not a new model. It is a new way of using models. Loop engineering has become the dominant pattern for production agent workflows, replacing the manual prompt-wait-review cycle with autonomous systems that run themselves.

The core idea: you do not prompt the agent. You design the system that prompts the agent. The agent iterates until done, reports findings, and waits for the next trigger.

Why loops matter now

Three capabilities converged in the past sixty days to make loops practical:

Models handle long tasks. METR benchmarks show Claude Opus 4.6 reaching a 50% success rate on tasks that take a human 12 hours. A year ago, Opus 4 topped out at 1 hour 40 minutes. The ceiling moved roughly 7x.

Loops are built in. Claude Code shipped /loop, cron scheduling, and dynamic workflows. Codex shipped the Automations tab with recurring schedules and subagent spawning. You no longer need custom infrastructure. If you are choosing between the tools, our agentic coding tools comparison breaks down Claude Code, Cursor, Codex, and Aider side by side.

Subagents prevent degradation. The main loop spins up isolated subagents with fresh context windows. Each subagent does focused work and reports back. The loop controller never fills its own context. We cover the patterns in depth in multi-agent orchestration patterns that work in production.

The four loop types

Every production loop fits one of four patterns:

Heartbeat loops

Run continuously on a short interval (seconds to minutes). Use for monitoring: watch logs, check service health, scan for drift.

YAML
# Claude Code heartbeat example
schedule: "*/5 * * * *"  # every 5 minutes
prompt: "Check staging error logs. If error rate > 1%, open an issue."
stop_condition: never  # runs indefinitely

Cron loops

Scheduled at specific times. Use for batch work: daily code review, weekly dependency audits, morning standup summaries.

YAML
# Codex automation example
schedule: "0 10 * * 1-5"  # weekdays at 10am
prompt: "Review all PRs older than 3 days. For each, summarize blockers and ping the author."
model: gpt-5.5
subagents: true

Hook loops

Triggered by external events. A PR is pushed, CI fails, a Slack message arrives. The loop runs once per trigger.

YAML
# Claude Code hook
trigger: "post-push"
prompt: "Run the test suite. If any test fails, attempt a fix. If the fix passes, commit it. If not, open an issue with the failure details."

Goal loops

Iterate until a success condition is met, then stop. Use for refactoring, bug hunting, or migration tasks where the scope is unknown upfront.

YAML
# Goal loop: migrate all files
prompt: "Find the next file using the old API pattern. Migrate it to the new pattern. Run tests."
stop_condition: "No files match the old pattern"
max_iterations: 200
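
The control flow behind a goal loop fits in a few lines of Python. This is a minimal sketch, not how Claude Code or Codex implement it: `run_goal_loop`, `step`, and `is_done` are hypothetical names, and `step` stands in for one agent iteration.

```python
from typing import Callable


def run_goal_loop(step: Callable[[int], None],
                  is_done: Callable[[], bool],
                  max_iterations: int = 200) -> tuple[int, bool]:
    """Run `step` until `is_done()` is true or the iteration ceiling is hit.

    Returns (iterations_run, goal_reached).
    """
    for i in range(max_iterations):
        if is_done():        # check the stop condition before spending tokens
            return i, True
        step(i)              # one agent iteration (hypothetical stand-in)
    return max_iterations, is_done()


# Toy stand-in for "files still using the old API pattern".
pending = ["a.py", "b.py", "c.py"]
iterations, reached = run_goal_loop(
    step=lambda i: pending.pop(),    # "migrate" one file per iteration
    is_done=lambda: not pending,     # stop when nothing matches the old pattern
    max_iterations=200,
)
```

The stop condition is checked first on every pass, so a loop whose goal is already met does no work, and the ceiling applies even if the condition never becomes true.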

Anatomy of a production loop

Every effective loop needs five components:

1. Worktrees. Each iteration runs in an isolated git worktree. If the agent breaks something, it breaks a copy, not your main branch. Claude Code and Codex both support this natively.

2. Skills. Reusable instruction sets the loop can invoke. Instead of pasting a wall of instructions into a schedule, you reference a skill file that stays maintainable and version-controlled.

3. Connectors (MCP). The Model Context Protocol gives loops access to external tools: databases, issue trackers, deployment systems, monitoring dashboards. One protocol, thousands of integrations.

4. Subagents. The loop controller decomposes work and delegates to specialized subagents. A security reviewer subagent uses a strong model on high reasoning effort. A file scanner uses a fast, cheap model. Each subagent has its own context window and tool permissions.

5. State tracking. Loops need to know what they have done. File-based state (a JSON checkpoint), git history, or an external database prevents redundant work across iterations.

For the full architecture behind these components, see our guide to building production AI agents in 2026 and the deeper survey of self-evolving, managed, and compiled agent techniques.

How routing cuts loop costs by 60 to 80 percent

Agent loops are token-intensive. A daily PR review loop that spawns 5 subagents, each reading 50K tokens of context, costs real money at frontier model pricing.

The fix: route each step to the right model tier.

Loop step                        | Model tier                         | Cost per 1M tokens
File scanning and classification | Nano (GPT-5.4-nano, Gemini Flash)  | $0.10 to $0.30
Summarization and drafting       | Mid-tier (Sonnet 4.6, GPT-5.4)     | $1 to $3
Final review and decision        | Frontier (Opus 4.8, GPT-5.5)       | $10 to $15

With Requesty, you configure this routing once in a policy:

YAML
# requesty routing policy for agent loops
policy:
  - match: "classify|scan|filter"
    model: google/gemini-2.5-flash
  - match: "draft|summarize|write"
    model: anthropic/claude-sonnet-4.6
  - match: "review|decide|architect"
    model: anthropic/claude-opus-4.8
fallback:
  - anthropic/claude-sonnet-4.6
  - openai/gpt-5.4
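
To make the matching idea concrete, here is a minimal Python sketch of first-match routing. It assumes each `match` value is a regular expression tested against a step label and that rules are evaluated top to bottom. Requesty's actual matching semantics may differ, and `route_step` is a hypothetical helper, not part of any Requesty SDK.

```python
import re

# Mirrors the policy above: (pattern, model) pairs evaluated in order.
RULES = [
    (r"classify|scan|filter", "google/gemini-2.5-flash"),
    (r"draft|summarize|write", "anthropic/claude-sonnet-4.6"),
    (r"review|decide|architect", "anthropic/claude-opus-4.8"),
]

# Assumption: when no rule matches, use the first fallback entry.
DEFAULT_MODEL = "anthropic/claude-sonnet-4.6"


def route_step(step_label: str) -> str:
    """Return the model for the first rule whose pattern appears in the label."""
    for pattern, model in RULES:
        if re.search(pattern, step_label, flags=re.IGNORECASE):
            return model
    return DEFAULT_MODEL
```

Order matters under first-match rules: a label such as "write a review" hits the drafting rule before the review rule, so put the most specific patterns first.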

Add prompt caching (up to 90% reduction on repeated system prompts and tool definitions) and the math gets dramatic. A loop that would cost $50/day at frontier pricing drops to $8 to $12/day with routing and caching combined. We walk through the caching numbers in how prompt caching cuts costs by up to 90%, and the routing logic in how to route LLM requests by cost and latency.
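
The arithmetic is easy to check yourself. The sketch below uses the midpoints of the price ranges in the table above. The 4M tokens/day workload, the tier split, and the 30% cached fraction are illustrative assumptions, not measurements. It also treats all tokens alike, while real caching discounts apply only to cached input tokens.

```python
# Midpoints of the per-1M-token price ranges in the table above (USD).
PRICE_PER_M = {"nano": 0.20, "mid": 2.00, "frontier": 12.50}


def daily_cost(tokens_m: float, share: dict[str, float],
               cached_fraction: float = 0.0,
               cache_discount: float = 0.90) -> float:
    """Estimated daily spend in USD.

    tokens_m        -- millions of tokens processed per day
    share           -- fraction of tokens sent to each tier (must sum to 1)
    cached_fraction -- fraction of tokens billed at the cached rate (assumed)
    cache_discount  -- price reduction on cached tokens (0.90 = 90% off)
    """
    if abs(sum(share.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    blended = sum(PRICE_PER_M[tier] * frac for tier, frac in share.items())
    cache_factor = 1.0 - cached_fraction * cache_discount
    return tokens_m * blended * cache_factor


# Hypothetical workload: 4M tokens/day, i.e. $50/day if everything is frontier.
split = {"nano": 0.5, "mid": 0.3, "frontier": 0.2}
baseline = daily_cost(4.0, {"frontier": 1.0})
routed = daily_cost(4.0, split)
routed_cached = daily_cost(4.0, split, cached_fraction=0.3)
```

Under these assumptions the baseline is $50.00/day, routing alone brings it to about $12.80/day, and routing plus caching to about $9.34/day. The result is sensitive to the tier split, so plug in your own numbers once you have usage data.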

Building your first loop: a daily PR reviewer

Here is a concrete example. This loop runs every weekday morning at 10:15, reviews all PRs older than 3 days, and pings authors with actionable feedback.

In Claude Code:

Shell
claude code --schedule "15 10 * * 1-5" \
  --skill pr-review \
  --prompt "Find all open PRs older than 3 days in this repo. For each PR, spawn a subagent to review the diff and write a summary of blockers. Post the summary as a PR comment and tag the author."

In Codex, create an Automation in the Automations tab:

  • Project: your repo
  • Schedule: Weekdays at 10:15am
  • Prompt: Same as above
  • Subagents: Enabled
  • Model: gpt-5.5

Both tools archive runs that find nothing and surface runs that produce findings in a triage inbox.

Common loop failures and how to avoid them

Token runaway. A goal loop with no max_iterations can burn through $500 in an hour. Always set a ceiling. Start with 50 iterations and increase once you have cost data.
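
If your tool does not enforce a ceiling for you, a guard is cheap to write. The sketch below is one possible shape, with hypothetical names (`LoopBudget`, `BudgetExceeded`) and an assumed dollar cap alongside the iteration cap. The per-iteration cost is a placeholder for whatever usage data your provider or gateway reports.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a loop has reached its iteration or spend ceiling."""


class LoopBudget:
    def __init__(self, max_iterations: int = 50, max_usd: float = 25.0):
        self.max_iterations = max_iterations
        self.max_usd = max_usd        # assumed cap, tune to your own budget
        self.iterations = 0
        self.spent_usd = 0.0

    def start_iteration(self) -> None:
        """Call before each iteration; raises if a ceiling has been reached."""
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded(f"iteration ceiling {self.max_iterations} reached")
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded(f"spend ceiling ${self.max_usd:.2f} reached")
        self.iterations += 1

    def record_cost(self, cost_usd: float) -> None:
        """Call after each iteration with the cost it incurred."""
        self.spent_usd += cost_usd


budget = LoopBudget(max_iterations=50, max_usd=5.0)
try:
    while True:
        budget.start_iteration()
        budget.record_cost(1.25)      # placeholder per-iteration cost
except BudgetExceeded as exc:
    stop_reason = str(exc)
```

Checking the budget before each iteration means the loop stops before starting work it cannot afford, rather than partway through it.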

Context rot. Long-lived loops that keep appending to the same context window degrade in quality. The fix: subagents with fresh context for each iteration, or compaction to summarize and reset.

Overconfident termination. The agent declares "done" when it has only checked half the codebase. Add verification steps: a second agent that checks the first agent's work, or a hard condition (zero test failures, zero lint errors) rather than a soft judgment. Reliability practices like this are the focus of our AI agent reliability guide.
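
A hard condition can be as simple as exit codes. This sketch treats the loop as done only when every check command exits with status 0. `hard_checks_pass` is a hypothetical helper, and the two commands are stand-ins so the example runs anywhere. In a real loop they would be your project's own test and lint commands.

```python
import subprocess
import sys


def hard_checks_pass(commands: list[list[str]]) -> bool:
    """Return True only if every command exits with status 0."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return False      # one failing check is enough to keep looping
    return True


# Stand-in commands: one that succeeds and one that fails.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

all_green = hard_checks_pass([passing])
still_red = hard_checks_pass([passing, failing])
```

The agent's own "done" claim never enters the decision. Only the exit codes do.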

State amnesia. The loop forgets what it already processed. Write state to a file or database after each iteration. On restart, read the checkpoint and skip completed items.
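
A file-based checkpoint needs very little code. The sketch below assumes a JSON file holding the IDs of completed items. The file name, the `done` key, and the helper names are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_done(path: Path) -> set[str]:
    """Read the set of completed item IDs, or an empty set on first run."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["done"])


def save_done(path: Path, done: set[str]) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}))
    os.replace(tmp, path)     # atomic when both paths are on the same filesystem


def process_all(items: list[str], path: Path, handle) -> list[str]:
    """Process items not yet in the checkpoint; return the ones handled this run."""
    done = load_done(path)
    handled = []
    for item in items:
        if item in done:
            continue          # skip work finished by an earlier run
        handle(item)
        done.add(item)
        save_done(path, done)  # checkpoint after every iteration
        handled.append(item)
    return handled


checkpoint = Path(tempfile.mkdtemp()) / "loop_state.json"
first = process_all(["pr-1", "pr-2"], checkpoint, handle=lambda item: None)
second = process_all(["pr-1", "pr-2", "pr-3"], checkpoint, handle=lambda item: None)
```

The second run handles only `pr-3`, because the first two IDs are already in the checkpoint.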

The gateway layer ties it together

Loops make 10x to 100x more API calls than a chatbot. That volume means:

  • Failover matters. If one provider goes down mid-loop, the loop should not crash. A gateway automatically retries on an alternate provider.
  • Cost tracking is essential. You need per-loop, per-subagent, and per-model cost breakdowns to optimize spend.
  • Caching compounds. Repeated system prompts and tool definitions across loop iterations get cached, saving 40 to 90 percent on input tokens.
  • Rate limits bite less. A gateway distributes requests across providers, so a single provider's rate limit is less likely to stall your loop. See bypassing Claude rate limits with Requesty.

Requesty handles all four. One API key, one base URL, 300+ models. Your loop code stays clean, and the infrastructure handles reliability, cost, and observability underneath. This is exactly the role we describe in why your LLM gateway is the backbone of production agents.
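
To picture what failover does for a loop, here is a minimal sketch of the try-in-order logic. It is an illustration of the idea, not Requesty's implementation. The provider functions are stand-ins, and all names are hypothetical.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:   # a real gateway would retry only on retryable errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider unavailable")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = call_with_failover(
    "summarize PR blockers",
    [("primary", flaky_provider), ("secondary", healthy_provider)],
)
```

The loop code calls one function and never sees the failed attempt, which is the point of pushing this logic into the gateway layer.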

Getting started

  1. Pick one recurring task your team does manually (PR reviews, dependency updates, log monitoring).
  2. Write it as a prompt with a clear stop condition.
  3. Choose your loop type (heartbeat for continuous monitoring, cron for scheduled, hook for event-driven, goal for open-ended).
  4. Route through Requesty to get cost tracking and failover from day one.
  5. Start with a low max_iterations limit. Watch the costs. Tune the routing policy.

The shift from "prompt the agent" to "design the loop" is the single biggest force multiplier for engineering teams in 2026. The models are capable enough. The tools are ready. The question is whether you are still doing the prompting yourself, or whether you have built the system that does it for you.

Frequently asked questions

What is loop engineering for AI agents?
Loop engineering is the practice of replacing yourself as the person who prompts an AI agent. Instead of manually starting each task, you design a system where the agent prompts itself on a schedule or trigger, iterates until a stop condition is met, and delivers findings autonomously. Claude Code and OpenAI Codex both support loops natively in 2026.

What are the four types of AI agent loops?
The four loop types are: heartbeat loops (run continuously on a short interval), cron loops (scheduled at specific times like daily at 10am), hook loops (triggered by events like a PR push or CI failure), and goal loops (iterate until a success condition is met, then stop). Each fits different workflows.

How do subagents work inside agent loops?
Subagents separate the loop controller from the workers. The main loop agent decomposes a task, spawns specialized subagents in isolated worktrees, collects their results, and decides whether to continue or stop. This prevents context window degradation and lets each subagent use the optimal model for its subtask.

How does an LLM gateway reduce loop costs?
Agent loops make 10x to 100x more LLM calls than single-shot prompts. A gateway like Requesty routes each iteration to the cheapest capable model: nano models for classification steps, mid-tier for drafting, frontier for final review. Combined with prompt caching (up to 90% input cost reduction on repeated prefixes), total loop costs drop 60 to 80 percent.

What tools support loop engineering in 2026?
Claude Code supports loops via the /loop command, cron scheduling, hooks, and dynamic workflows with subagents. OpenAI Codex supports loops via the Automations tab with configurable schedules and subagent spawning. Both support skills, worktrees, and MCP integrations for tool access inside loops.