Agent Loop

Also known as: ReAct loop, perception-action loop, agent harness

TL;DR

The agent loop is the execution scaffold that wraps an LLM into an agent: perceive → think → act → observe → repeat. It's the trajectory primitive.

The agent loop is the runtime harness that turns a stateless LLM into an iterating decision-maker. Structurally a 30-line while-loop — call the model, parse the output, execute any tools requested, append results to the context, repeat — it’s where almost every production agent failure either happens or gets caught.

The canonical shape

The textbook ReAct-style loop has four phases per iteration:

Perceive — assemble the context: system prompt, conversation history, recent tool results, retrieved documents, scratch-pad memory.
Think — call the LLM. The model produces either a final answer or a tool-call request, often with intermediate reasoning ( chain-of-thought ) explaining its choice.
Act — if a tool was requested, execute it. Validate arguments, run the underlying function, capture the result (or the error).
Observe — append the tool result to the context and decide whether to loop or terminate.

In production, that loop is the surface where every operational concern lives.

What lives in the loop

Loop responsibilities

Step budgets. Max iterations per task (typically 10-25). Without this, a confused agent will retry forever.
Token budgets. Max cumulative prompt tokens. Without this, runaway loops blow your inference bill.
Context management. Context compression between iterations — the raw transcript grows linearly; the model’s effective attention does not. Compress old turns into summaries before they become noise.
Tool dispatch. Argument validation, retries on transient failures, structured error reporting back to the model.
Stop conditions. Explicit “done” signals from the model, success-criterion checks against external state, confidence thresholds, hard ceilings.
Tracing. Every step’s inputs, outputs, latency, and cost — for debugging and offline replay.

What goes wrong

Unbounded looping. The model never decides it’s done. Always wire an external cap.
Context explosion. Tool results accumulate; by step 8 the context is 50K tokens of redundant blob and the model loses the thread. Compress aggressively.
Tool errors that the model can’t recover from. The model retries the same broken call three times. Surface error messages clearly; consider an explicit “you’ve now failed this tool 3× — try a different approach” injection.
Stale state. The agent makes a decision based on a tool result from 5 turns ago that’s now invalid. Either re-fetch or annotate freshness.

The instinct is “summarize old turns to free up context budget”, but naive summarization usually loses the load-bearing detail. Later decisions depend on specifics — exact identifiers, exact error messages, exact tool arguments — that summaries elide as “the agent looked up the user”. When the loop needs that user later, the identifier is gone and the model either hallucinates one or fails.

Production-grade compression keeps the artifacts (raw tool results, structured outputs) addressable by reference and only summarizes the narrative (which decisions the agent made and why). The model fetches the artifact by ID rather than re-deriving it from a lossy summary. This is closer to how humans take meeting notes than how naive summarization works.

The deeper point: context compression is a retrieval problem on the agent’s own history. The same considerations that govern RAG — what’s discoverable, what’s calibrated, what gets reranked — apply here, with the agent itself as both the indexer and the consumer.

Three layers. First, step traces: every iteration logs the inputs (full context), outputs (tool requests and reasoning), tool results, latency, and token spend. Without this you can’t debug a single failed run.

Second, trajectory replay: re-run a saved trajectory against a different prompt, model, or toolset. Replaying yesterday’s 1000 production trajectories against your candidate change tells you whether it would have helped, hurt, or done nothing. The replay infrastructure is the single highest-leverage thing to build once you have an agent in production.

Third, trajectory clustering: trajectories binned by intent, failure mode, and tool path surface the long tail — queries that take 15 steps, ones that exhaust the budget, ones that succeed but cost 50× the median. Without aggregation you only see individual failures, not the shape of the problem.

The trajectory primitive

The agent loop produces a trajectory: an ordered sequence of (state, decision, action, observation) tuples. This trajectory is the primitive everything downstream operates on — debugging, evals, replay, reflection , knowledge distillation into smaller specialized models. Treat it as a first-class artifact: store it, version it, replay it.

Go further

How is this different from a chat loop?

A chat loop alternates user and model turns. An agent loop has the model in the driver's seat across multiple turns without user input — observing tool results, deciding next steps, terminating only when the goal is met or a budget is exhausted. The user is one possible source of observation, not the cadence-setter.

Agent ReAct prompting

What stop conditions actually work?

Budget caps (max steps, max tokens, max wall-clock), explicit 'done' tool calls from the model, success-criterion checks against external state (test passed, file written, ticket closed), and confidence thresholds on a calibrated grounding signal. Always wire at least two — never trust the model alone to halt.

Reflection and critique Score calibration

What does production observability for an agent loop look like?

Trace every step: input context, model output, tool calls, tool results, latency, token spend. Replay traces against new prompts or models to A/B test changes. Without traces you can't tell whether a degraded answer came from worse retrieval, a tool failure, or a model regression.

Agent Hallucination

← All concepts

The best AI teams build with ZeroEntropy models

Book Demo View docs