An agentic system is the architectural step beyond "prompt → response." An Orchestrator LLM receives a task, plans subtasks, and dispatches them through a message queue to specialist agents (Researcher, Coder, Reviewer), each with its own tools and access to shared memory (Vector DB + Redis). A guardrail layer validates every output, retries on low confidence, and escalates to a human when stuck. The result is a self-healing team of AI agents that plans, executes, recovers, and ships work autonomously.
Use one when a task requires more than one LLM call to complete, has steps that depend on each other, needs tools (web search, code execution, API calls), and benefits from recovery loops (retry, escalate). Examples: research reports, code generation pipelines, customer support workflows that escalate, content production lines, autonomous testing, and data pipelines that ask for human input on edge cases. Don't use it for one-shot Q&A or simple summarization; there it's overkill and only adds latency.
What changed when LLMs got tools
The original LLM API was stateless — text in, text out, one shot. Anything beyond a single response had to be hand-coded by you. That model is dying. The new model is the agent — an LLM in a loop, with memory, with tools, deciding what to do next based on what just happened.
This sounds more impressive than it is. An agent is just:
```python
done = False
while not done:
    plan = llm("here is the task and what's happened so far. what should we do next?")
    result = execute(plan)  # could be a tool call, sub-LLM call, or terminal answer
    done = update_state(plan, result)  # record the step; True once there is a terminal answer
```
The architecture around that loop is what makes it production-grade.
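A minimal runnable version of that loop, for illustration only. It assumes a hypothetical `fake_llm` that returns scripted decisions in place of a real model call, and a single made-up `add` tool. The point is the shape: ask the model for the next action, execute it, append the result to the history, and stop on a terminal answer or when the step budget runs out.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function.
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}


def fake_llm(task: str, history: list[dict]) -> dict:
    """Stand-in for a real model call (hypothetical): returns the next action as a dict."""
    if not history:
        return {"type": "tool", "name": "add", "args": {"a": 2, "b": 3}}
    return {"type": "answer", "content": f"The result is {history[-1]['result']}"}


def run_agent(task: str, llm=fake_llm, max_steps: int = 5) -> str:
    history: list[dict] = []  # everything that has happened so far
    for _ in range(max_steps):  # step budget: the loop cannot run forever
        plan = llm(task, history)
        if plan["type"] == "answer":  # a terminal answer ends the loop
            return plan["content"]
        result = TOOLS[plan["name"]](**plan["args"])  # execute the chosen tool
        history.append({"plan": plan, "result": result})  # update state
    raise RuntimeError(f"step budget of {max_steps} exhausted")


print(run_agent("What is 2 + 3?"))  # The result is 5
```

Swapping `fake_llm` for a real model client is the only change needed to make this a (very small) real agent; the step budget is what stops a model that never produces a terminal answer.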
1. The Orchestrator is the brain
The Orchestrator is one LLM (usually a high-quality model — GPT-4, Claude Opus) with three responsibilities:
- Decompose the user task into subtasks
- Dispatch subtasks to specialist agents
- Monitor progress, decide when a subtask is done, when to retry, when to escalate
A canonical orchestrator prompt:
```text
You are coordinating a team of specialist agents. The user wants:
{task}
Team:
- Researcher: gathers facts via web search and document retrieval
- Coder: writes and runs code
- Reviewer: validates outputs against requirements
Decompose the task. Output a plan as JSON:
{ "steps": [{ "agent": "Researcher", "subtask": "..." }, ...] }
```
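Because everything downstream is driven by the orchestrator's plan, it pays to validate it before dispatching anything. A minimal sketch, assuming the JSON shape from the prompt and the three agent names it lists; `parse_plan`, `PlanError`, and `KNOWN_AGENTS` are hypothetical names:

```python
import json

KNOWN_AGENTS = {"Researcher", "Coder", "Reviewer"}  # must match the team in the prompt


class PlanError(ValueError):
    """Raised when the orchestrator's output is not a usable plan."""


def parse_plan(raw: str) -> list[dict]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanError(f"plan is not valid JSON: {exc}") from exc
    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise PlanError("plan must contain a non-empty 'steps' list")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise PlanError(f"step {i} is not an object")
        if step.get("agent") not in KNOWN_AGENTS:
            raise PlanError(f"step {i}: unknown agent {step.get('agent')!r}")
        subtask = step.get("subtask")
        if not isinstance(subtask, str) or not subtask.strip():
            raise PlanError(f"step {i}: missing subtask")
    return steps


raw = ('{"steps": [{"agent": "Researcher", "subtask": "Find prior art"},'
       ' {"agent": "Coder", "subtask": "Prototype it"}]}')
print([s["agent"] for s in parse_plan(raw)])  # ['Researcher', 'Coder']
```

On a `PlanError`, the orchestrator can feed the error message back to the model and ask for a corrected plan rather than dispatching a malformed one.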
2. Specialist agents — separation of concerns
Each specialist agent is a smaller LLM (often a cheap model — GPT-4o-mini, Haiku) prompted with one focused job:
| Agent | Tools | When to invoke |
|---|---|---|
| Researcher | Web search, vector DB query, file read | “Find facts on X” |
| Coder | Code interpreter, file system, git | “Write code that does Y” |
| Reviewer | Diff, test runner, lint | “Validate that the output meets Z” |
| Planner | Sub-orchestration | Recursive breakdown of complex steps |
| Critic | Self-critique, fact-checking | Pre-exit quality gate |
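The table doubles as an access-control list: an agent should only be able to call the tools in its own row. A minimal sketch of enforcing that for the first three rows, with hypothetical snake_case tool names loosely derived from the table (the exact names are assumptions):

```python
# Hypothetical per-agent tool allowlist derived from the table above.
AGENT_TOOLS: dict[str, frozenset[str]] = {
    "Researcher": frozenset({"web_search", "vector_query", "file_read"}),
    "Coder": frozenset({"code_interpreter", "file_write", "git"}),
    "Reviewer": frozenset({"diff", "run_tests", "lint"}),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent asks for a tool outside its allowlist."""


def check_tool_call(agent: str, tool: str) -> None:
    allowed = AGENT_TOOLS.get(agent, frozenset())  # unknown agents get no tools at all
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent} may not call {tool}")


check_tool_call("Researcher", "web_search")  # allowed, returns None
# check_tool_call("Researcher", "git") would raise ToolNotAllowed
```

Enforcing this in code rather than only in the prompt means a confused or prompt-injected agent cannot reach tools it was never meant to have.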
3. Message queue — the connective tissue
Specialist agents don’t call each other directly. They communicate through a message queue — Kafka, RabbitMQ, or even a simple Redis stream.
With a queue:
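- Tasks are durable: with persistence enabled, queued work survives a crash
- Work is parallelized: multiple agents can consume messages simultaneously
- Retries are cheap: re-publish the message on failure
- You get an audit trail: every message passes through the queue, so it can be logged in one place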
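To make the retry and audit properties concrete without running a broker, here is an in-memory stand-in built on `collections.deque`. It is a sketch, not a Kafka or RabbitMQ client: a failed message is re-published with `retries_remaining` decremented, a message with no retries left goes to a dead-letter list, and every event is appended to an audit log. `InMemoryQueue` and its methods are hypothetical.

```python
from collections import deque
from typing import Callable


class InMemoryQueue:
    """Toy stand-in for a durable broker; everything lives in process memory."""

    def __init__(self) -> None:
        self.pending: deque[dict] = deque()
        self.dead_letters: list[dict] = []
        self.audit: list[tuple[str, str]] = []  # (event, task_id)

    def publish(self, msg: dict) -> None:
        self.audit.append(("publish", msg["task_id"]))
        self.pending.append(msg)

    def drain(self, handler: Callable[[dict], None]) -> None:
        while self.pending:
            msg = self.pending.popleft()
            try:
                handler(msg)
                self.audit.append(("done", msg["task_id"]))
            except Exception:
                self.audit.append(("failed", msg["task_id"]))
                if msg["retries_remaining"] > 0:
                    # Retry = re-publish a copy with one fewer retry left.
                    self.publish({**msg, "retries_remaining": msg["retries_remaining"] - 1})
                else:
                    self.dead_letters.append(msg)  # park it for a human or a later job


calls = {"n": 0}


def flaky_handler(msg: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 3:  # fail twice, then succeed
        raise TimeoutError("tool timed out")


q = InMemoryQueue()
q.publish({"task_id": "task-abc123", "retries_remaining": 3})
q.drain(flaky_handler)
print([event for event, _ in q.audit])
# ['publish', 'failed', 'publish', 'failed', 'publish', 'done']
```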
A typical message:
```json
{
  "task_id": "task-abc123",
  "from": "orchestrator",
  "to": "researcher",
  "subtask": "Find the top 3 patents related to vector databases",
  "context_refs": ["vec://memory/task-abc123"],
  "deadline": "2026-05-05T16:00:00Z",
  "retries_remaining": 3
}
```
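A sketch of the same message as a typed Python object, assuming the field names above. Because `from` is a Python keyword, the attribute is named `sender` and mapped back to `from` on the wire; `AgentMessage` and its methods are hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentMessage:
    task_id: str
    sender: str  # serialized as "from", which is a reserved word in Python
    to: str
    subtask: str
    deadline: datetime
    retries_remaining: int = 3
    context_refs: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps({
            "task_id": self.task_id,
            "from": self.sender,
            "to": self.to,
            "subtask": self.subtask,
            "context_refs": self.context_refs,
            "deadline": self.deadline.isoformat().replace("+00:00", "Z"),
            "retries_remaining": self.retries_remaining,
        })

    @classmethod
    def from_json(cls, raw: str) -> "AgentMessage":
        d = json.loads(raw)
        return cls(
            task_id=d["task_id"],
            sender=d["from"],
            to=d["to"],
            subtask=d["subtask"],
            # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
            deadline=datetime.fromisoformat(d["deadline"].replace("Z", "+00:00")),
            retries_remaining=d["retries_remaining"],
            context_refs=d.get("context_refs", []),
        )

    def expired(self, now: datetime) -> bool:
        return now >= self.deadline


raw = json.dumps({
    "task_id": "task-abc123", "from": "orchestrator", "to": "researcher",
    "subtask": "Find the top 3 patents related to vector databases",
    "context_refs": ["vec://memory/task-abc123"],
    "deadline": "2026-05-05T16:00:00Z", "retries_remaining": 3,
})
msg = AgentMessage.from_json(raw)
print(msg.sender, msg.expired(datetime(2026, 5, 5, 17, 0, tzinfo=timezone.utc)))  # orchestrator True
```

Keeping the deadline as a timezone-aware `datetime` lets a consumer drop or dead-letter stale work before spending tokens on it.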
4. Memory — what the agents share
Agents are stateful by design, not by accident. Memory is the most important architectural decision in an agentic system.
Two stores, two purposes:
Vector DB (long-lived semantic memory). Holds research findings, prior decisions, code snippets, and summaries: anything an agent might want to recall later via similarity search. Every agent reads from it; the orchestrator (or a dedicated “Memorizer” agent) writes to it.
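To show the read path, here is a toy in-memory semantic memory. It is an assumption-laden sketch: `embed` is a hypothetical hashed bag-of-words function standing in for a real embedding model (so it only matches shared words, not meaning), and `VectorMemory` is not any particular vector DB's API. Recall is cosine similarity over the stored vectors.

```python
import math
import re
import zlib
from collections import Counter

DIM = 1024  # size of the toy embedding space


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % DIM] += count  # crc32 is stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorMemory:
    """Toy semantic memory: stores (vector, text) pairs and recalls by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def write(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), text) for v, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


memory = VectorMemory()
memory.write("Decision: use Postgres for the billing service")
memory.write("Finding: HNSW indexes trade memory for faster vector search")
print(memory.search("vector search indexes", k=1))
```

The HNSW finding should rank first because it shares all three query words; a real embedding model would also match paraphrases that share no words at all.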
Redis (short-lived task state). Holds the current state of each in-flight task: current step, partial results, lock holders. It is faster than the vector DB for this kind of keyed lookup, structured, and transactional. The state is cleared when the task completes.
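A minimal in-memory stand-in for the Redis side, showing the three jobs the paragraph describes: per-task state, a lock with a holder, and cleanup on completion. In Redis itself these would map roughly to `HSET`/`HGETALL` on a per-task hash, `SET key holder NX PX <ttl>` for the lock, and `DEL` when the task finishes. The class and method names here are hypothetical.

```python
import time


class TaskStateStore:
    """In-memory stand-in for Redis task state; single-process and not concurrency-safe."""

    def __init__(self) -> None:
        self.state: dict[str, dict[str, str]] = {}  # one hash per in-flight task
        self.locks: dict[str, tuple[str, float]] = {}  # task_id -> (holder, expires_at)

    def update(self, task_id: str, **fields: str) -> None:
        self.state.setdefault(task_id, {}).update(fields)

    def get(self, task_id: str) -> dict[str, str]:
        return dict(self.state.get(task_id, {}))

    def acquire_lock(self, task_id: str, holder: str, ttl_s: float = 30.0) -> bool:
        """Like SET NX with an expiry: succeeds only if no unexpired lock exists."""
        now = time.monotonic()
        current = self.locks.get(task_id)
        if current is not None and current[1] > now:
            return False
        # The expiry means a crashed agent cannot hold the lock forever.
        self.locks[task_id] = (holder, now + ttl_s)
        return True

    def complete(self, task_id: str) -> None:
        """Clear everything for the task once it finishes."""
        self.state.pop(task_id, None)
        self.locks.pop(task_id, None)


store = TaskStateStore()
store.update("task-abc123", step="2", partial_results="3 patents found")
print(store.acquire_lock("task-abc123", "researcher-1"))  # True
print(store.acquire_lock("task-abc123", "coder-1"))  # False: lock still held
store.complete("task-abc123")
print(store.get("task-abc123"))  # {}
```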
5. Tools — how agents act on the world
A tool is a function the agent can call to read or change external state. Tools are how you cross the boundary from “language” to “action.” See the Function Calling article for the full mechanism. The key difference in agentic systems is that the model, not the developer, decides which tool to call and with what arguments; the developer controls only which tools are available.
Common tool kits:
- Read — web_search, vector_query, http_get, file_read, db_query
- Reason — code_interpreter (sandboxed Python)
- Act — send_email, create_record, post_message, run_test
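A sketch of a tool registry and dispatcher, assuming the model returns a tool call as a name plus JSON-encoded arguments (the general shape most function-calling APIs use, though field names vary by provider). `tool`, `dispatch`, `REGISTRY`, and the fake `vector_query` are hypothetical. Errors are returned to the agent as data rather than raised, so it can pick a different action.

```python
import json
from typing import Callable

REGISTRY: dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Decorator that registers a function as a callable tool under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn


@tool
def vector_query(query: str, k: int = 3) -> list[str]:
    # Hypothetical: a real version would query the vector DB.
    return [f"doc about {query}"][:k]


def dispatch(name: str, arguments_json: str) -> dict:
    fn = REGISTRY.get(name)
    if fn is None:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    try:
        result = fn(**json.loads(arguments_json))
    except Exception as exc:  # bad arguments or a failing tool become data for the agent
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "result": result}


print(dispatch("vector_query", '{"query": "HNSW"}'))
# {'ok': True, 'result': ['doc about HNSW']}
print(dispatch("send_email", "{}")["error"])  # unknown tool 'send_email'
```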
6. The hallucination guard
This is the most underrated piece of the system. Every output that crosses an agent boundary — especially outputs that leave the system to a user or external API — runs through a critic.
Three checks:
- Self-critique — run a second LLM call: “Here is the proposed answer. Does it actually answer the question? Is it grounded in the retrieved context? Are there factual claims that need verification? Score 1–10.”
- Faithfulness — for RAG-grounded outputs, verify each claim traces to a retrieved chunk
- Confidence threshold — if either check returns < 7/10, retry with a stricter prompt, or escalate
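The threshold logic can be sketched as one small decision function. This assumes the self-critique score has already been parsed from the critic's reply as an integer from 1 to 10, and it swaps in a deliberately naive faithfulness check (word overlap between each claim and the retrieved chunks, rescaled to 0 to 10) where a real system would use an NLI model or an LLM judge. `gate`, `faithfulness_score`, and the 50% overlap cut-off are hypothetical.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness_score(claims: list[str], chunks: list[str]) -> float:
    """Naive stand-in: a claim counts as supported if >= 50% of its words appear in one chunk."""
    if not claims:
        return 10.0
    chunk_words = [_words(c) for c in chunks]
    supported = 0
    for claim in claims:
        cw = _words(claim)
        if cw and any(len(cw & ch) / len(cw) >= 0.5 for ch in chunk_words):
            supported += 1
    return 10.0 * supported / len(claims)  # same 0-10 scale as the self-critique


def gate(self_critique: int, claims: list[str], chunks: list[str],
         attempt: int, max_attempts: int = 3, threshold: float = 7.0) -> str:
    """Return 'pass', 'retry', or 'escalate'."""
    if self_critique >= threshold and faithfulness_score(claims, chunks) >= threshold:
        return "pass"
    return "retry" if attempt + 1 < max_attempts else "escalate"


chunks = ["Redis supports transactions with MULTI and EXEC"]
print(gate(8, ["Redis supports transactions"], chunks, attempt=0))  # pass
print(gate(9, ["Kafka was written in Haskell"], chunks, attempt=2))  # escalate
```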
7. Failure recovery — graceful degradation
Things go wrong constantly:
- An LLM call returns garbage
- A tool times out
- An agent gets stuck in a loop
- A subtask exceeds budget
- An external API rate-limits
The architecture survives by treating failure as data:
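```python
class LowConfidence(Exception):
    """Raised by an agent that doesn't trust its own output."""


def run_subtask(subtask, max_attempts=3):
    # agent, critic, simplify, clarify and escalate_to_human are assumed to exist elsewhere.
    for _ in range(max_attempts):
        try:
            result = agent.execute(subtask, timeout=60)
        except TimeoutError:
            subtask = simplify(subtask)  # shrink the scope so it fits the time budget
            continue
        except LowConfidence:
            subtask = clarify(subtask)  # add constraints or context, then retry
            continue
        if critic.passes(result):
            return result
        subtask = clarify(subtask)  # critic rejected it: retry with a stricter prompt
    return escalate_to_human(subtask)  # out of attempts: hand off to a person
```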
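The retry loop above rewrites the subtask on each failure, but for the last failure mode in the list, rate limiting, the right response is to wait rather than rephrase. A common approach is exponential backoff with "full jitter". This is a generic sketch: `RateLimited` is a hypothetical exception your API client would raise, and `sleep` is injectable so the behaviour can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error an API client raises on a rate-limit (HTTP 429) response."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_s: float = 0.5,
                 cap_s: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller escalate
            # Full jitter: wait a random time up to base * 2^attempt, capped at cap_s.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise AssertionError("unreachable")  # the loop always returns or raises


responses = iter([RateLimited(), RateLimited(), "ok"])


def flaky_call():
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r


waits: list[float] = []
print(with_backoff(flaky_call, sleep=waits.append), len(waits))  # ok 2
```

The randomness spreads retries from many agents over time, so they don't all hit the recovering API at the same moment.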
8. The control loop — putting it together
```mermaid
flowchart TD
U[User Task] --> O[Orchestrator<br/><i>decompose into plan</i>]
O --> Q{{Message Queue<br/>Kafka / Redis Streams}}
Q --> R[Researcher<br/>web · vector · files]
Q --> C[Coder<br/>sandbox · git · tests]
Q --> RV[Reviewer<br/>diff · lint · validate]
VDB[(Vector DB<br/>long-lived memory)]
RD[(Redis<br/>task state)]
R <-->|read/write| VDB
C <-->|read/write| VDB
RV <-->|read/write| VDB
R <-->|state| RD
C <-->|state| RD
RV <-->|state| RD
R --> Crit[Critic<br/>self-critique · faithfulness · confidence]
C --> Crit
RV --> Crit
Crit -->|low conf| Q
Crit -->|escalate| H[Human-in-the-loop]
Crit -->|pass| Agg[Aggregator<br/>synthesise final answer]
Agg --> Resp[Response to user]
style U fill:#1c2333,stroke:#475569,color:#e7eaf1
style O fill:#0e7490,stroke:#06b6d4,color:#fff
style Q fill:#9a3412,stroke:#f97316,color:#fff
style R fill:#1e3a8a,stroke:#3b82f6,color:#fff
style C fill:#581c87,stroke:#a855f7,color:#fff
style RV fill:#365314,stroke:#84cc16,color:#fff
style VDB fill:#0f1320,stroke:#475569,color:#cdd3df
style RD fill:#0f1320,stroke:#475569,color:#cdd3df
style Crit fill:#7e1d1d,stroke:#ef4444,color:#fff
style H fill:#9a3412,stroke:#f97316,color:#fff
style Agg fill:#1e3a8a,stroke:#3b82f6,color:#fff
style Resp fill:#365314,stroke:#84cc16,color:#fff
```
Every arrow can fail, so every arrow needs a retry, a timeout, and an escalation path.
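The timeout part of that rule is easy to get wrong in Python, because a blocking call won't stop on its own. One common pattern, sketched here with the standard library's `concurrent.futures`, bounds how long the caller waits. It does not kill the worker thread, which keeps running in the background, so the wrapped call should also use its own client-side timeout where the library offers one. `call_with_timeout` and the pool size are hypothetical choices.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# One shared pool: a stuck call occupies a worker thread but never blocks the caller.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(fn: Callable[[], T], timeout_s: float) -> T:
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only succeeds if the call hasn't started running yet
        raise TimeoutError(f"call exceeded {timeout_s}s") from None


print(call_with_timeout(lambda: "fast", timeout_s=1.0))  # fast
try:
    call_with_timeout(lambda: time.sleep(0.5), timeout_s=0.05)
except TimeoutError as exc:
    print("timed out:", exc)  # timed out: call exceeded 0.05s
```

Raising the built-in `TimeoutError` keeps this compatible with the `except TimeoutError` branch in the retry loop earlier in the article.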
9. What “production-grade” actually means
A weekend agentic demo runs three LLM calls and prints the result. A production agentic system has:
- Step budget — max iterations, kill switch
- Cost budget — per-task, per-user, hard cap
- Latency budget — total time before timeout
- Memory bound — context windows can’t grow forever
- Audit log — every LLM call, tool call, decision, recorded
- Replay — you can re-run any task from the audit log
- Observability — dashboards on success rate, cost per task, tool usage, hallucination rate
- Human-in-the-loop — clear escalation triggers and UI for human review
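The first three items on that list can share one small guard object that every LLM or tool call goes through. A sketch under stated assumptions: the caller reports each call's cost in dollars, and `Budget`, `BudgetExceeded`, and the default limits are hypothetical. The clock is injectable so the latency cap can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Kill switch: raised as soon as any budget is exhausted."""


@dataclass
class Budget:
    max_steps: int = 20
    max_cost_usd: float = 1.00
    max_seconds: float = 120.0
    clock: Callable[[], float] = time.monotonic
    steps: int = 0
    cost_usd: float = 0.0
    started: float = field(init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def charge(self, cost_usd: float) -> None:
        """Record one LLM or tool call, then enforce every cap."""
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget exceeded ({self.steps} > {self.max_steps})")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost budget exceeded (${self.cost_usd:.2f})")
        if self.clock() - self.started > self.max_seconds:
            raise BudgetExceeded("latency budget exceeded")


fake_now = [0.0]
budget = Budget(max_steps=3, max_cost_usd=0.10, clock=lambda: fake_now[0])
budget.charge(0.02)
budget.charge(0.03)
try:
    budget.charge(0.08)  # running total is now about $0.13
except BudgetExceeded as exc:
    print(exc)  # cost budget exceeded ($0.13)
```

Raising an exception rather than returning a flag makes the cap hard: no code path can keep calling the model after the budget is gone unless it explicitly catches `BudgetExceeded`.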
10. Where agents are real (and where they’re hype)
Real today:
- Code generation pipelines (Cursor’s agents, Devin, Aider)
- Customer support escalation (Decagon, Sierra)
- Research and analysis (Perplexity Pro, OpenAI Deep Research)
- Software testing (autonomous QA bots)
- Knowledge worker assistants (Glean, Notion AI)
Hype today:
- Fully autonomous “personal agents that book your flights and manage your life” (the trust and safety surface is too big)
- Agents that learn permanently from each interaction (memory updates without curation are a liability)
- “AGI through agentic loops” (no, agents are software architecture, not consciousness)