Sub-agent
In one sentence
A sub-agent is a fresh, isolated AI session that a parent agent spawns for a specific delegated task: it has its own context and its own scratchpad, can run in parallel with the parent, returns a result, and then disappears.
Why sub-agents exist
A single AI conversation has a fixed context window — the model can only “see” so many tokens at once. If you keep piling work into one conversation, two things go wrong:
- The context gets polluted. Old messages crowd out the room needed for new reasoning.
- Costs scale badly. Every turn re-processes the whole conversation. By turn 50, every new question is paying for a recap of all 49 previous ones.
The standard fix in software engineering — delegation — applies here too. Instead of doing five things in one giant conversation, the parent agent spawns five focused sub-agents, each with a clean slate, each given just enough context to do its job, each returning a finished result.
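The cost argument can be made concrete with a little arithmetic. The sketch below assumes a flat, hypothetical 200 tokens per turn and that every turn re-reads the whole history as input. The numbers and the function name are illustrative only, not measurements from any provider.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 200) -> int:
    """Total input tokens read when each turn re-reads all earlier turns.

    Turn k re-processes the k-1 previous turns plus its own new message,
    so it reads k * tokens_per_turn tokens.
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


# One 50-turn conversation that does five jobs back to back.
single = cumulative_input_tokens(50)

# Five sub-agents, each with a clean slate and 10 turns of its own work.
# (Hypothetical split; ignores the small cost of passing the task and result.)
delegated = 5 * cumulative_input_tokens(10)

print(single, delegated)  # prints "255000 55000"
```

Under these assumptions the single conversation reads 255,000 input tokens and the delegated version 55,000, because cost grows roughly with the square of conversation length. Real systems complicate the picture (prompt caching, for instance, lowers the price of re-read tokens), so treat this as an illustration of the shape rather than a billing model.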
Agent tooling from Anthropic and OpenAI, along with many modern agent frameworks, supports this pattern. The terminology varies (sub-agents, child sessions, workers), but the shape is the same.
What it actually does — concretely
When the parent agent decides to delegate, it:
- Spawns a new session with its own ID and its own clean context.
- Hands it a task description and (optionally) some context to fork from.
- Continues with its own work, or yields and waits.
- Receives a completion event when the sub-agent finishes.
- Folds the result into the parent conversation as a new message.
Crucially, only the final result comes back into the parent context — not the entire reasoning chain. So a sub-agent that needed 20 internal turns to figure something out returns only its summary, keeping the parent’s context lean.
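The lifecycle described above can be sketched with Python's `asyncio`. This is a minimal simulation, not the gateway's actual implementation: `Session`, `run_sub_agent`, and `delegate` are hypothetical names, and the "model calls" are placeholders.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """A conversation with its own ID and its own message list (context)."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    messages: list[str] = field(default_factory=list)


async def run_sub_agent(task: str) -> tuple[Session, str]:
    """Hypothetical child: works in a fresh session, returns only a summary."""
    child = Session()
    child.messages.append(f"task: {task}")
    for step in range(1, 21):  # 20 internal turns stay inside the child
        child.messages.append(f"internal reasoning step {step}")
        await asyncio.sleep(0)  # stand-in for a model call
    return child, f"summary of '{task}'"


async def delegate(parent: Session, task: str) -> Session:
    # Spawn a new session and hand it the task description.
    pending = asyncio.create_task(run_sub_agent(task))
    # The parent could do other work here; this sketch simply yields and waits.
    # Awaiting the task plays the role of the completion event.
    child, result = await pending
    # Fold only the final result into the parent conversation.
    parent.messages.append(f"sub-agent {child.session_id} result: {result}")
    return child


parent = Session()
child = asyncio.run(delegate(parent, "summarize the design doc"))
print(len(child.messages), len(parent.messages))  # prints "21 1"
```

The child accumulates 21 messages (the task plus 20 internal steps) while the parent gains exactly one, which is the point of the pattern.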
Working example from this machine (May 2, 2026, 06:54 EDT)
This morning, after restoring write scope to the gateway, I tested the sub-agent path with a dead-simple ping:
sessions_spawn(
    task: "Run uname -m && date && echo 'M5 Max sub-agent spawn test successful'",
    mode: "run"
)
The gateway accepted the request and returned a child session key. I yielded my turn (told the parent session “I’m done for now, wake me when the sub-agent finishes”). Sixteen seconds later, a completion event arrived in the parent session:
arm64
Sat May 2 06:54:47 EDT 2026
M5 Max sub-agent spawn test successful
Stats: runtime 16s • tokens 139 (in 7 / out 132)
Three things to notice:
- Total token cost: 139. Trivial.
- Runtime: 16 seconds. Fast enough for real workflows.
- The parent session’s context never had to load the sub-agent’s working memory, only the short result shown above.
Where sub-agents are actually useful
In daily work on this machine, sub-agents are routinely used for:
- Reading large files so the parent doesn’t have to ingest 5,000 lines just to extract a summary.
- Running multi-step research (“read this article, summarize, cross-reference our notes, propose action items”) — the messy intermediate work stays in the child.
- Parallel exploration — spawning three sub-agents to investigate three angles at once.
- Long-running tasks that would otherwise hold up the parent conversation.
- Isolation for risky work — a sub-agent doing a destructive operation runs in its own sandbox.
The general rule: if a task has a clear input and a clear output, and the messy middle does not need to live in the parent’s memory, delegate it.
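Parallel exploration is the easiest of these uses to show in code. The sketch below fans out three hypothetical sub-agents with `asyncio.gather`; `investigate` is a placeholder whose sleep stands in for real model work, and the three angles are made up for illustration.

```python
import asyncio


async def investigate(angle: str) -> str:
    """Hypothetical sub-agent; the sleep stands in for real model work."""
    await asyncio.sleep(0.1)
    return f"findings on {angle}"


async def explore_in_parallel(angles: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in the
    # same order as the inputs, regardless of which one finishes first.
    return await asyncio.gather(*(investigate(a) for a in angles))


angles = ["pricing", "competitors", "regulation"]
results = asyncio.run(explore_in_parallel(angles))
for line in results:
    print(line)
```

Because the three children wait concurrently, the whole run takes about as long as the slowest one rather than the sum of all three. Each task here has a clear input (the angle) and a clear output (one line of findings), which is what makes it a good candidate for delegation.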
Sub-agents vs. tools — a common confusion
Both extend an agent’s reach, but they differ structurally:
| | Tool | Sub-agent |
|---|---|---|
| What it is | A function call (e.g., “search the web”, “read this file”) | A whole new AI session |
| Returns | Raw data | A finished, reasoned result |
| Reasoning capacity | None — it just executes | Full LLM reasoning |
| Cost per call | Usually free or cheap | Real model tokens |
| Best for | Mechanical operations | Tasks that need judgment |
A sub-agent can use tools internally. So a parent might say “spawn a sub-agent to summarize this directory” and the sub-agent then uses the file-read tool, the grep tool, etc. as part of its reasoning before reporting back.
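The structural difference can be sketched in a few lines. In this toy version the tools are plain functions that return raw data, and the "sub-agent" is a function that calls them and reports one finished sentence. All names are hypothetical, and the judgment a real sub-agent would get from a model is hard-coded here.

```python
import tempfile
from pathlib import Path


# Tools: plain functions that execute and return raw data.
def list_files(directory: Path) -> list[Path]:
    return sorted(p for p in directory.iterdir() if p.is_file())


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def summarize_directory(directory: Path) -> str:
    """Hypothetical sub-agent: calls tools, keeps the raw data to itself,
    and reports back one finished sentence. A real sub-agent would use a
    model to decide which tools to call; here that choice is hard-coded."""
    files = list_files(directory)
    total_lines = sum(len(read_file(p).splitlines()) for p in files)
    names = ", ".join(p.name for p in files)
    return f"{len(files)} files ({names}), {total_lines} lines in total"


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("one\ntwo\n", encoding="utf-8")
    (root / "b.txt").write_text("three\n", encoding="utf-8")
    report = summarize_directory(root)

print(report)  # prints "2 files (a.txt, b.txt), 3 lines in total"
```

The parent only ever sees `report`; the file contents read along the way never enter its context.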
Why this matters in a teaching context
The sub-agent pattern is how agentic systems scale beyond a single brain. It is the AI-system equivalent of a manager who can hire temporary contractors for specific projects.
For a BBA or MBA classroom, a productive comparison is to organizational design:
- Sole proprietor = a chatbot. One person, no help, every task in the same head.
- Founder with junior staff = an agent with sub-agents. Delegation, focused effort, less context-thrash.
- Department with cross-functional teams = an agent with parallel sub-agents on coordinated tasks. Faster, more expensive, more coordination overhead.
The skills required to design good sub-agent workflows — clean task decomposition, well-bounded delegation, useful summaries from delegates — are exactly the skills good managers already practice on their human teams.
Trade-offs
- Coordination overhead. Spawning, waiting, parsing the result — every delegation has friction. Trivial tasks should not be delegated.
- Loss of nuance. The parent only sees the summary, so any subtlety in the sub-agent’s internal reasoning is lost unless explicitly surfaced. (This is the same problem human managers have with their reports.)
- Cost stacking. Each sub-agent uses model tokens. Spawn ten and you have ten model bills. It is worth using cheaper models for routine sub-agents (a separate entry on this is planned).
- Debugging is harder. When a parent agent gets a wrong answer, you sometimes need to inspect the sub-agent’s internal session to figure out where the reasoning broke.
Related entries: gateway.md (the process that spawns sub-agents), tool.md, *(planned).*