On February 5, 2026, Anthropic launched Agent Teams as a research preview in Claude Code alongside Opus 4.6. The feature represents a fundamental shift in how AI agents collaborate: instead of a single orchestrator delegating to isolated sub-agents that can only report back, agents can now communicate peer-to-peer, self-claim tasks from a shared pool, and coordinate directly with each other.
This post examines what Agent Teams actually are, how they compare to the traditional sub-agent model, and where they fit in the broader landscape of multi-agent architectures. Rather than declaring a winner, the goal is to build a clear mental model for when each approach makes sense.
The Old Model: Sub-Agents as Isolated Workers
Before Agent Teams, Claude Code used a hub-and-spoke pattern. The main session (the "hub") could spawn sub-agents using the Task tool. Each sub-agent operated in its own context window, executed a focused task, and returned results exclusively to the calling agent. Sub-agents could not talk to each other.
// The old model: Task tool spawns isolated sub-agents
// Each sub-agent returns results ONLY to the parent
Main Agent
├── Task("Search for auth files") → returns file list
├── Task("Analyze test coverage") → returns coverage report
└── Task("Check for security issues") → returns vulnerability list
// Sub-agents cannot talk to each other
// All coordination flows through the main agent
// Each sub-agent operates in its own context window
This model works well for isolated, parallelizable tasks where only the final result matters. Need to search three different directories simultaneously? Spawn three sub-agents. Need a code review and a test analysis? Two sub-agents, results merged by the parent.
But the model breaks down when sub-agents need awareness of each other's work. Consider building a full-stack feature: a frontend agent creates a component expecting a certain API shape, while a backend agent independently designs a different response format. Neither agent can see the other's decisions. The orchestrator only discovers the mismatch after both finish, leading to rework.
This is what Cognition AI (the team behind Devin) calls the "Flappy Bird problem": two agents tasked with building parts of a game independently produce a Super Mario background and a mismatched bird sprite. Without real-time visibility into each other's work, coherence is impossible.
Agent Teams: Peer-to-Peer Collaboration
Agent Teams introduces four core primitives that transform the architecture from hub-and-spoke to peer-to-peer:
| Component | Role |
|---|---|
| Team Lead | Creates the team, spawns teammates, assigns initial tasks, synthesizes results |
| Teammates | Fully independent Claude Code sessions that work on assigned tasks |
| Shared Task List | Persistent, disk-based task pool with dependency tracking and self-claiming |
| Mailbox System | JSON-based inboxes enabling direct agent-to-agent messaging |
The feature is currently gated behind an environment variable:
// Enable Agent Teams (research preview)
// settings.json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
On-Disk Coordination
Unlike in-memory orchestration, Agent Teams coordinate entirely through the filesystem. Tasks are numbered JSON files. Messages are appended to inbox JSON files. File locking prevents race conditions when multiple teammates try to claim the same task.
# On-disk structure for Agent Teams
~/.claude/
├── teams/
│   └── my-feature-team/
│       ├── config.json           # Team members, roles, metadata
│       └── inboxes/
│           ├── lead.json         # Lead agent's message inbox
│           ├── frontend.json     # Frontend agent's inbox
│           ├── backend.json      # Backend agent's inbox
│           └── tests.json        # Test agent's inbox
└── tasks/
    └── my-feature-team/
        ├── 001.json              # { status: "completed", owner: "frontend" }
        ├── 002.json              # { status: "in_progress", owner: "backend" }
        ├── 003.json              # { status: "pending", blockedBy: ["002"] }
        └── 004.json              # { status: "pending", owner: null }
This design has a critical implication: there is no shared memory. Agents cannot see each other's context windows. The only coordination channels are task files and messages. This is both a feature (isolation prevents cascading failures) and a limitation (agents must explicitly communicate everything they want others to know).
The Messaging System
The SendMessage tool supports structured message types that enable rich coordination patterns:
// Agent Teams message types via SendMessage tool
{
  // Direct message to a specific teammate
  "type": "message",
  "to": "backend-agent",
  "content": "The API schema changed -- endpoint now returns { data, meta }"
}
{
  // Broadcast to all teammates (expensive -- scales with team size)
  "type": "broadcast",
  "content": "Shared types updated in src/types/api.ts -- pull before editing"
}
{
  // Task completion notification (auto-sent)
  "type": "task_completed",
  "taskId": "002",
  "summary": "REST endpoints implemented, OpenAPI spec at docs/api.yaml"
}
{
  // Plan approval gate -- teammate pauses until lead approves
  "type": "plan_approval_request",
  "plan": "Proposing to refactor auth middleware to support JWT + OAuth2"
}
{
  // Idle notification (auto-sent when teammate finishes all tasks)
  "type": "idle_notification",
  "completedTasks": ["001", "003"]
}
Two mechanisms stand out. First, plan approval gates: a teammate can be forced to work in read-only plan mode until the lead explicitly approves its approach. This prevents agents from charging ahead with a bad plan. Second, idle notifications: when a teammate finishes all its tasks, it automatically alerts the lead, who can then assign more work or shut down the team.
Quality Control Hooks
Agent Teams integrates with Claude Code's hook system for enforcement:
- TeammateIdle hook: Runs when a teammate is about to go idle. Exit code 2 sends feedback and keeps the teammate working (useful for preventing premature completion).
- TaskCompleted hook: Runs when a task is being marked complete. Exit code 2 rejects the completion and sends feedback (useful for enforcing quality gates).
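A TaskCompleted hook could, for example, run the test suite and reject the completion when it fails. A hedged sketch: Claude Code hooks receive a JSON payload on stdin and use exit code 2 to block, but the exact TaskCompleted payload fields assumed here (`taskId`) may differ in the research preview:

```python
import json
import subprocess
import sys

def quality_gate(payload: dict, check_cmd: list[str]) -> int:
    """Return 0 to accept the task completion, 2 to reject it.
    Stderr output accompanying exit code 2 is fed back to the teammate."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Task {payload.get('taskId')} rejected: checks failed",
              file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    # Hook scripts receive the event payload as JSON on stdin
    sys.exit(quality_gate(json.load(sys.stdin), ["pytest", "-q"]))
```

The same shape works for a TeammateIdle hook: swap the check command for whatever "done" means in your project (lint clean, docs updated, coverage threshold met).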
There's also a delegate mode (activated with Shift+Tab) that restricts the lead to coordination-only tools -- preventing a common failure mode where the lead starts implementing tasks itself instead of waiting for its teammates.
The C Compiler Stress Test
Anthropic stress-tested Agent Teams by having 16 parallel agents build a Rust-based C compiler from scratch. The results, published in an engineering blog post:
- Nearly 2,000 Claude Code sessions over two weeks
- 2 billion input tokens, 140 million output tokens
- Cost: just under $20,000
- Result: a 100,000-line compiler that compiles Linux 6.9 on x86, ARM, and RISC-V, passes 99% of GCC torture tests, and can build PostgreSQL, Redis, FFmpeg, and Doom
Key lessons from the project: high-quality test suites are essential (agents will solve the wrong problem if the verifier isn't precise), file ownership must be carefully separated (two teammates editing the same file leads to overwrites), and extensive READMEs are critical for agent orientation since agents lack persistent memory across sessions.
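The file-ownership lesson is easy to enforce mechanically. Here's a sketch of a pre-flight check that flags files claimed by more than one teammate's glob set; the ownership-map format is hypothetical, not an Agent Teams feature:

```python
from fnmatch import fnmatch

def find_conflicts(ownership: dict[str, list[str]],
                   files: list[str]) -> list[tuple[str, list[str]]]:
    """Flag files matched by more than one teammate's globs -- a recipe
    for the overwrite problem described above. `ownership` maps teammate
    name -> list of glob patterns (illustrative config, not a real one)."""
    conflicts = []
    for path in files:
        owners = [name for name, globs in ownership.items()
                  if any(fnmatch(path, g) for g in globs)]
        if len(owners) > 1:
            conflicts.append((path, owners))
    return conflicts
```

Running such a check before spawning the team (or inside a TaskCompleted hook) turns "carefully separated ownership" from a convention into a guarantee. Note that `fnmatch`'s `*` matches across path separators, which is what we want here.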
Sub-Agents vs Agent Teams: Side-by-Side
| Aspect | Sub-Agents (Task Tool) | Agent Teams |
|---|---|---|
| Communication | Report back to parent only | Any agent can message any other |
| Lifetime | Short-lived, synchronous | Persistent, long-running sessions |
| Task coordination | Main agent manages all work | Shared task list with self-claiming |
| Context | Own window; result string returned | Own window; full independent session |
| Failure isolation | Sub-agent failure returns error to parent | Teammate failure may block dependent tasks |
| Token cost | Lower (~6-8x baseline for 3 agents) | Higher (~12-15x baseline for 3 agents) |
| Debuggability | Good (all flows through one point) | Harder (distributed message traces) |
| Best for | Focused tasks where only the result matters | Complex work requiring cross-agent discussion |
The Broader Architecture Landscape
Agent Teams and sub-agents are just two points on a spectrum of multi-agent architectures. Understanding the full landscape helps you pick the right tool for each problem.
1. Hub-and-Spoke (Supervisor)
# Hub-and-Spoke (Supervisor) Pattern
# All communication flows through a central orchestrator
class Orchestrator:
    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}

    async def execute(self, task: str):
        # 1. Decompose task
        subtasks = await self.decompose(task)
        # 2. Delegate to specialists
        results = {}
        for subtask in subtasks:
            agent = self.route(subtask)
            results[subtask.id] = await agent.execute(subtask)
            # Each agent returns results ONLY to orchestrator
        # 3. Synthesize
        return await self.synthesize(results)

# Pros: Simple, debuggable, full visibility
# Cons: Bottleneck, context loss between agents, sequential
The most common pattern. A central orchestrator decomposes tasks, delegates to specialists, and synthesizes results. All communication flows through the hub. This is what Claude Code's Task tool, LangGraph's supervisor pattern, and Microsoft's Semantic Kernel use by default.
When to use: Task decomposition is clear, sub-tasks are independent, and you need full visibility into the workflow. This is the right default for most agentic applications.
2. Peer-to-Peer (Agent Teams / A2A)
# Peer-to-Peer (Agent Teams) Pattern
# Agents communicate directly with each other
class Teammate:
    def __init__(self, name: str, inbox: MessageQueue):
        self.name = name
        self.inbox = inbox
        self.task_pool = SharedTaskList()

    async def run(self):
        while True:
            # Self-claim available tasks
            task = await self.task_pool.claim_next()
            if not task:
                await self.notify_idle()
                break
            # Execute with awareness of teammates
            result = await self.execute(task)
            # Notify teammates who depend on this work
            for blocked_task in task.blocks:
                owner = blocked_task.owner
                if owner:
                    await self.send_message(owner, {
                        "type": "dependency_resolved",
                        "task": task.id,
                        "summary": result.summary
                    })
            await self.task_pool.mark_completed(task.id)

# Pros: Rich collaboration, no bottleneck, parallel
# Cons: Coordination complexity, token explosion, non-deterministic
Agents communicate directly, claim tasks from a shared pool, and coordinate without routing everything through a central point. Claude Code's Agent Teams is one implementation; Google's Agent2Agent Protocol (A2A) provides a standardized version for cross-vendor interoperability, backed by 50+ technology partners.
When to use: Cross-layer collaboration is essential, multiple perspectives improve quality, and you can afford the token premium.
3. Sequential Pipeline
# Sequential Pipeline Pattern
# Fixed-order assembly line -- each stage feeds the next
pipeline = [
    Agent("parser", tools=["read_file", "parse_ast"]),
    Agent("analyzer", tools=["lint", "type_check", "complexity"]),
    Agent("reviewer", tools=["check_style", "find_bugs"]),
    Agent("reporter", tools=["format_markdown", "write_file"]),
]

async def run_pipeline(input_data):
    state = input_data
    for agent in pipeline:
        state = await agent.process(state)
        # Each agent transforms state and passes it forward
    # No backtracking -- if stage 3 finds issues,
    # it cannot ask stage 1 to redo its work
    return state

# Pros: Deterministic, easy to debug, clear data lineage
# Cons: No parallelism, no backtracking, one slow stage blocks all
A fixed-order assembly line where each agent transforms data and passes it to the next. Google ADK and Microsoft's architecture catalog both formalize this pattern. It's the simplest multi-agent architecture and the easiest to debug.
When to use: Linear data transformation (parse → analyze → format), progressive refinement (draft → review → polish), or any workflow where stages have clear input/output contracts.
4. Blackboard Architecture
# Blackboard Architecture
# Agents collaborate via a shared knowledge base
class Blackboard:
    def __init__(self):
        self.state = {}        # Shared knowledge base
        self.subscribers = []  # Agents watching for changes

    async def write(self, key: str, value: any, author: str):
        self.state[key] = {
            "value": value,
            "author": author,
            "timestamp": now()
        }
        # Notify agents interested in this key
        for agent in self.subscribers:
            if agent.watches(key):
                await agent.on_update(key, value)

    async def read(self, key: str) -> any:
        return self.state.get(key, {}).get("value")

# Usage: Agents read/write to blackboard independently
# Agent A posts partial solution → Agent B refines it
# No direct agent-to-agent communication needed
# Pros: Decoupled agents, incremental problem-solving
# Cons: State management complexity, potential conflicts
Originated in the 1980s for speech recognition and now being revived for LLM-based systems. Agents collaborate through a shared knowledge base without direct communication. Each agent reads from and writes to the blackboard independently, and a control unit determines which agent acts next. A 2025 arXiv paper explores this for LLM multi-agent systems with 9 specialized agents.
When to use: Problems where solutions emerge from accumulated partial contributions, asynchronous collaboration, and agents that don't need to know about each other.
5. Hierarchical Teams
A tree-like structure with multiple levels of supervision. Top-level supervisors manage team-level supervisors, who manage worker agents. LangGraph supports hierarchical agent teams natively.
When to use: Large-scale complex systems that mirror organizational structures. But beware: research from the Puppeteer framework found that static hierarchies waste resources on unnecessary branches, leading them to develop RL-learned orchestrators that dynamically skip low-value subtrees.
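In the same pseudocode style as the earlier patterns, a two-level hierarchy can be sketched as a root supervisor routing work streams to team supervisors, which fan tasks out to workers. All names here are illustrative:

```python
import asyncio

async def worker(name: str, task: str) -> str:
    # Leaf node: actually executes a task
    return f"{name}: {task} done"

async def team_supervisor(team: str, workers: list[str],
                          tasks: list[str]) -> list[str]:
    # Mid-level node: distributes its team's tasks round-robin
    jobs = [worker(workers[i % len(workers)], t) for i, t in enumerate(tasks)]
    return await asyncio.gather(*jobs)

async def top_supervisor(org: dict[str, list[str]],
                         plan: dict[str, list[str]]) -> dict:
    # Root node: org maps team -> workers; plan maps team -> that team's tasks
    results = await asyncio.gather(
        *(team_supervisor(team, org[team], tasks)
          for team, tasks in plan.items())
    )
    return dict(zip(plan.keys(), results))
```

The Puppeteer finding maps directly onto this sketch: a static `plan` spawns every subtree regardless of value, whereas a learned orchestrator would prune teams whose expected contribution doesn't justify their cost.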
6. Mixture-of-Experts Delegation
Borrowed from the neural network architecture: a router mechanism directs inputs to the most appropriate specialist agent. Only the relevant "expert" is activated, making this the most token-efficient multi-agent pattern. Google ADK implements this as a coordinator agent managing specialist sub-agents with routing logic.
A more sophisticated variant, Mixture-of-Agents (MoA), uses layered LLM agents where each agent in a layer takes all outputs from the previous layer as auxiliary input, creating iteratively refined responses.
When to use: Routing to specialists based on input classification. Ideal when you have clearly delineated domains (legal vs. financial vs. technical queries) and want minimal token waste.
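A minimal sketch of the routing idea: classify the input, then activate only the single most relevant expert. In a real system the classifier would itself be an LLM call; keyword scoring stands in for it here, and all expert names and keywords are invented for illustration:

```python
# Each expert would be a specialist agent; lambdas stand in for them here
EXPERTS = {
    "legal": lambda q: f"[legal expert] reviewing: {q}",
    "financial": lambda q: f"[financial expert] modeling: {q}",
    "technical": lambda q: f"[technical expert] debugging: {q}",
}

KEYWORDS = {
    "legal": ["contract", "liability", "compliance"],
    "financial": ["revenue", "forecast", "budget"],
    "technical": ["bug", "stack trace", "deploy"],
}

def route(query: str) -> str:
    """Score each domain by keyword hits and dispatch to exactly one
    expert -- the source of MoE's token efficiency."""
    scores = {name: sum(kw in query.lower() for kw in kws)
              for name, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = "technical"  # arbitrary fallback for unmatched queries
    return EXPERTS[best](query)
```

The key property is that only one expert's context is ever populated per query; contrast this with broadcast-style patterns where every agent sees every message.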
7. Swarm Intelligence
Inspired by biological swarms. Agents follow simple local rules, and complex global behavior emerges. OpenAI explored this with the Swarm framework (now succeeded by the OpenAI Agents SDK). The separate, unaffiliated Swarms framework targets enterprise applications with 36,000+ GitHub stars.
However, a 2025 research paper found that LLM-powered swarms require roughly 300x more computation time than classical counterparts, questioning whether "LLM-powered swarms" are a genuine frontier or a conceptual stretch.
When to use: Exploration-heavy tasks where resilience matters more than efficiency. In practice, the token economics make true swarming prohibitively expensive for most applications.
The Numbers: Token Cost and Failure Rates
# Token consumption comparison (approximate)
Standard chat interaction: ~1x (baseline)
Single agent with tools: ~4x
Hub-and-spoke (3 sub-agents): ~6-8x (sub-agent contexts + orchestrator)
Agent Teams (3 teammates): ~12-15x (full sessions + messaging overhead)
Swarm (10+ agents): ~50x+ (each agent = full context window)
# Research finding: skill-based single-agent systems achieve
# similar accuracy to multi-agent counterparts while reducing
# token consumption by 54% and latency by 50% on average
# (arXiv: 2601.04748)
The token economics are stark. Community-reported costs for Claude Code specifically:
| Approach | Typical Token Usage |
|---|---|
| Solo session | ~200k tokens |
| 3 sub-agents | ~440k tokens |
| 3-person team | ~800k tokens |
Agent Teams roughly double the cost of equivalent sub-agent setups because each teammate is a full, persistent Claude Code session (not a short-lived helper), and the messaging system adds per-message token overhead.
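The arithmetic behind those community numbers, for the skeptical. The $10 per million tokens blended price below is a made-up placeholder, not a real rate:

```python
# Community-reported token usage from the table above
usage = {"solo": 200_000, "sub_agents_3": 440_000, "team_3": 800_000}

# Premium relative to a solo session
premium_over_solo = {k: v / usage["solo"] for k, v in usage.items()}

# Agent Teams vs an equivalent sub-agent setup ("roughly double")
team_vs_subagents = usage["team_3"] / usage["sub_agents_3"]

# Rough dollar cost at a hypothetical $10 / 1M-token blended price
# (real pricing varies by model and input/output mix)
cost_usd = {k: v / 1_000_000 * 10 for k, v in usage.items()}
```

So a 3-person team runs at 4x the tokens of a solo session and about 1.8x the tokens of three sub-agents, which is where the "roughly double" claim comes from.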
Failure Rates in Production
Research shows multi-agent LLM systems fail at 41-86.7% rates in production. A systematic study found that nearly 79% of failures fall into the first two of the three categories below:
| Failure Category | % of Failures | Description |
|---|---|---|
| Specification Problems | 41.8% | Role ambiguity, unclear task definitions, missing constraints |
| Coordination Failures | 36.9% | Communication breakdowns, state sync issues, conflicting objectives |
| Verification Gaps | 21.3% | Inadequate testing, missing validation mechanisms |
The implication: most multi-agent failures are design problems, not infrastructure problems. Better task specifications and communication protocols matter more than better models.
The Single-Agent vs Multi-Agent Debate
This became the central debate in the AI agent community in mid-2025 when Cognition AI published "Don't Build Multi-Agents" and Anthropic released details of their multi-agent research system on the very next day.
Cognition's Argument
Walden Yan argued that multi-agent architectures are inherently fragile due to insufficient context sharing and conflicting implicit decisions. Parallel sub-agents make independent assumptions that create downstream inconsistencies. Their recommendation: single-threaded linear agents as the default, with a specialized compression model to summarize action histories for longer tasks.
Anthropic's Counter
Anthropic's multi-agent research system -- multiple Claude agents working in concert -- outperformed single-agent systems by over 90% on certain research tasks. Their position: for inherently parallel work (research, code review, exploration), multi-agent coordination provides measurable quality improvements.
The Synthesis
Despite the opposing headlines, both camps agree on the fundamental point: context management is the primary determinant of agent reliability. The real question isn't "single vs. multi" but rather "how do you maintain coherent context?" -- whether through a single agent's persistent memory or through carefully engineered inter-agent communication protocols.
"Activity doesn't always translate to value. The seductive appeal of parallel agents producing code quickly can obscure whether that code actually solves the right problem." -- Addy Osmani
Phil Schmid (Hugging Face) and Microsoft's Cloud Adoption Framework both recommend starting with a single agent and only adding complexity when needed. A single agent with well-designed tools is simpler to build, reason about, and debug. A useful heuristic from the community:
- Read tasks (research, analysis, data gathering) → multi-agent works well, since tasks are easily parallelizable
- Write tasks (code generation, document creation) → single agent or sequential pipeline, since shared context is critical
- Mixed tasks → parallel read phase, then sequential write phase
Architecture Decision Matrix
# When to use which architecture
SINGLE_AGENT:
- Task is sequential and state-dependent
- Context fits in one window
- Latency requirements are strict
- Budget is constrained
PIPELINE:
- Linear data transformation (parse → analyze → format)
- Each stage has clear input/output contracts
- No backtracking needed
HUB_AND_SPOKE:
- Clear task decomposition is possible
- Sub-tasks are independent (no cross-talk needed)
- You need full orchestrator visibility
- Debugging simplicity is important
AGENT_TEAMS:
- Cross-layer work requires collaboration
- Multiple perspectives improve quality (code review, debugging)
- Tasks are parallelizable but interdependent
- Budget allows 3-5x token premium
SWARM:
- Exploration-heavy tasks (research, search)
- Resilience is critical (no single point of failure)
- Budget is not a primary concern
| Architecture | Complexity | Token Efficiency | Parallelism | Debuggability |
|---|---|---|---|---|
| Single Agent | Low | High | None | Excellent |
| Pipeline | Low-Med | Medium | None | Good |
| Hub-and-Spoke | Medium | Medium | Limited | Good |
| MoE Delegation | Medium | High | Selective | Good |
| Agent Teams (P2P) | High | Low | High | Difficult |
| Hierarchical | High | Low | Moderate | Moderate |
| Blackboard | Medium | Medium | Moderate | Moderate |
| Swarm | Very High | Very Low | Very High | Very Difficult |
Framework Landscape
The architecture you choose is often constrained by the framework you use. Here's how the major frameworks map to these patterns:
| Framework | Primary Pattern | Strengths |
|---|---|---|
| Claude Code | Hub-and-spoke + Agent Teams | Deep IDE integration, both patterns in one tool |
| LangGraph | Graph-based (any pattern) | Most flexible; supports supervisor, hierarchical, custom workflows |
| CrewAI | Role-based hub-and-spoke | Easiest onboarding, enterprise control plane |
| AutoGen | Conversational (group chat) | Strong human-in-the-loop, flexible role-playing |
| Google ADK | Event-driven (pipeline, MoE) | Native Vertex AI integration, A2A protocol support |
| Semantic Kernel | Multiple (all patterns) | Enterprise Azure integration, extensive pattern catalog |
| OpenAI Agents SDK | Sequential handoff | Simple API, lightweight, direct OpenAI integration |
Practical Recommendations
Start Simple, Scale Up
The most consistent advice across the research: start with a single agent. Add sub-agents when you hit context window limits or need parallelism. Graduate to Agent Teams only when you have evidence that cross-agent collaboration would prevent rework.
When Agent Teams Make Sense
The strongest community-validated use cases for Agent Teams:
- Parallel code review: Spawning specialized reviewers (security, performance, test coverage) who each apply a different lens to the same PR
- Adversarial debugging: Multiple agents investigating different failure theories simultaneously, actively trying to disprove each other
- Cross-layer feature development: Frontend, backend, and test agents each owning their layer with direct communication about interface contracts
- Research exploration: Multiple approaches investigated simultaneously with findings shared directly between researchers
When to Avoid Agent Teams
- Sequential tasks: If step 2 depends on step 1's output, a pipeline or single agent is simpler and cheaper
- Budget-constrained work: The ~2x token premium over sub-agents adds up quickly
- Single-file changes: Two teammates editing the same file leads to overwrites -- the architecture requires careful file ownership boundaries
- Tasks requiring determinism: Peer-to-peer coordination introduces non-determinism in message ordering and task claiming
Known Limitations
Agent Teams is still a research preview with real constraints:
- No session resumption with in-process teammates (/resume doesn't restore team state)
- One team per session; no nested teams
- The lead is fixed for the team's lifetime (no leadership transfer)
- Split pane display requires tmux or iTerm2 (not supported in VS Code terminal or Windows Terminal)
- Task status can lag -- teammates sometimes fail to mark tasks complete, blocking dependents
Looking Ahead
Agent Teams represents one answer to a question the entire industry is grappling with: how should AI agents collaborate? Google's A2A protocol, Microsoft's orchestration patterns, and the open-source frameworks (CrewAI, LangGraph, AutoGen) are all converging on the same problem from different angles.
The likely future isn't one architecture winning but rather composable patterns -- using pipelines for deterministic stages, hub-and-spoke for clear delegation, and peer-to-peer teams for genuinely collaborative work, all within the same system. The Agent Teams primitive is a building block, not a universal solution.
What remains constant across all architectures: the quality of your task specifications, the clarity of your agent roles, and the rigor of your verification matter far more than which coordination pattern you choose. Get those right, and the architecture becomes a force multiplier. Get them wrong, and no amount of agent-to-agent messaging will save you.
Sources
- Anthropic: Orchestrate Teams of Claude Code Sessions (Official Docs)
- Anthropic Engineering: Building a C Compiler with Agent Teams
- TechCrunch: Anthropic Releases Opus 4.6 with Agent Teams
- Addy Osmani: Claude Code Swarms
- Cognition AI: Don't Build Multi-Agents
- Microsoft: AI Agent Design Patterns
- Google: A2A Protocol Announcement
- LangChain: Choosing the Right Multi-Agent Architecture
- arXiv: Why Do Multi-Agent LLM Systems Fail?
- arXiv: AGENTTAXO -- Communication Tax in Multi-Agent Systems
- arXiv: LLM-Powered Swarms -- New Frontier or Conceptual Stretch?
- arXiv: LLM Multi-Agent Systems Based on Blackboard Architecture
- arXiv: Mixture-of-Agents Enhances LLM Capabilities
- Phil Schmid: Single vs Multi-Agent Systems
- paddo.dev: Claude Code's Hidden Multi-Agent System
- Augment Code: Why Multi-Agent LLM Systems Fail