
Agent Teams vs Sub-Agents: Navigating Multi-Agent Architectures for LLMs

Claude Code's new Agent Teams feature lets AI agents communicate peer-to-peer and self-coordinate. How does this compare to traditional hub-and-spoke sub-agents, and when should you use each architecture?

On February 5, 2026, Anthropic launched Agent Teams as a research preview in Claude Code alongside Opus 4.6. The feature represents a fundamental shift in how AI agents collaborate: instead of a single orchestrator delegating to isolated sub-agents that can only report back, agents can now communicate peer-to-peer, self-claim tasks from a shared pool, and coordinate directly with each other.

This post examines what Agent Teams actually are, how they compare to the traditional sub-agent model, and where they fit in the broader landscape of multi-agent architectures. Rather than declaring a winner, the goal is to build a clear mental model for when each approach makes sense.

TL;DR: Agent Teams trade simplicity and token efficiency for richer collaboration and parallelism. They shine when tasks require cross-agent coordination (not just delegation). For most workflows, the simpler hub-and-spoke model or even a single agent remains the better choice. The right architecture depends on whether your agents need to talk to each other or just report to a boss.

The Old Model: Sub-Agents as Isolated Workers

Before Agent Teams, Claude Code used a hub-and-spoke pattern. The main session (the "hub") could spawn sub-agents using the Task tool. Each sub-agent operated in its own context window, executed a focused task, and returned results exclusively to the calling agent. Sub-agents could not talk to each other.

// The old model: Task tool spawns isolated sub-agents
// Each sub-agent returns results ONLY to the parent

Main Agent
  ├── Task("Search for auth files")     → returns file list
  ├── Task("Analyze test coverage")     → returns coverage report
  └── Task("Check for security issues") → returns vulnerability list

// Sub-agents cannot talk to each other
// All coordination flows through the main agent
// Each sub-agent operates in its own context window

This model works well for isolated, parallelizable tasks where only the final result matters. Need to search three different directories simultaneously? Spawn three sub-agents. Need a code review and a test analysis? Two sub-agents, results merged by the parent.

But the model breaks down when sub-agents need awareness of each other's work. Consider building a full-stack feature: a frontend agent creates a component expecting a certain API shape, while a backend agent independently designs a different response format. Neither agent can see the other's decisions. The orchestrator only discovers the mismatch after both finish, leading to rework.
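
Here's a toy version of that mismatch (the response shapes are hypothetical, purely for illustration):

# Hypothetical contract mismatch between isolated sub-agents

# Shape the frontend agent assumed:  { "items": [...] }
def render_list(response: dict) -> list[str]:
    return [item["name"] for item in response["items"]]

# Shape the backend agent shipped:   { "data": { "results": [...] } }
backend_response = {"data": {"results": [{"name": "Widget"}]}}

# Neither agent saw the other's decision; the mismatch surfaces
# only at integration time:
render_list(backend_response)  # raises KeyError: 'items'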

This failure mode is what Cognition AI (the team behind Devin) calls the "Flappy Bird problem": two agents tasked with building parts of a game independently produce a Super Mario background and a mismatched bird sprite. Without real-time visibility into each other's work, coherence is impossible.

Agent Teams: Peer-to-Peer Collaboration

Agent Teams introduces four core primitives that transform the architecture from hub-and-spoke to peer-to-peer:

| Component | Role |
| --- | --- |
| Team Lead | Creates the team, spawns teammates, assigns initial tasks, synthesizes results |
| Teammates | Fully independent Claude Code sessions that work on assigned tasks |
| Shared Task List | Persistent, disk-based task pool with dependency tracking and self-claiming |
| Mailbox System | JSON-based inboxes enabling direct agent-to-agent messaging |

The feature is currently gated behind an environment variable:

// Enable Agent Teams (research preview)
// settings.json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

On-Disk Coordination

Unlike in-memory orchestration, Agent Teams coordinates entirely through the filesystem. Tasks are numbered JSON files. Messages are appended to inbox JSON files. File locking prevents race conditions when multiple teammates try to claim the same task.

# On-disk structure for Agent Teams
~/.claude/
├── teams/
│   └── my-feature-team/
│       ├── config.json          # Team members, roles, metadata
│       └── inboxes/
│           ├── lead.json        # Lead agent's message inbox
│           ├── frontend.json    # Frontend agent's inbox
│           ├── backend.json     # Backend agent's inbox
│           └── tests.json       # Test agent's inbox
└── tasks/
    └── my-feature-team/
        ├── 001.json             # { status: "completed", owner: "frontend" }
        ├── 002.json             # { status: "in_progress", owner: "backend" }
        ├── 003.json             # { status: "pending", blockedBy: ["002"] }
        └── 004.json             # { status: "pending", owner: null }

This design has a critical implication: there is no shared memory. Agents cannot see each other's context windows. The only coordination channels are task files and messages. This is both a feature (isolation prevents cascading failures) and a limitation (agents must explicitly communicate everything they want others to know).
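
Anthropic hasn't published the claiming internals, but the mechanics are straightforward to picture. Here's a minimal sketch of lock-guarded self-claiming over task files like those above -- the function name, lock convention, and field handling are all assumptions:

# Sketch: self-claiming a task with an exclusive lock file
# (illustrative only -- names, fields, and lock convention are assumptions)
import json
import os

def try_claim(task_path: str, agent: str) -> bool:
    lock_path = task_path + ".lock"
    try:
        # O_CREAT | O_EXCL is atomic: exactly one claimant can win
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another teammate holds the lock
    try:
        with open(task_path) as f:
            task = json.load(f)
        if task["status"] != "pending" or task.get("blockedBy"):
            return False  # already claimed or still blocked
        task.update(status="in_progress", owner=agent)
        with open(task_path, "w") as f:
            json.dump(task, f)
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)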

The Messaging System

The SendMessage tool supports structured message types that enable rich coordination patterns:

// Agent Teams message types via SendMessage tool
{
  // Direct message to a specific teammate
  "type": "message",
  "to": "backend-agent",
  "content": "The API schema changed -- endpoint now returns { data, meta }"
}

{
  // Broadcast to all teammates (expensive -- scales with team size)
  "type": "broadcast",
  "content": "Shared types updated in src/types/api.ts -- pull before editing"
}

{
  // Task completion notification (auto-sent)
  "type": "task_completed",
  "taskId": "002",
  "summary": "REST endpoints implemented, OpenAPI spec at docs/api.yaml"
}

{
  // Plan approval gate -- teammate pauses until lead approves
  "type": "plan_approval_request",
  "plan": "Proposing to refactor auth middleware to support JWT + OAuth2"
}

{
  // Idle notification (auto-sent when teammate finishes all tasks)
  "type": "idle_notification",
  "completedTasks": ["001", "003"]
}

Two mechanisms stand out. First, plan approval gates: a teammate can be forced to work in read-only plan mode until the lead explicitly approves its approach. This prevents agents from charging ahead with a bad plan. Second, idle notifications: when a teammate finishes all its tasks, it automatically alerts the lead, who can then assign more work or shut down the team.
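
From the teammate's side, the approval gate behaves roughly like the sketch below. The message fields and method names are assumptions, not the documented schema:

# Sketch: plan approval gate from the teammate's side
# (field names and methods are assumptions, not the real schema)
async def plan_then_work(teammate):
    plan = await teammate.draft_plan()  # still in read-only plan mode
    await teammate.send_message("lead", {
        "type": "plan_approval_request",
        "plan": plan,
    })
    while True:
        msg = await teammate.inbox.next()  # block until the lead replies
        if msg["type"] == "plan_approved":
            break  # edit tools unlocked
        if msg["type"] == "plan_rejected":
            # Revise and resubmit; stay in read-only mode meanwhile
            plan = await teammate.revise_plan(msg["feedback"])
            await teammate.send_message("lead", {
                "type": "plan_approval_request",
                "plan": plan,
            })
    await teammate.execute(plan)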

Quality Control Hooks

Agent Teams integrates with Claude Code's hook system for enforcement:

  • TeammateIdle hook: Runs when a teammate is about to go idle. Exit code 2 sends feedback and keeps the teammate working (useful for preventing premature completion).
  • TaskCompleted hook: Runs when a task is being marked complete. Exit code 2 rejects the completion and sends feedback (useful for enforcing quality gates).
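
As a concrete illustration, a TaskCompleted hook could run the test suite and veto completion when it fails. This is a sketch: the JSON payload read from stdin is an assumed shape, not Claude Code's documented schema.

#!/usr/bin/env python3
# Sketch: TaskCompleted hook as a quality gate
# (the stdin payload shape is an assumption)
import json
import subprocess
import sys

payload = json.load(sys.stdin)  # e.g. {"taskId": "002", ...} -- assumed

result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
if result.returncode != 0:
    # Exit code 2 rejects the completion; the feedback goes back to the agent
    print(f"Tests failing -- task {payload.get('taskId')} is not done:\n"
          f"{result.stdout[-2000:]}", file=sys.stderr)
    sys.exit(2)

sys.exit(0)  # allow the task to be marked complete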

There's also a delegate mode (activated with Shift+Tab) that restricts the lead to coordination-only tools -- preventing a common failure mode where the lead starts implementing tasks itself instead of waiting for its teammates.

The C Compiler Stress Test

Anthropic stress-tested Agent Teams by having 16 parallel agents build a Rust-based C compiler from scratch. The results, published in an engineering blog post:

  • Nearly 2,000 Claude Code sessions over two weeks
  • 2 billion input tokens, 140 million output tokens
  • Cost: just under $20,000
  • Result: a 100,000-line compiler that compiles Linux 6.9 on x86, ARM, and RISC-V, passes 99% of GCC torture tests, and can build PostgreSQL, Redis, FFmpeg, and Doom

Key lessons from the project: high-quality test suites are essential (agents will solve the wrong problem if the verifier isn't precise), file ownership must be carefully separated (two teammates editing the same file leads to overwrites), and extensive READMEs are critical for agent orientation since agents lack persistent memory across sessions.

Sub-Agents vs Agent Teams: Side-by-Side

| Aspect | Sub-Agents (Task Tool) | Agent Teams |
| --- | --- | --- |
| Communication | Report back to parent only | Any agent can message any other |
| Lifetime | Short-lived, synchronous | Persistent, long-running sessions |
| Task coordination | Main agent manages all work | Shared task list with self-claiming |
| Context | Own window; result string returned | Own window; full independent session |
| Failure isolation | Sub-agent failure returns error to parent | Teammate failure may block dependent tasks |
| Token cost | Lower (~6-8x baseline for 3 agents) | Higher (~12-15x baseline for 3 agents) |
| Debuggability | Good (all flows through one point) | Harder (distributed message traces) |
| Best for | Focused tasks where only the result matters | Complex work requiring cross-agent discussion |

The fundamental trade-off: Sub-agents are cheaper and simpler but produce isolated results. Agent Teams are expensive and complex but enable collaboration. The question is whether your task requires agents to be aware of each other's work -- or whether independent results, merged by an orchestrator, are sufficient.

The Broader Architecture Landscape

Agent Teams and sub-agents are just two points on a spectrum of multi-agent architectures. Understanding the full landscape helps you pick the right tool for each problem.

1. Hub-and-Spoke (Supervisor)

# Hub-and-Spoke (Supervisor) Pattern
# All communication flows through a central orchestrator

class Orchestrator:
    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}

    async def execute(self, task: str):
        # 1. Decompose task
        subtasks = await self.decompose(task)

        # 2. Delegate to specialists
        results = {}
        for subtask in subtasks:
            agent = self.route(subtask)
            results[subtask.id] = await agent.execute(subtask)
            # Each agent returns results ONLY to orchestrator

        # 3. Synthesize
        return await self.synthesize(results)

# Pros: Simple, debuggable, full visibility
# Cons: Bottleneck, context loss between agents, sequential

The most common pattern. A central orchestrator decomposes tasks, delegates to specialists, and synthesizes results. All communication flows through the hub. This is what Claude Code's Task tool, LangGraph's supervisor pattern, and Microsoft's Semantic Kernel use by default.

When to use: Task decomposition is clear, sub-tasks are independent, and you need full visibility into the workflow. This is the right default for most agentic applications.

2. Peer-to-Peer (Agent Teams / A2A)

# Peer-to-Peer (Agent Teams) Pattern
# Agents communicate directly with each other

class Teammate:
    def __init__(self, name: str, inbox: MessageQueue,
                 task_pool: SharedTaskList):
        self.name = name
        self.inbox = inbox
        self.task_pool = task_pool  # one pool shared by the whole team

    async def run(self):
        while True:
            # Self-claim available tasks
            task = await self.task_pool.claim_next()
            if not task:
                await self.notify_idle()
                break

            # Execute with awareness of teammates
            result = await self.execute(task)

            # Notify teammates who depend on this work
            for blocked_task in task.blocks:
                owner = blocked_task.owner
                if owner:
                    await self.send_message(owner, {
                        "type": "dependency_resolved",
                        "task": task.id,
                        "summary": result.summary
                    })

            await self.task_pool.mark_completed(task.id)

# Pros: Rich collaboration, no bottleneck, parallel
# Cons: Coordination complexity, token explosion, non-deterministic

Agents communicate directly, claim tasks from a shared pool, and coordinate without routing everything through a central point. Claude Code's Agent Teams is one implementation; Google's Agent2Agent Protocol (A2A) provides a standardized version for cross-vendor interoperability, backed by 50+ technology partners.

When to use: Cross-layer collaboration is essential, multiple perspectives improve quality, and you can afford the token premium.

3. Sequential Pipeline

# Sequential Pipeline Pattern
# Fixed-order assembly line -- each stage feeds the next

pipeline = [
    Agent("parser",     tools=["read_file", "parse_ast"]),
    Agent("analyzer",   tools=["lint", "type_check", "complexity"]),
    Agent("reviewer",   tools=["check_style", "find_bugs"]),
    Agent("reporter",   tools=["format_markdown", "write_file"]),
]

async def run_pipeline(input_data):
    state = input_data
    for agent in pipeline:
        state = await agent.process(state)
        # Each agent transforms state and passes it forward
        # No backtracking -- if stage 3 finds issues,
        # it cannot ask stage 1 to redo its work
    return state

# Pros: Deterministic, easy to debug, clear data lineage
# Cons: No parallelism, no backtracking, one slow stage blocks all

A fixed-order assembly line where each agent transforms data and passes it to the next. Google ADK and Microsoft's architecture catalog both formalize this pattern. It's the simplest multi-agent architecture and the easiest to debug.

When to use: Linear data transformation (parse → analyze → format), progressive refinement (draft → review → polish), or any workflow where stages have clear input/output contracts.

4. Blackboard Architecture

# Blackboard Architecture
# Agents collaborate via a shared knowledge base

from typing import Any
import time

class Blackboard:
    def __init__(self):
        self.state = {}       # Shared knowledge base
        self.subscribers = [] # Agents watching for changes

    async def write(self, key: str, value: Any, author: str):
        self.state[key] = {
            "value": value,
            "author": author,
            "timestamp": time.time()
        }
        # Notify agents interested in this key
        for agent in self.subscribers:
            if agent.watches(key):
                await agent.on_update(key, value)

    async def read(self, key: str) -> Any:
        return self.state.get(key, {}).get("value")

# Usage: Agents read/write to blackboard independently
# Agent A posts partial solution → Agent B refines it
# No direct agent-to-agent communication needed

# Pros: Decoupled agents, incremental problem-solving
# Cons: State management complexity, potential conflicts

The blackboard pattern originated in the 1970s with the Hearsay-II speech-understanding system and is now being revived for LLM-based systems. Agents collaborate through a shared knowledge base without direct communication. Each agent reads from and writes to the blackboard independently, and a control unit determines which agent acts next. A 2025 arXiv paper explores this design for LLM multi-agent systems with 9 specialized agents.

When to use: Problems where solutions emerge from accumulated partial contributions, asynchronous collaboration, and agents that don't need to know about each other.

5. Hierarchical Teams

A tree-like structure with multiple levels of supervision. Top-level supervisors manage team-level supervisors, who manage worker agents. LangGraph supports hierarchical agent teams natively.
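
In sketch form, mirroring the orchestrator example earlier (the interfaces are hypothetical):

# Sketch: two-level hierarchy -- a top supervisor delegates to
# team supervisors, which delegate to workers (interfaces hypothetical)
class TeamSupervisor:
    def __init__(self, name: str, workers: list):
        self.name = name
        self.workers = workers

    async def execute(self, subtask):
        # Fan a team-level task out to this team's workers
        parts = await self.decompose(subtask)
        results = [await w.execute(p) for w, p in zip(self.workers, parts)]
        return await self.merge(results)

class TopSupervisor:
    def __init__(self, teams: dict[str, TeamSupervisor]):
        self.teams = teams  # e.g. {"frontend": ..., "backend": ...}

    async def execute(self, task):
        assignments = await self.plan(task)  # {team_name: subtask}
        return {name: await self.teams[name].execute(sub)
                for name, sub in assignments.items()}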

When to use: Large-scale complex systems that mirror organizational structures. But beware: research from the Puppeteer framework found that static hierarchies waste resources on unnecessary branches, leading them to develop RL-learned orchestrators that dynamically skip low-value subtrees.

6. Mixture-of-Experts Delegation

Borrowed from the neural network architecture: a router mechanism directs inputs to the most appropriate specialist agent. Only the relevant "expert" is activated, making this the most token-efficient multi-agent pattern. Google ADK implements this as a coordinator agent managing specialist sub-agents with routing logic.
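
A minimal routing sketch -- the classify() call and expert names are illustrative, not any framework's API:

# Sketch: MoE-style routing -- only the matched specialist runs
# (classify() and the expert names are illustrative)
EXPERTS = {
    "legal":     Agent("legal-expert"),
    "financial": Agent("financial-expert"),
    "technical": Agent("technical-expert"),
}

async def route(query: str):
    # One cheap classification call picks the expert;
    # the other specialists consume zero tokens
    domain = await classify(query, labels=list(EXPERTS))
    return await EXPERTS[domain].execute(query)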

A more sophisticated variant, Mixture-of-Agents (MoA), uses layered LLM agents where each agent in a layer takes all outputs from the previous layer as auxiliary input, creating iteratively refined responses.

When to use: Routing to specialists based on input classification. Ideal when you have clearly delineated domains (legal vs. financial vs. technical queries) and want minimal token waste.

7. Swarm Intelligence

Inspired by biological swarms. Agents follow simple local rules, and complex global behavior emerges. OpenAI explored this with the Swarm framework (now succeeded by the OpenAI Agents SDK). The Swarms framework targets enterprise applications with 36,000+ GitHub stars.

However, a 2025 research paper found that LLM-powered swarms require roughly 300x more computation time than classical counterparts, questioning whether "LLM-powered swarms" are a genuine frontier or a conceptual stretch.

When to use: Exploration-heavy tasks where resilience matters more than efficiency. In practice, the token economics make true swarming prohibitively expensive for most applications.

The Numbers: Token Cost and Failure Rates

# Token consumption comparison (approximate)

Standard chat interaction:     ~1x     (baseline)
Single agent with tools:       ~4x
Hub-and-spoke (3 sub-agents):  ~6-8x   (sub-agent contexts + orchestrator)
Agent Teams (3 teammates):     ~12-15x (full sessions + messaging overhead)
Swarm (10+ agents):            ~50x+   (each agent = full context window)

# Research finding: skill-based single-agent systems achieve
# similar accuracy to multi-agent counterparts while reducing
# token consumption by 54% and latency by 50% on average
# (arXiv: 2601.04748)

The token economics are stark. Community-reported costs for Claude Code specifically:

| Approach | Typical Token Usage |
| --- | --- |
| Solo session | ~200k tokens |
| 3 sub-agents | ~440k tokens |
| 3-person team | ~800k tokens |

Agent Teams roughly double the cost of equivalent sub-agent setups because each teammate is a full, persistent Claude Code session (not a short-lived helper), and the messaging system adds per-message token overhead.
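
Running the arithmetic on those community-reported numbers:

# Multipliers implied by the community-reported numbers above
solo, subagents, team = 200_000, 440_000, 800_000

print(subagents / solo)  # 2.2  -- 3 sub-agents vs a solo session
print(team / solo)       # 4.0  -- 3-person team vs a solo session
print(team / subagents)  # ~1.8 -- "roughly double" the sub-agent setup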

Failure Rates in Production

Research shows multi-agent LLM systems fail at rates ranging from 41% to 86.7% in production. A systematic study found that nearly 79% of failures fall into the first two of the three categories below:

| Failure Category | % of Failures | Description |
| --- | --- | --- |
| Specification Problems | 41.8% | Role ambiguity, unclear task definitions, missing constraints |
| Coordination Failures | 36.9% | Communication breakdowns, state sync issues, conflicting objectives |
| Verification Gaps | 21.3% | Inadequate testing, missing validation mechanisms |

The implication: most multi-agent failures are design problems, not infrastructure problems. Better task specifications and communication protocols matter more than better models.

The communication tax: The AGENTTAXO framework introduced the concept of a "communication tax" -- the overhead from inter-agent interactions. Every additional communication round compounds latency and cost, which grow almost quadratically with the number of rounds. This is the primary reason peer-to-peer architectures (Agent Teams) cost significantly more than hub-and-spoke patterns.
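
A toy model makes the growth concrete: if every message lands in every teammate's context, total tokens processed climb quadratically with the number of rounds (the constants are invented for illustration):

# Toy model of the communication tax (illustrative constants)
def tokens_processed(agents: int, rounds: int, msg_tokens: int = 500) -> int:
    total, history = 0, 0
    for _ in range(rounds):
        history += agents * msg_tokens  # each agent sends one message
        total += agents * history       # each agent re-reads the history
    return total

print(tokens_processed(3, 5))   # 67,500
print(tokens_processed(3, 10))  # 247,500 -- 2x the rounds, ~3.7x the tokens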

The Single-Agent vs Multi-Agent Debate

This became the central debate in the AI agent community in mid-2025, when Cognition AI published "Don't Build Multi-Agents" and, the very next day, Anthropic released details of its multi-agent research system.

Cognition's Argument

Walden Yan argued that multi-agent architectures are inherently fragile due to insufficient context sharing and conflicting implicit decisions. Parallel sub-agents make independent assumptions that create downstream inconsistencies. Their recommendation: single-threaded linear agents as the default, with a specialized compression model to summarize action histories for longer tasks.

Anthropic's Counter

Anthropic's multi-agent research system -- multiple Claude agents working in concert -- outperformed single-agent systems by over 90% on certain research tasks. Their position: for inherently parallel work (research, code review, exploration), multi-agent coordination provides measurable quality improvements.

The Synthesis

Despite the opposing headlines, both camps agree on the fundamental point: context management is the primary determinant of agent reliability. The real question isn't "single vs. multi" but rather "how do you maintain coherent context?" -- whether through a single agent's persistent memory or through carefully engineered inter-agent communication protocols.

"Activity doesn't always translate to value. The seductive appeal of parallel agents producing code quickly can obscure whether that code actually solves the right problem." -- Addy Osmani

Phil Schmid (Hugging Face) and Microsoft's Cloud Adoption Framework both recommend starting with a single agent and only adding complexity when needed. A single agent with well-designed tools is simpler to build, reason about, and debug. A useful heuristic from the community:

  • Read tasks (research, analysis, data gathering) → multi-agent works well, since tasks are easily parallelizable
  • Write tasks (code generation, document creation) → single agent or sequential pipeline, since shared context is critical
  • Mixed tasks → parallel read phase, then sequential write phase

Architecture Decision Matrix

# When to use which architecture

SINGLE_AGENT:
  - Task is sequential and state-dependent
  - Context fits in one window
  - Latency requirements are strict
  - Budget is constrained

PIPELINE:
  - Linear data transformation (parse → analyze → format)
  - Each stage has clear input/output contracts
  - No backtracking needed

HUB_AND_SPOKE:
  - Clear task decomposition is possible
  - Sub-tasks are independent (no cross-talk needed)
  - You need full orchestrator visibility
  - Debugging simplicity is important

AGENT_TEAMS:
  - Cross-layer work requires collaboration
  - Multiple perspectives improve quality (code review, debugging)
  - Tasks are parallelizable but interdependent
  - Budget allows 3-5x token premium

SWARM:
  - Exploration-heavy tasks (research, search)
  - Resilience is critical (no single point of failure)
  - Budget is not a primary concern

| Architecture | Complexity | Token Efficiency | Parallelism | Debuggability |
| --- | --- | --- | --- | --- |
| Single Agent | Low | High | None | Excellent |
| Pipeline | Low-Med | Medium | None | Good |
| Hub-and-Spoke | Medium | Medium | Limited | Good |
| MoE Delegation | Medium | High | Selective | Good |
| Agent Teams (P2P) | High | Low | High | Difficult |
| Hierarchical | High | Low | Moderate | Moderate |
| Blackboard | Medium | Medium | Moderate | Moderate |
| Swarm | Very High | Very Low | Very High | Very Difficult |

Framework Landscape

The architecture you choose is often constrained by the framework you use. Here's how the major frameworks map to these patterns:

| Framework | Primary Pattern | Strengths |
| --- | --- | --- |
| Claude Code | Hub-and-spoke + Agent Teams | Deep IDE integration, both patterns in one tool |
| LangGraph | Graph-based (any pattern) | Most flexible; supports supervisor, hierarchical, custom workflows |
| CrewAI | Role-based hub-and-spoke | Easiest onboarding, enterprise control plane |
| AutoGen | Conversational (group chat) | Strong human-in-the-loop, flexible role-playing |
| Google ADK | Event-driven (pipeline, MoE) | Native Vertex AI integration, A2A protocol support |
| Semantic Kernel | Multiple (all patterns) | Enterprise Azure integration, extensive pattern catalog |
| OpenAI Agents SDK | Sequential handoff | Simple API, lightweight, direct OpenAI integration |

Practical Recommendations

Start Simple, Scale Up

The most consistent advice across the research: start with a single agent. Add sub-agents when you hit context window limits or need parallelism. Graduate to Agent Teams only when you have evidence that cross-agent collaboration would prevent rework.

When Agent Teams Make Sense

The strongest community-validated use cases for Agent Teams:

  1. Parallel code review: Spawning specialized reviewers (security, performance, test coverage) who each apply a different lens to the same PR
  2. Adversarial debugging: Multiple agents investigating different failure theories simultaneously, actively trying to disprove each other
  3. Cross-layer feature development: Frontend, backend, and test agents each owning their layer with direct communication about interface contracts
  4. Research exploration: Multiple approaches investigated simultaneously with findings shared directly between researchers

When to Avoid Agent Teams

  • Sequential tasks: If step 2 depends on step 1's output, a pipeline or single agent is simpler and cheaper
  • Budget-constrained work: The ~2x token premium over sub-agents adds up quickly
  • Single-file changes: Two teammates editing the same file leads to overwrites -- the architecture requires careful file ownership boundaries
  • Tasks requiring determinism: Peer-to-peer coordination introduces non-determinism in message ordering and task claiming

Known Limitations

Agent Teams is still a research preview with real constraints:

  • No session resumption with in-process teammates (/resume doesn't restore team state)
  • One team per session; no nested teams
  • The lead is fixed for the team's lifetime (no leadership transfer)
  • Split pane display requires tmux or iTerm2 (not supported in VS Code terminal or Windows Terminal)
  • Task status can lag -- teammates sometimes fail to mark tasks complete, blocking dependents

Looking Ahead

Agent Teams represents one answer to a question the entire industry is grappling with: how should AI agents collaborate? Google's A2A protocol, Microsoft's orchestration patterns, and the open-source frameworks (CrewAI, LangGraph, AutoGen) are all converging on the same problem from different angles.

The likely future isn't one architecture winning but rather composable patterns -- using pipelines for deterministic stages, hub-and-spoke for clear delegation, and peer-to-peer teams for genuinely collaborative work, all within the same system. The Agent Teams primitive is a building block, not a universal solution.

What remains constant across all architectures: the quality of your task specifications, the clarity of your agent roles, and the rigor of your verification matter far more than which coordination pattern you choose. Get those right, and the architecture becomes a force multiplier. Get them wrong, and no amount of agent-to-agent messaging will save you.
