Skip to content

08 — Agent Swarms: Multi-Agent Team Coordination

🌐 Language: English | 中文版 → 📖 Read Online → — Sidebar nav, dark mode & full-text search. Better than raw GitHub.

Scope: tools/TeamCreateTool/, tools/SendMessageTool/, tools/shared/spawnMultiAgent.ts, utils/swarm/ (~30 files, ~6.8K lines), utils/teammateMailbox.ts (1,184 lines)

One-liner: How Claude Code spawns parallel teammates across tmux panes, iTerm2 splits, or in-process — all coordinated through a file-based mailbox system with lockfile concurrency control.


Architecture Overview

08 agent swarms 1


1. Team Lifecycle

Creating a Team

TeamCreateTool initializes the team infrastructure:

  1. Generate unique team name (word-slug if collision)
  2. Create team leader entry with deterministic agent ID: team-lead@{teamName}
  3. Write config.json to ~/.claude/teams/{team-name}/
  4. Register for session cleanup (auto-delete on exit if not explicitly deleted)
  5. Reset task list directory for fresh task numbering
typescript
// Team file structure
type TeamFile = {
  name: string
  description?: string
  createdAt: number
  leadAgentId: string          // "team-lead@my-project"
  leadSessionId?: string       // For team discovery
  teamAllowedPaths?: TeamAllowedPath[]  // Shared edit permissions
  members: Array<{
    agentId: string            // "researcher@my-project"
    name: string
    agentType?: string
    model?: string
    prompt?: string
    color?: string             // Unique color for UI identification
    planModeRequired?: boolean
    tmuxPaneId: string
    cwd: string
    worktreePath?: string      // Git worktree for isolation
    subscriptions: string[]
    backendType?: BackendType  // "tmux" | "iterm2" | "in-process"
    isActive?: boolean
    mode?: PermissionMode
  }>
}

Spawning Teammates

spawnMultiAgent.ts (1,094 lines) handles the full spawn flow:

  1. Resolve model: inherit → leader's model; undefined → hardcoded fallback
  2. Generate unique name: Check existing members, append -2, -3, etc.
  3. Detect backend: tmux > iTerm2 > in-process (see §3)
  4. Create pane/process: Backend-specific spawn
  5. Build CLI args: propagate --agent-id, --team-name, --agent-color, --permission-mode
  6. Register in team file: Add member entry to config.json
  7. Send initial message: Write prompt to teammate's mailbox
  8. Register background task: For UI task pill display

2. The Mailbox System

The inter-agent communication backbone is a file-based mailbox with lockfile concurrency control.

Directory Structure

~/.claude/teams/{team-name}/
├── config.json              # Team manifest
└── inboxes/
    ├── team-lead.json       # Leader's inbox
    ├── researcher.json      # Teammate inbox
    └── test-runner.json     # Teammate inbox

Message Types

TypeDirectionPurpose
Plain text DMAny → AnyDirect messages
Broadcast (to: "*")Leader → AllTeam-wide announcements
idle_notificationWorker → Leader"I'm done / blocked / failed"
permission_requestWorker → LeaderTool permission delegation
permission_responseLeader → WorkerPermission grant/deny
sandbox_permission_requestWorker → LeaderNetwork access approval
plan_approval_requestWorker → LeaderPlan review for planModeRequired
shutdown_requestLeader → WorkerGraceful shutdown
shutdown_approved/rejectedWorker → LeaderShutdown confirmation

Concurrency Control

typescript
const LOCK_OPTIONS = {
  retries: { retries: 10, minTimeout: 5, maxTimeout: 100 }
}
// Write with lock
release = await lockfile.lock(inboxPath, {
  lockfilePath: `${inboxPath}.lock`, ...LOCK_OPTIONS
})
const messages = await readMailbox(recipientName, teamName)
messages.push(newMessage)
await writeFile(inboxPath, jsonStringify(messages, null, 2))
release()

Multiple Claude instances in a swarm can write concurrently — the lockfile serializes access with exponential backoff retries.


3. Backend Detection and Execution

Three backends determine how teammates physically run:

Backend Comparison

FeatureTmuxiTerm2In-Process
IsolationSeparate processSeparate processSame process, separate query loop
UI visibilityPane with colored borderNative iTerm2 paneBackground task pill
Prerequisitetmux installedit2 CLI installedNone
Non-interactive mode (-p)✅ (forced)
Socket isolationPID-scoped: claude-swarm-{pid}N/AN/A

In-Process Fallback

When no pane backend is available (no tmux, no iTerm2), markInProcessFallback() is called and subsequent spawns skip detection entirely, running teammates as in-process query loops sharing the same Node.js process.


4. Permission Delegation

Teammates don't have interactive terminals — they delegate permission decisions to the leader:

Permission Flow

  1. Worker needs permission → creates permission_request message
  2. Writes to leader's mailbox via teammateMailbox.ts
  3. Leader's useInboxPoller picks up the request
  4. Leader shows permission prompt to user
  5. Leader sends permission_response back to worker's mailbox
  6. Worker polls inbox, gets response, continues or aborts

Plan Mode Required

Some teammates can be spawned with plan_mode_required: true:

  • Teammate must enter plan mode and create a plan
  • Plan is sent to leader as plan_approval_request
  • Leader reviews and sends plan_approval_response with approved: true/false
  • On approval, leader's permission mode is inherited (unless it's plan, which maps to default)

5. Agent Identity System

typescript
// Deterministic agent ID format
function formatAgentId(name: string, teamName: string): string {
  return `${name}@${teamName}`  // "researcher@my-project"
}

// Leader is always
const TEAM_LEAD_NAME = 'team-lead'  // → "team-lead@my-project"

CLI Flag Propagation

When spawning teammates, the leader propagates:

FlagConditionPurpose
--dangerously-skip-permissionsbypass mode + NOT planModeRequiredInherit permission bypass
--permission-mode autoauto modeInherit classifier
--permission-mode acceptEditsacceptEdits modeInherit edit auto-allow
--model {model}Explicit model overrideUse leader's model
--settings {path}CLI settings pathShare settings
--plugin-dir {dir}Inline pluginsShare plugins
--parent-session-id {id}AlwaysLineage tracing

6. Team Cleanup

Graceful Shutdown

SendMessage(type: shutdown_request) → teammate responds shutdown_approved/rejected:

  • Approved: In-process teammates abort their query loop; pane teammates call gracefulShutdown(0) via setImmediate
  • Rejected: Teammate provides reason, continues working

Session Cleanup

cleanupSessionTeams() runs on leader exit (registered with gracefulShutdown):

  1. Kill orphaned teammate panes (still running after SIGINT)
  2. Remove team directory: ~/.claude/teams/{team-name}/
  3. Remove task directory: ~/.claude/tasks/{team-name}/
  4. Destroy git worktrees for isolated teammates

Transferable Design Patterns

The following patterns from the Swarm system can be directly applied to any multi-agent or distributed coordination architecture.

Why File-Based Mailboxes?

The mailbox system uses plain JSON files with lockfiles instead of IPC, WebSockets, or shared memory. This works because:

  • Cross-process: tmux panes are separate processes with no shared memory
  • Crash-safe: Messages persist on disk even if a teammate crashes
  • Debuggable: cat ~/.claude/teams/my-team/inboxes/researcher.json
  • Simple: No daemon, no port allocation, no service discovery

The Backend Abstraction

The TeammateExecutor interface and PaneBackend abstraction mean the swarm system doesn't care how teammates run — only that they can receive initial prompts and exchange messages. Adding a new backend (e.g., Docker containers, SSH sessions) would only require implementing the PaneBackend interface.

One Leader, Many Workers

The architecture enforces a strict leader-worker hierarchy:

  • Only one team per leader session
  • Workers cannot create teams or approve their own plans
  • Shutdown is always leader-initiated, worker-approved
  • Permission delegation always flows worker → leader → worker

8. Coordinator Mode

Source coordinates: src/coordinator/coordinatorMode.ts

Coordinator mode transforms the leader from a task dispatcher into a synthesis engine — it doesn't just delegate work, it understands and integrates results.

8.1 Activation: Double Gate

typescript
// 源码位置: src/coordinator/coordinatorMode.ts
export function isCoordinatorMode(): boolean {
  if (feature('COORDINATOR_MODE')) {
    return isEnvTruthy(process.env.CLAUDE_CODE_COORDINATOR_MODE)
  }
  return false
}

Both the build-time feature flag AND the runtime environment variable must be enabled. When resuming a session, matchSessionMode() automatically flips the variable to match the resumed session's mode.

8.2 Coordinator Workflow

Research (Workers, parallel)
   ↓ Results returned via <task-notification>
Synthesis (Coordinator)
   ↓ Coordinator integrates findings
Implementation (Workers, file-serial)
   ↓ Write operations serialized per file set
Verification (Workers, parallel)

Core principles:

  • Coordinator owns synthesis — no "based on your findings" delegation; the coordinator must understand and restate
  • Parallel is the superpower — independent workers run concurrently
  • Read-write isolation — research tasks parallel, write ops serialized by file sets
  • Continue vs Spawn — context overlap determines whether to reuse a worker or create a new one

8.3 Worker Context Injection

typescript
export function getCoordinatorUserContext(
  mcpClients: ReadonlyArray<{ name: string }>,
  scratchpadDir?: string,
): { [k: string]: string } {
  // Tells coordinator what tools workers have access to
  // Injects MCP server info for tool routing decisions
  // Provides scratchpad directory for no-prompt file sharing
}

9. Task Type Union (7 Variants)

Source coordinates: src/tasks/

Every background task in Claude Code is represented by one of seven state variants:

typescript
export type TaskState =
  | LocalShellTaskState         // Local shell command execution
  | LocalAgentTaskState         // Sub-agent via AgentTool
  | RemoteAgentTaskState        // Remote CCR agent
  | InProcessTeammateTaskState  // Same-process team member
  | LocalWorkflowTaskState      // Local workflow execution
  | MonitorMcpTaskState         // MCP server monitoring
  | DreamTaskState              // Auto-memory consolidation

Progress Tracking

typescript
export type AgentProgress = {
  toolUseCount: number
  tokenCount: number
  lastActivity?: ToolActivity
  recentActivities?: ToolActivity[]  // Last 5 tool uses
  summary?: string
}

// input_tokens are CUMULATIVE (take latest), output_tokens are PER-TURN (sum)
export function updateProgressFromMessage(tracker, message): void {
  tracker.latestInputTokens = usage.input_tokens + cache_creation + cache_read
  tracker.cumulativeOutputTokens += usage.output_tokens
}

Agent Completion Notification

xml
<task-notification>
  <task-id>{agentId}</task-id>
  <status>completed|failed|killed</status>
  <summary>{description}</summary>
  <result>{agent final text response}</result>
  <usage>
    <total_tokens>N</total_tokens>
    <tool_uses>N</tool_uses>
    <duration_ms>N</duration_ms>
  </usage>
</task-notification>

The notification is injected as a user type message, so the LLM processes it naturally in the conversation flow.


10. Inter-Agent Communication Protocol

Source coordinates: src/tools/SendMessageTool/

10.1 Structured Message Types

Beyond plain text, agents can exchange typed control messages:

typescript
const StructuredMessage = z.discriminatedUnion('type', [
  z.object({ type: z.literal('shutdown_request'), reason: z.string().optional() }),
  z.object({ type: z.literal('shutdown_response'), request_id: z.string(), approve: semanticBoolean() }),
  z.object({ type: z.literal('plan_approval_response'), request_id: z.string(), approve: semanticBoolean(), feedback: z.string().optional() }),
])

// Input schema: `to` field supports multiple routing targets
const inputSchema = z.object({
  to: z.string(),      // "name" | "*" (broadcast) | "uds:path" | "bridge:session_id"
  summary: z.string().optional(),
  message: z.union([z.string(), StructuredMessage]),
})

10.2 Message Routing

SendMessage(to="researcher", message="...")

InProcessTeammate? → findTeammateTaskByAgentId → direct pendingMessages
  ↓ No
LocalAgent? → queuePendingMessage → consumed at tool-round boundary
  ↓ No
Pane (tmux/iterm2)? → writeToMailbox → filesystem mailbox
  ↓ No
UDS/Bridge? → socket/bridge transport
  ↓ No
"*" (broadcast)? → iterate all team members, send individually

10.3 Cross-Session Communication (UDS)

When feature('UDS_INBOX') is enabled, Claude Code sessions on the same machine can communicate via Unix Domain Sockets:

typescript
// "uds:/tmp/cc-socks/1234.sock" — local Claude session
// "bridge:session_01AbCd..." — Remote Control session
// Messages wrapped as <cross-session-message from="..."> XML

11. DreamTask & UltraPlan

DreamTask: Automatic Memory Consolidation

Source coordinates: src/tasks/DreamTaskState

typescript
export type DreamTaskState = TaskStateBase & {
  type: 'dream'
  phase: 'starting' | 'updating'
  sessionsReviewing: number
  filesTouched: string[]    // Incomplete — only Edit/Write tool_use, misses bash writes
  turns: DreamTurn[]        // Last 30 turns
  abortController?: AbortController
  priorMtime: number        // Rollback lock on kill
}

DreamTask runs a background agent that reviews recent session history and consolidates learnings into MEMORY.md. The priorMtime field acts as a rollback lock — if the dream is killed mid-write, the system can restore the file to its pre-dream state.

UltraPlan: Orchestrated Remote Execution

typescript
export type RemoteAgentTaskState = TaskStateBase & {
  type: 'remote_agent'
  remoteTaskType: RemoteTaskType  // 'ultraplan' | 'ultrareview' | 'autofix-pr' | ...
  isUltraplan?: boolean
  ultraplanPhase?: 'needs_input' | 'plan_ready'
  // ...
}

UltraPlan extends the agent paradigm to remote execution via CCR (Claude Code Runner), enabling plan-then-execute workflows where the plan is generated remotely and must receive user approval before implementation begins.

→ For the full inventory of 11 unreleased agent tools (SleepTool, PushNotificationTool, SubscribePRTool, DaemonTool, CoordinatorTool, MorerightTool, DreamConsolidationTool, DxtTool, UltraplanTool, VoiceInputTool, BuddyTool) and their feature gates, see Episode 17: Telemetry & Ops §9.5.


Summary

ComponentLinesRole
spawnMultiAgent.ts1,094Unified teammate spawn logic
teammateMailbox.ts1,184File-based mailbox with lockfile concurrency
teamHelpers.ts684Team file CRUD, cleanup, worktree management
SendMessageTool.ts918DM, broadcast, shutdown, plan approval
TeamCreateTool.ts241Team initialization
backends/registry.ts465Backend detection: tmux > iTerm2 > in-process
constants.ts34Team names, session names, env vars
teammateLayoutManager.ts~400Pane creation, color assignment, border status

The swarm system is Claude Code's most operationally complex feature — it combines process management, file-based IPC, terminal multiplexing, and distributed permission delegation into a cohesive multi-agent framework. The file-based mailbox design prioritizes simplicity and debuggability over performance, which is the right trade-off when your "distributed system" is multiple AI agents sharing a single filesystem.


Previous: ← 07 — Permission PipelineNext: → 09 — Session Persistence

Released under the MIT License.