Skip to content

Episode 13: Bridge System — The Remote Control Protocol

🌐 Language: English | 中文版 → 📖 Read Online → — Sidebar nav, dark mode & full-text search. Better than raw GitHub.

Source files: bridge/ directory — 31 files, ~450KB total. Core: bridgeMain.ts (3,000 lines), replBridge.ts (2,407 lines), remoteBridgeCore.ts (1,009 lines), replBridgeTransport.ts (371 lines), sessionRunner.ts (551 lines), types.ts (263 lines)

One-liner: The Bridge is Claude Code's remote control wire — a poll-dispatch-heartbeat loop that lets users type on claude.ai while their local machine executes code, with two transport generations (v1 WebSocket/POST, v2 SSE/CCRClient), crash recovery pointers, JWT refresh scheduling, and a 32-session capacity manager.

Architecture Overview


The Two Bridge Modes

1. Standalone Bridge (claude remote-control)

bridgeMain.ts (3,000 lines) — the long-running server mode:

User runs: claude remote-control
→ registerBridgeEnvironment() → get environment_id + secret
→ Poll loop: pollForWork() every N ms
→ On work: spawn child `claude --print --sdk-url ...`
→ Child streams NDJSON → bridge forwards to server
→ On done: stopWork() → archive session → return to poll

Spawn Modes:

ModeFlagBehavior
single-session(default)One session, bridge exits when it ends
worktree--worktreeEach session gets isolated git worktree
same-dir--spawn, --capacityAll sessions share cwd (can stomp)

Multi-session capacity: Up to 32 concurrent sessions (SPAWN_SESSIONS_DEFAULT = 32), gated by tengu_ccr_bridge_multi_session.

2. REPL Bridge (/remote-control command in interactive mode)

replBridge.ts (2,407 lines) — in-process bridge for interactive sessions:

User types /remote-control in REPL
→ initBridgeCore() → register environment → create session
→ Poll loop: wait for web user to type
→ Forward messages bidirectionally via transport
→ History flush: send existing conversation to web UI

Transport Generations

v1: HybridTransport (WebSocket + POST)

typescript
// Session-Ingress layer
// Reads: WebSocket connection to session-ingress URL
// Writes: HTTP POST to session-ingress URL
// Auth: OAuth access token

v2: SSE + CCRClient

typescript
// CCR (Claude Code Runtime) layer
// Reads: SSETransport → GET /worker/events/stream
// Writes: CCRClient → POST /worker/events (SerialBatchEventUploader)
// Auth: JWT with session_id claim (NOT OAuth)
// Heartbeat: PUT /worker (CCRClient built-in, 20s default)

The v2 path adds:

  • Worker registration: registerWorker() → server assigns epoch number
  • Epoch-based conflict resolution: 409 on stale epoch → close transport, re-poll
  • Delivery tracking: reportDelivery('received' | 'processing' | 'processed')
  • State reporting: reportState('running' | 'idle' | 'requires_action')

v3: Env-Less Bridge (remoteBridgeCore.ts)

typescript
// Direct OAuth → worker_jwt exchange, no Environments API
// 1. POST /v1/code/sessions → session.id
// 2. POST /v1/code/sessions/{id}/bridge → {worker_jwt, expires_in, worker_epoch}
// 3. createV2ReplTransport → SSE + CCRClient
// No register/poll/ack/stop/heartbeat/deregister lifecycle

Gated by tengu_bridge_repl_v2. Eliminates the entire poll-dispatch layer for REPL sessions.


The Poll-Dispatch Loop

bridgeMain.ts implements a sophisticated work-polling loop:

Poll Interval Configuration (GrowthBook-driven)

typescript
// Live-tunable via GrowthBook (refreshes every 5 min)
const pollConfig = getPollIntervalConfig()
// Intervals:
//   not_at_capacity: fast polling for new work
//   partial_capacity: moderate polling (some sessions active)
//   at_capacity: slow/heartbeat-only (all slots full)

Heartbeat Modes

When at capacity, the bridge can operate in two heartbeat patterns:

  1. Non-exclusive heartbeat (non_exclusive_heartbeat_interval_ms > 0): Heartbeat loop runs independently of polling, with optional at-capacity polling at a slower rate
  2. At-capacity polling only (at_capacity > 0): Slow poll doubles as liveness signal
  3. At-capacity wake: When a session completes, capacityWake.wake() interrupts the sleep so the bridge can immediately accept new work

Error Recovery

typescript
// 源码位置: src/bridge/bridgeMain.ts:320-340
const DEFAULT_BACKOFF: BackoffConfig = {
  connInitialMs: 2_000,
  connCapMs: 120_000,      // 2 minutes max backoff
  connGiveUpMs: 600_000,   // 10 minutes → give up
  generalInitialMs: 500,
  generalCapMs: 30_000,
  generalGiveUpMs: 600_000,
}

The bridge distinguishes connection errors (network down) from general errors (server 500s) and applies independent exponential backoff with separate give-up timers.


Session Runner

sessionRunner.ts (551 lines) wraps child process management:

typescript
// Child spawn command:
claude --print \
  --sdk-url <session_url> \
  --session-id <id> \
  --input-format stream-json \
  --output-format stream-json \
  --replay-user-messages

Activity Tracking

The bridge parses child stdout NDJSON to extract real-time activity summaries:

typescript
const TOOL_VERBS: Record<string, string> = {
  Read: 'Reading', Write: 'Writing', Edit: 'Editing',
  Bash: 'Running', Glob: 'Searching', Grep: 'Searching',
  WebFetch: 'Fetching', Task: 'Running task',
}

Token Refresh via Stdin

typescript
// Send fresh JWT to child process via stdin
handle.writeStdin(JSON.stringify({
  type: 'update_environment_variables',
  variables: { CLAUDE_CODE_SESSION_ACCESS_TOKEN: token },
}) + '\n')

The child's StructuredIO handles update_environment_variables by setting process.env directly — so getSessionIngressAuthToken() picks up the new token on the next API call.


JWT Lifecycle & Crash Recovery

Token Refresh Scheduling

typescript
const tokenRefresh = createTokenRefreshScheduler({
  refreshBufferMs: 5 * 60_000,  // 5 min before expiry
  getAccessToken: async () => { /* OAuth refresh */ },
  onRefresh: (sessionId, oauthToken) => {
    // v1: deliver OAuth token directly to child
    // v2: call reconnectSession() to trigger server re-dispatch
  },
})

v2's JWT approach introduces a subtle difference: each /bridge call bumps the server-side epoch. A JWT-only swap would leave the old CCRClient heartbeating with a stale epoch (→ 409 within 20s), so the entire transport must be rebuilt.

Crash Recovery Pointer

typescript
// Written after session creation (crash-recovery trail)
await writeBridgePointer(dir, {
  sessionId: currentSessionId,
  environmentId,
  source: 'repl',  // vs 'standalone'
})
// Cleared on clean teardown (unless perpetual mode)
// Perpetual mode: pointer survives clean exits for daemon restart

On restart, the bridge reads the pointer and attempts reconnection:

  1. Strategy 1: Idempotent re-register with reuseEnvironmentId → if same env returned, reconnectSession() re-queues existing session
  2. Strategy 2: If env expired (laptop slept >4h), archive old session → create fresh session on new environment

Environment Reconnection Budget

typescript
const MAX_ENVIRONMENT_RECREATIONS = 3
// After 3 consecutive failures, give up

Message Protocol

Outbound (CLI → Server)

Messages flow as NDJSON lines on the child's stdout, parsed by the bridge and forwarded via the transport layer:

Message TypeMeaning
assistantModel response (text blocks, tool_use blocks)
resultTurn completed (success/error)
control_requestPermission prompt (can_use_tool)

Inbound (Server → CLI)

typescript
// handleIngressMessage() processes incoming messages
// - Echo dedup: recentPostedUUIDs (BoundedUUIDSet, cap=2000)
// - Inbound dedup: recentInboundUUIDs (BoundedUUIDSet, cap=2000)
// - Control requests: model changes, permission mode, interrupt

FlushGate

typescript
// Queue live writes during history flush to preserve ordering
const flushGate = new FlushGate<Message>()
// Start: queue new messages
// End: drain queued → send, then resume direct writes
// Drop: discard queue (transport dead)

The Permission Pipeline

When the child CLI needs tool approval:

Child stdout → { type: 'control_request', subtype: 'can_use_tool', tool_name, input }
→ Bridge forwards via transport → Server → claude.ai shows approval UI
→ User clicks approve/deny
→ Server sends control_response → Bridge transport → child stdin
→ reportState('running') // Clear the "waiting for input" indicator

The bridge also handles server-initiated control requests:

  • set_model: Change the model mid-session
  • set_max_thinking_tokens: Adjust thinking budget
  • set_permission_mode: Switch between permission modes (auto, bypassPermissions)
  • api_key_interrupt: Interrupt the current turn

Transferable Design Patterns

The following patterns can be directly applied to other remote-control or distributed agent systems.

Pattern 1: Epoch-Based Conflict Resolution

Scenario: Multiple workers may compete for the same session after reconnection. Practice: Assign a monotonically increasing epoch on registration; reject stale-epoch requests with 409. Claude Code application: v2 transport uses worker_epoch — a 409 triggers transport teardown and re-poll.

Pattern 2: CapacityWake (AbortController-Based Sleep Interrupt)

Scenario: A bridge sleeps during at-capacity wait but needs to react instantly when a slot opens. Practice: Use an AbortController signal to interrupt the sleep timer. Claude Code application: capacityWake.wake() interrupts the poll sleep so the bridge accepts new work immediately.

Pattern 3: Bootstrap-Isolation via Callback Injection

Scenario: A subsystem (bridge) must avoid importing the main module tree to keep its bundle small. Practice: Inject dependencies as callbacks instead of direct imports. Claude Code application: createSession is injected as (opts) => Promise<string | null> to avoid pulling the entire REPL tree into the Agent SDK bundle.


Component Summary

ComponentLinesRole
bridgeMain.ts3,000Standalone bridge: poll loop, multi-session, worktree, backoff
replBridge.ts2,407REPL bridge: env registration, session create, transport management
remoteBridgeCore.ts1,009Env-less bridge: direct OAuth→JWT, no poll-dispatch layer
sessionRunner.ts551Child process spawning, NDJSON parsing, activity tracking
replBridgeTransport.ts371Transport abstraction: v1 (WS+POST) vs v2 (SSE+CCRClient)
types.ts263Protocol types: WorkResponse, SessionHandle, BridgeConfig
bridgeApi.ts~500HTTP client: register, poll, ack, stop, heartbeat, reconnect
bridgeMessaging.ts~430Message handling: ingress/egress, dedup, control requests
bridgeUI.ts~460Live terminal display: status, QR codes, multi-session bullets
jwtUtils.ts~260Token refresh scheduler: proactive 5min-before-expiry refresh
workSecret.ts~130Work secret decoding: JWT extraction, SDK URL construction

Total bridge surface: ~11,700 lines of protocol orchestration.


← Episode 12 — Startup & Bootstrap · Episode 14 — UI & State Management →

Released under the MIT License.