THE AGENT TEAM PROBLEM: SEVEN OPEN-SOURCE APPROACHES TO ORCHESTRATING AI CODING AGENTS

One agent is useful. Orchestrating multiple agents in parallel on the same codebase is an unsolved engineering problem. We performed a source-level analysis of seven open-source projects — OrbitDock, codex-autorunner, Squad, Open-Inspect, Multica, Symphony, and AionUi — to map the architectural design space of AI agent orchestration. What emerged are four fundamental disagreements about how humans and AI should collaborate on code.

Agent Orchestration · Multi-Agent Systems · Open Source · Source Analysis · AI Engineering · Git Isolation

In February 2026, a single AI coding agent running in a terminal was sufficient for most tasks. By April 2026, that assumption no longer holds. Codebases have grown. Tasks have become compound. And the bottleneck has shifted from "can the model write code" to "can multiple model instances work on the same codebase without destroying each other's changes."

This is the agent team problem. It is not a model intelligence problem — GPT-4o, Claude Opus, and Gemini 2.5 Pro are all capable code generators. It is an orchestration engineering problem: how do you isolate execution environments, manage shared state, handle failures, route communication between agents, and keep a human informed without requiring babysitting?

We performed a source-level analysis of seven open-source projects that have each taken a distinct approach to this problem. These are not toy demos — several have thousands of GitHub stars, active development communities, and production usage. They span six programming languages (Rust, Elixir, Go, TypeScript, Python, Swift), five execution isolation strategies, three persistence models, and at least four fundamentally different philosophies about the relationship between humans and AI agents.

The seven projects, listed by their GitHub star count as of April 2, 2026:

AionUi (github.com/iOfficeAI/AionUi, 20,700+ stars) — a multi-AI-agent unified GUI client built on Electron, supporting numerous agent backends including Gemini CLI, Claude Code, Codex, and others.

Symphony (github.com/openai/symphony, 14,400+ stars) — OpenAI's autonomous coding agent daemon built in Elixir/OTP with Linear integration.

Squad (github.com/bradygaster/squad, 1,600+ stars) — "one command to have an AI dev team" via GitHub Copilot's agent framework.

Open-Inspect (github.com/ColeMurray/background-agents, 1,400+ stars; the repository is named "background-agents" but the project identifies itself as inspired by Ramp's Inspect system, hence our shorthand) — a cloud-native background coding agent platform on Cloudflare Workers and Modal.

Multica (github.com/multica-ai/multica, 800+ stars) — an AI-native project management tool (described as "Linear, but with AI agents as first-class team members") built in Go and Next.js with PostgreSQL.

codex-autorunner (github.com/Git-on-my-level/codex-autorunner, 600+ stars) — a meta-orchestrator that eliminates the "agent babysitting problem" via ticket-driven state machines.

OrbitDock (github.com/Robdel12/OrbitDock, 87 stars) — "space mission control" for AI coding agents, built in Rust with native SwiftUI clients.

This article is structured as a technical comparison based on our reading of each project's public repository as of April 2, 2026. Architectural claims are grounded in README documentation, source file structure, and code inspection. Where our access to internals is limited — particularly around performance benchmarks, production reliability, and private roadmaps — we say so explicitly.

The first architectural question every multi-agent system must answer is: how do you prevent agents from trampling each other's work? If two agents modify the same file simultaneously, the result is conflict, corruption, or silent data loss. The seven projects have reached five different answers.

Fig. 1 — Agent control spectrum. Seven projects mapped by the degree of human control over agent execution, from OrbitDock's mid-turn steering to Open-Inspect's prompt queue.

Git worktree isolation is the most common approach. OrbitDock creates a worktree per Linear issue (source: the Rust backend's issue handling logic). codex-autorunner creates a worktree per ticket. Multica uses git worktrees for parallel development, with each task receiving an isolated working directory. The daemon polls the server at a configurable interval (default 3 seconds) for claimed tasks, creates the isolated workspace, spawns the agent CLI, and streams results back. Heartbeats are sent every 15 seconds. Its CONTRIBUTING.md documents a first-class worktree development workflow where each checkout uses a unique database derived from the worktree path hash.
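Multica's per-worktree database trick is worth making concrete. The sketch below (in Python for brevity; the function name and choice of SHA-256 are ours, not Multica's) shows the core idea: hash the checkout path into a deterministic database name so parallel worktrees never share state.

```python
import hashlib
from pathlib import Path

def worktree_db_name(worktree_path: str, prefix: str = "multica") -> str:
    """Derive a deterministic, per-worktree database name from the
    checkout path, so each git worktree gets isolated state."""
    resolved = str(Path(worktree_path).resolve())
    digest = hashlib.sha256(resolved.encode()).hexdigest()
    return f"{prefix}_{digest[:12]}"
```

The same checkout always maps to the same database, while two worktrees can never collide, which is exactly the property a parallel-agent workflow needs.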

Symphony makes a different choice: full git clones rather than worktrees. Each issue gets an independent repository copy. This is more expensive in disk space and clone time, but provides stronger isolation — a worktree shares the object database with its parent, meaning a corrupted object can affect all worktrees.

Cloud sandboxing is Open-Inspect's approach. The platform runs agent workloads in Modal's cloud containers, with snapshot-based session recovery. This provides OS-level isolation — agents cannot affect each other's environments — but introduces network latency and cloud dependency.

AionUi takes a fundamentally different approach because it is not a git orchestration tool. It isolates agents at the process level — each agent runs in a forked worker process — but does not manage repository state at all. If two AionUi agents edit the same file, the user resolves the conflict manually.

The agent control model is perhaps the most revealing architectural choice because it reflects a project's philosophy about human-AI collaboration.

At the bidirectional end of the spectrum, OrbitDock implements mid-turn steering: the human can redirect an agent while it is executing, not just before or after. This requires a persistent WebSocket connection between the Rust server and the SwiftUI client, with the server maintaining a real-time state machine for each agent session.

Squad takes a conversational routing approach. Team members are persistent entities with names, personas, and specialized knowledge. The user interacts through natural language, and the coordinator routes tasks to the appropriate specialist. Squad integrates with GitHub Copilot's agent framework, with each team member modeled as a distinct agent within the Copilot ecosystem.

Multica implements a daemon-driven streaming model. The Go backend daemon supports up to 20 concurrent tasks (configurable), with a 2-hour default agent timeout. Agents are assigned through the Multica UI or CLI (e.g., multica issue assign), and the daemon streams results back to the server in real time.

AionUi provides interactive chat with an approval mechanism. The ApprovalStore (source: src/process/agent/acp/ApprovalStore.ts) caches human approval decisions, so the user does not need to re-approve identical operations. This is a pragmatic middle ground — agents have autonomy for previously-approved actions but require explicit confirmation for novel ones.
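The caching idea behind this is simple enough to sketch. The interface below is hypothetical — AionUi's actual ApprovalStore is TypeScript and considerably richer — but it shows the core mechanism: identical operations reuse the first human verdict instead of prompting again.

```python
class ApprovalCache:
    """Sketch of an approval-decision cache in the spirit of AionUi's
    ApprovalStore. Hypothetical interface, not AionUi's actual API."""

    def __init__(self):
        # (tool, args) -> the human's cached yes/no decision
        self._decisions: dict[tuple[str, str], bool] = {}

    def check(self, tool: str, args: str, ask_human) -> bool:
        key = (tool, args)
        if key not in self._decisions:
            # Novel operation: prompt the human exactly once.
            self._decisions[key] = ask_human(tool, args)
        return self._decisions[key]
```

A repeated `check("write_file", "a.txt", ...)` returns the cached verdict without invoking the prompt callback again.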

AionUi — the most starred project in this survey at 20,700+ stars.

codex-autorunner uses a dispatch/reply pattern. The orchestrator dispatches a ticket to an agent and waits for a reply. Communication is bidirectional but asynchronous — the human can intervene between tickets, but not during execution. The project's README describes this as being "bitter-lesson-pilled": minimal constraints on agent behavior, treating models as the execution layer rather than constraining them.

Symphony runs agents in multi-turn mode: each worker can execute up to 20 turns per run (configurable via agent.max_turns in WORKFLOW.md). After each turn completion, the worker re-checks the Linear issue state — if the issue remains active, the next turn continues on the same live coding-agent thread in the same workspace. The first turn uses the full rendered task prompt; continuation turns send only continuation guidance, since the original prompt is already in thread history (source: SPEC.md section 6.4). After clean exit, the orchestrator schedules a 1-second continuation retry to check if the issue needs another session.
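The turn loop described above can be sketched as follows. `issue` and `agent` are hypothetical stand-ins; Symphony's real loop also handles per-turn timeouts, retries, and live thread reuse.

```python
def run_issue(issue, agent, max_turns: int = 20) -> int:
    """Sketch of Symphony's multi-turn loop as described in SPEC.md:
    the first turn gets the full rendered task prompt, continuation
    turns send only continuation guidance, and the loop stops when the
    issue is no longer active or the turn budget is exhausted."""
    turns = 0
    while turns < max_turns and issue.is_active():
        prompt = issue.full_prompt if turns == 0 else "continue"
        agent.run_turn(prompt)
        turns += 1
    return turns
```

Note the re-check of issue state before every turn: an issue closed in Linear mid-run stops the loop at the next turn boundary.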

Open-Inspect queues prompts for background execution with optional follow-up. Agents run in cloud containers, and results are delivered asynchronously through Slack, a web dashboard, or a Chrome extension.

Every multi-agent system must persist some state. The question is: how much, and where?

Symphony makes the most radical choice: zero persistence. All agent state lives in memory. If the Elixir process restarts, the system re-synchronizes from Linear — pulling the current state of all issues and re-deriving the execution queue. This is the Erlang "let it crash" philosophy applied directly: if state is derivable from an external source of truth, don't persist it locally. The trade-off is that a restart during a complex multi-turn execution loses all in-progress context.

At the opposite extreme, Multica implements full relational persistence in PostgreSQL with 30 SQL migration files (source: server/internal/db/migrations/). Agent sessions, issue assignments, token usage, repository metadata — everything is stored in typed, indexed tables generated by sqlc. This is the most "enterprise" approach: queryable, auditable, and completely recoverable from any failure. The cost is operational complexity — you need a PostgreSQL instance, migration management, and backup infrastructure.

codex-autorunner uses a three-layer state root: SQLite for structured data, JSON for configuration, and YAML frontmatter in ticket files for human-readable task definitions. This hybrid approach means parts of the state are git-trackable (the YAML tickets) while other parts (SQLite) are not.

Squad stores state in a .squad/ directory containing team.md (roster), routing.md (task routing rules), decisions.md (shared team decisions), ceremonies.md (sprint ceremonies), a casting/ directory with persistent name registries, and per-agent directories under agents/{name}/ with charter.md (identity and expertise) and history.md (project-specific learnings). A dedicated scribe agent silently manages memory. This is fully git-trackable — you can commit your team's knowledge, review it in a pull request, and share it across machines.

AionUi uses SQLite in WAL (Write-Ahead Logging) mode with 18 migration files and a StreamingMessageBuffer (source: src/process/services/database/StreamingMessageBuffer.ts) for batched write optimization. WAL mode allows concurrent reads during writes, which is important for a desktop application where the UI must remain responsive while agents produce output.
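The batching pattern is worth sketching. This is not AionUi's code — the names and thresholds are invented — but it shows why batched flushes keep a desktop UI responsive: many small streamed chunks become one database write.

```python
class StreamingBuffer:
    """Sketch of batched writes in the spirit of AionUi's
    StreamingMessageBuffer. Hypothetical interface."""

    def __init__(self, flush_fn, batch_size: int = 32):
        self._flush_fn = flush_fn        # e.g. one INSERT for many rows
        self._batch_size = batch_size
        self._pending: list[str] = []

    def append(self, chunk: str) -> None:
        self._pending.append(chunk)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._flush_fn(self._pending)  # one write for the whole batch
            self._pending = []
```

With WAL mode, readers (the UI) proceed concurrently while each batched write commits.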

Open-Inspect distributes state across three layers: Cloudflare Durable Objects (per-session, each containing SQLite DB, WebSocket Hub, Event Stream, and GitHub Integration), Cloudflare D1 (repo-scoped secrets), and Modal filesystem snapshots (execution state, rebuilt every 30 minutes). Sandboxes are warmed proactively when a user starts typing. The system supports multiplayer sessions — multiple users can collaborate with presence indicators and per-prompt commit attribution via configureGitIdentity(). This is the most complex persistence architecture in the survey, reflecting the challenges of cloud-native distributed systems.

Where does work come from? The answer reveals whether a project is built for individual developers or for teams with established workflows.

OrbitDock and Symphony depend on external trackers — specifically Linear. Symphony polls Linear via GraphQL at regular intervals, looking for issues in states that indicate "ready for agent work." This means your task management workflow must already include Linear. OrbitDock connects to both Linear and GitHub Issues, offering more flexibility.

Squad — AI agent teams via GitHub Copilot.

Multica replaces the external tracker entirely. It is its own project management system — issues, assignments, comments, and status tracking are all built in. Agents are assigned issues the same way human teammates would be: through the Multica UI or CLI. This is the highest-friction approach (you must adopt Multica as your tracker) but also the most integrated.

codex-autorunner uses local YAML ticket files with structured frontmatter. Each ticket is a Markdown file with metadata — assigned agent, priority, dependencies, and current state. This is the most "code-as-configuration" approach: tasks are version-controlled, reviewable, and editable with any text editor.

Squad and Open-Inspect accept natural language: Squad through conversational input or GitHub Issues, Open-Inspect through a web interface or Slack messages. This is the lowest barrier to entry — no special format, no external tools.

AionUi is purely conversation-based. There is no task management layer — you chat with agents, and they execute. This works well for ad-hoc tasks but lacks the structure needed for coordinated multi-agent workflows.

Beyond the core architecture, several capability dimensions differentiate these projects. We highlight the most technically interesting ones.

Provider and model support varies dramatically. AionUi supports numerous agent backends through the ACP (Agent Communication Protocol) — a JSON-RPC 2.0 protocol over stdio that provides a unified interface for communicating with Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and others (source: src/process/agent/acp/AcpConnection.ts). Multica supports Claude and Codex, auto-detecting installed CLIs on PATH. Symphony is Codex-only, reflecting its OpenAI origin. Squad delegates to GitHub Copilot, which supports multiple model providers through its own abstraction layer.
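A JSON-RPC 2.0 request frame of the kind ACP exchanges can be sketched as below. The newline-delimited framing and the method name are our assumptions for illustration, not taken from the ACP source; the JSON-RPC envelope fields (`jsonrpc`, `id`, `method`, `params`) are from the JSON-RPC 2.0 spec itself.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids

def acp_request(method: str, params: dict) -> bytes:
    """Build one JSON-RPC 2.0 request frame, ready to write to the
    agent child process's stdin (one JSON object per line)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids),
           "method": method, "params": params}
    return (json.dumps(msg) + "\n").encode()
```

The appeal of this design is that every backend — Gemini CLI, Claude Code, Codex — looks identical to the client: a child process that speaks framed JSON over stdio.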

Cost tracking approaches differ. Squad provides OpenTelemetry-based observability (initSquadTelemetry) with a .NET Aspire dashboard for tracing and metrics across agents. Symphony's optional HTTP status surface exposes aggregate token and runtime totals per issue via GET /api/v1/state (source: SPEC.md). Multica persists per-session token usage in its PostgreSQL schema, with the daemon additionally tracking task state and enforcing timeouts.

Security and safety vary across the projects. Squad implements a HookPipeline with five built-in policies: ReviewerLockoutHook (agents cannot edit files they are reviewing), file guards for sensitive paths, shell command allowlists, rate limits (configurable maxCallsPerMinute), and PII filters that redact sensitive data before model calls (source: Squad SDK hooks documentation). AionUi's sandboxed extension system provides a formal permission model for third-party plugins. The remaining projects rely on the underlying agent runtimes for safety.

Failure handling is where Symphony's SPEC.md is most precise. The orchestrator implements exponential backoff with the formula delay = min(10000 × 2^(attempt-1), max_retry_backoff_ms) — starting at 10 seconds and capping at 300 seconds (5 minutes) by default. Normal continuation retries (clean exits) use a fixed 1-second delay. A stall_timeout_ms of 300,000 (5 minutes) kills unresponsive agents, and a per-turn timeout of 3,600,000ms (1 hour) prevents infinite runs (source: SPEC.md sections 5.3.5, 5.3.6, 8.4). codex-autorunner's state machine self-corrects against agents that "reward hack" (mark tickets done despite being incomplete) or scope-creep (create too many new tickets). Multica uses a heartbeat mechanism (15-second intervals) to detect stalled agents.
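Symphony's documented backoff formula translates directly to code:

```python
def retry_delay_ms(attempt: int, max_retry_backoff_ms: int = 300_000) -> int:
    """delay = min(10000 * 2^(attempt-1), max_retry_backoff_ms),
    per Symphony's SPEC.md: 10s on the first retry, doubling each
    attempt, capped at 5 minutes by default."""
    return min(10_000 * 2 ** (attempt - 1), max_retry_backoff_ms)
```

Attempts 1 through 5 yield 10s, 20s, 40s, 80s, and 160s; attempt 6 would be 320s and is clamped to the 300s cap.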

Skill injection allows agents to receive domain-specific instructions. Symphony stores skills as Markdown files in .codex/skills/ covering operations like commit, push, pull, and land (source: .codex/skills/). Multica injects per-provider skill instructions. Squad implements a marketplace model where skills can be discovered and installed.

The extension/plugin ecosystem is most developed in Squad (which has a marketplace and SDK) and AionUi (which has a sandboxed extension system with a formal permission model — source: src/process/extensions/sandbox/permissions.ts). codex-autorunner supports custom entry points. The remaining projects have no extension mechanism.

The seven projects fall into two broad categories: orchestration systems (OrbitDock, codex-autorunner, Squad, Multica, Symphony, Open-Inspect) that solve "how to manage and run AI agents," and client systems (AionUi) that solve "how to provide a unified interface to multiple AI agents."

Fig. 2 — Competitive positioning. Projects mapped by human oversight depth (vertical) against agent autonomy level (horizontal). Circle size indicates relative ecosystem complexity.

Within the orchestration category, the projects diverge further along four philosophical axes.

First: is an agent a tool or a role? Squad and Multica treat agents as identifiable team members — with names, avatars, personas, and skill profiles. This creates a richer mental model (you interact with "the frontend specialist" rather than "agent-3") but increases system complexity. Symphony treats agents as stateless workers — no identity, no memory, just execution capability. AionUi is in between: agents have names and capabilities but no persistent identity across sessions.

Second: platform or product? codex-autorunner and Squad are building platforms — with plugin SDKs, template ecosystems, and extension marketplaces. Multica and Open-Inspect are building SaaS products — multi-tenant, multi-team, with managed infrastructure. OrbitDock and Symphony are building focused tools — single-purpose, minimal extensibility, deep integration with one workflow.

Third: local or cloud? OrbitDock, codex-autorunner, Squad, Multica, and AionUi run primarily on the developer's machine. Open-Inspect runs entirely in the cloud (Cloudflare + Modal). Symphony's reference Elixir implementation runs locally, but its SPEC.md is explicitly language-agnostic — the README literally says "tell your favorite coding agent to build Symphony in a programming language of your choice." The spec defines six abstraction layers (Policy, Configuration, Coordination, Execution, Integration, Observability), eight components, and a five-state internal state machine (Unclaimed, Claimed, Running, RetryQueued, Released).
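The five-state machine can be sketched as a transition table. The allowed transitions below are our reading of the state names, not a verbatim copy of SPEC.md's transition rules — treat them as a plausible reconstruction.

```python
# Assumed transitions for Symphony's five internal states.
TRANSITIONS = {
    "Unclaimed":  {"Claimed"},
    "Claimed":    {"Running", "Released"},
    "Running":    {"RetryQueued", "Released"},
    "RetryQueued": {"Running", "Released"},
    "Released":   {"Unclaimed"},
}

def transition(state: str, new_state: str) -> str:
    """Move an issue to new_state, rejecting illegal jumps."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Encoding the lifecycle as data rather than scattered conditionals is what makes a spec like Symphony's portable across implementation languages.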

Fourth: stateful or stateless? This axis determines operational complexity. Symphony's zero-persistence model is the simplest to operate but the hardest to debug. Multica's PostgreSQL model is the most complex to operate but the most auditable. Squad's file-based persistence is the best balance of simplicity and visibility — everything is in .squad/, readable, committable, and diff-able.

Each project has made technical bets that constrain its future. Understanding these bets is essential for evaluating which approach fits a given use case.

OrbitDock bets on native applications. The SwiftUI client means iOS and macOS only — no Windows, no Linux, no web. The Rust server is cross-platform, but the primary interface is Apple-native. This limits the addressable audience but enables a level of polish and responsiveness that web apps cannot match.

Symphony bets on Elixir/OTP. The BEAM virtual machine provides lightweight processes, fault tolerance through supervisors, and hot code reloading. For an agent orchestrator — which is fundamentally a concurrent process manager — OTP is a natural fit. But the Elixir ecosystem is small compared to Node.js or Python, which may limit contribution and adoption.

Multica bets on Go for the backend. The Chi router, sqlc for type-safe SQL, gorilla/websocket — this is a conventional Go web service stack. Go's concurrency model (goroutines + channels) maps well to agent orchestration. The choice of PostgreSQL with pgvector hints at future embedding-based features.

Symphony by OpenAI — the highest-profile entry at 14,400+ stars.

AionUi bets on Electron. At approximately 120,000 lines of code, it is the largest codebase in this survey. Electron provides cross-platform desktop support and access to Node.js APIs, but carries the well-known costs of memory usage and startup time. The project partially mitigates this with a WebUI mode that runs in a browser.

Squad bets on the GitHub Copilot ecosystem. If GitHub Copilot's agent framework thrives, Squad thrives. If GitHub changes the API or agent model, Squad must adapt. This is the strongest platform dependency in the survey.

Open-Inspect bets on serverless edge computing. Cloudflare Durable Objects provide per-session state with global distribution. Modal provides on-demand GPU/CPU containers. This architecture scales to zero and handles burst traffic, but introduces cold-start latency and vendor lock-in.

codex-autorunner bets on filesystem-as-data-plane. Everything is files — tickets are Markdown, state is SQLite and JSON, agent output is captured to disk. This means the entire system is inspectable with standard Unix tools (cat, grep, sqlite3). It is the most debuggable architecture in the survey, at the cost of limited real-time streaming capability.

Several capabilities are notably absent across all seven projects, representing open problems in agent orchestration.

No project implements a formal verification mechanism for agent output correctness. Agents produce code, but verifying that code does what was intended — beyond running tests — remains manual.

No project implements cross-repository orchestration. If a task spans multiple repositories (for example, updating an API server and its client SDK simultaneously), the user must coordinate manually. codex-autorunner's "Hub" concept and Squad's "Cross-Squad" feature point in this direction but are not yet implemented.

No project provides deterministic replay. If an agent run fails, you cannot re-run it with identical conditions to reproduce the failure. Non-deterministic model output, race conditions in tool execution, and external state changes all prevent reproducibility.

No project implements a cost-optimization layer that routes tasks to cheaper models when confidence is high. All projects use a fixed model per execution, regardless of task complexity. Squad's four-tier response classification — direct, lightweight, standard, and full, each with configurable maxAgents, defaultModel, and available toolset (source: Squad SDK reference) — is the closest approximation, routing simpler tasks to faster model tiers.
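To make the tier idea concrete, here is an illustrative-only classifier. The thresholds and heuristics are invented; Squad's real classification is configuration-driven (per-tier maxAgents, defaultModel, toolset), not keyword- or length-based.

```python
def pick_tier(task: str) -> str:
    """Toy router in the spirit of Squad's four response tiers:
    direct / lightweight / standard / full. Heuristics are made up
    purely to illustrate the routing shape."""
    words = len(task.split())
    if task.endswith("?") and words < 8:
        return "direct"       # short question: answer inline
    if words < 20:
        return "lightweight"  # small edit: fast model, few tools
    if words < 60:
        return "standard"     # typical task: default model
    return "full"             # large task: max agents and tools
```

A real cost-optimization layer would route on model confidence and task complexity rather than length, which is precisely the open problem noted above.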

For teams evaluating these projects, we offer a decision framework based on primary constraints.

If you already use Linear for project management: Symphony is the natural choice — it treats Linear as the source of truth and requires no migration.

If you want a unified GUI for multiple AI agents: AionUi has the broadest agent support and the most mature desktop experience among the projects surveyed.

If you want AI as first-class team members in your project management: Multica replaces Linear entirely and integrates agent assignment with issue tracking.

If you need cloud-native scaling with no local infrastructure: Open-Inspect's Cloudflare + Modal architecture is the only fully serverless option.

If you want minimal tooling with maximum debuggability: codex-autorunner's filesystem-as-data-plane approach means everything is inspectable with cat and grep.

If you want persistent AI team members integrated with GitHub Copilot: Squad's team model and Copilot integration provide the richest developer workflow.

If you need native mobile oversight with code review capability: OrbitDock's SwiftUI client and magit-style code review provide the deepest mobile-native experience in this survey.

The seven projects analyzed here collectively represent over 39,000 stars, six programming languages, and hundreds of thousands of lines of code. Most gained significant traction in Q1 2026, reflecting a shared recognition that AI coding agents have reached the point where the orchestration problem is now the binding constraint.

The models can write code. The question is whether we can build the infrastructure to let them work together safely, efficiently, and under appropriate human oversight. These seven projects represent a diverse set of open-source approaches to that question. None has solved it completely. All have contributed architectural insights that advance the field.

We will be watching this space closely. The next six months will determine which orchestration patterns survive contact with production workloads.

References. Symphony SPEC.md: a 2,176-line language-agnostic specification for autonomous agent orchestration (github.com/openai/symphony/blob/main/SPEC.md). Multica architecture documentation: Go backend with Chi router, sqlc, and PostgreSQL 17 with pgvector (github.com/multica-ai/multica). AionUi ACP specification: JSON-RPC 2.0 protocol over stdio for unified multi-agent communication (github.com/iOfficeAI/AionUi, src/process/agent/acp/). Squad documentation: GitHub Copilot-based agent team framework (github.com/bradygaster/squad). OrbitDock architecture: Rust server with SwiftUI native clients (github.com/Robdel12/OrbitDock). codex-autorunner: ticket-driven state machine for agent orchestration (github.com/Git-on-my-level/codex-autorunner). Open-Inspect: Cloudflare Workers + Modal-based background agent platform (github.com/ColeMurray/background-agents). Claude Code source analysis: our prior article "The Harness That Makes the Model Useful" (duocodetech.com/blog/claude-code-harness-engineering) provides context on how single-agent harness engineering works, which this article extends to multi-agent orchestration.

Acknowledgments. This analysis would not be possible without the open-source contributions of: Robert Deluca (OrbitDock), the Git-on-my-level team (codex-autorunner), Brady Gaster (Squad), Cole Murray (Open-Inspect/background-agents), the Multica team (Multica), OpenAI (Symphony), and the iOfficeAI team (AionUi). All seven projects are open-source under permissive licenses as of this writing. We encourage readers to explore their repositories, verify our claims, contribute, and build upon their work. The agent orchestration problem is far from solved — and the more approaches we have, the faster we converge on answers.