Orchestrating teams of agents: multi-agent systems for real business work

Most teams' mental model of an agent is still a single assistant in a chat window: you ask, it answers, you copy the result somewhere useful. That model has a ceiling. Real business work is rarely one prompt—it is research, then drafting, then checking, then formatting, then routing to the right human. The moment you try to make one agent do all of that in a single pass, quality drops and the prompt becomes an unmaintainable wall of instructions.

The next step up is a multi-agent system: an orchestrator that decomposes a goal and routes sub-tasks to specialized agents, each with a narrow job, its own tools, and its own permissions. This is where agent-driven work starts to resemble how organizations already operate—division of labor, hand-offs, and review—rather than a single overworked generalist.

This post is about that pattern: when it earns its complexity, the new failure modes it creates, and the security model that has to expand with it. For the single-agent foundations, start with "What is agent-driven development?" and the business task scorecard. For the identity questions that multi-agent work amplifies, pair this with the post on agent identity and access.

One agent or many? Resist the upgrade until you need it

Multi-agent systems are not a maturity badge. They add latency, cost, and surface area. Reach for them only when a single agent genuinely struggles, which usually shows up as one of these signals:

The prompt is doing three jobs at once and you cannot improve one without breaking another (research quality vs. tone vs. format).
Different sub-tasks need different tools or permissions—a researcher needs read access to a knowledge base; a publisher needs write access to a ticketing system. Cramming both into one identity violates least privilege.
You want independent verification—a separate "critic" agent that checks the drafter's work against a rubric catches errors a self-grading single agent will rationalize away.
Steps are genuinely parallel—fanning out research across ten accounts is faster as ten scoped workers than one sequential loop.

If none of those apply, a single well-scoped agent with good tools is cheaper and easier to govern. The best multi-agent system is the smallest one that solves the problem.

The patterns worth knowing

A handful of compositions cover most real use cases:

Pattern	Shape	Good for	Main risk
Orchestrator–worker	A planner decomposes the goal, delegates to specialists, assembles results	Multi-step business workflows with distinct skills	Orchestrator becomes a confused deputy with too much authority
Pipeline	Fixed sequence: research → draft → critique → format	Predictable document production	Errors compound silently down the chain
Critic / reviewer	A second agent grades the first against a rubric	Quality gates, reducing confident-but-wrong output	Reviewer collusion—both agents share the same blind spot
Parallel fan-out	Identical workers run on partitioned inputs	High-volume, embarrassingly parallel tasks	Credential sprawl, rate limits, partial failures

Notice that none of these remove the human. They change where the human sits: from editing every draft to approving the orchestrator's plan, spot-checking the critic's verdict, and owning the final send. That is the same accountability boundary described in "Agents in the business loop".
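To make the pipeline row concrete, here is a minimal sketch of a fixed sequence with a check after every stage, so a bad intermediate result stops the run instead of compounding. The `run_pipeline` function, the `Stage` type, and the stand-in lambdas are hypothetical names for illustration; real stages would call a model or a tool.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: (stage name, transform, validator for that stage's output).
Stage = Tuple[str, Callable[[str], str], Callable[[str], bool]]


def run_pipeline(goal: str, stages: List[Stage]) -> str:
    """Run stages in order, stopping at the first output that fails its check."""
    artifact = goal
    for name, transform, is_valid in stages:
        artifact = transform(artifact)
        if not is_valid(artifact):
            # Fail loudly here rather than passing a bad artifact downstream.
            raise ValueError(f"stage {name!r} produced invalid output")
    return artifact


# Stand-in stages (assumptions); each pairs a step with its own acceptance check.
stages: List[Stage] = [
    ("research", lambda g: f"notes on {g}", lambda out: out.startswith("notes")),
    ("draft", lambda n: f"DRAFT: {n}", lambda out: len(out) < 500),
    ("format", lambda d: d.strip() + "\n", lambda out: out.endswith("\n")),
]
```

Calling `run_pipeline("acme", stages)` walks research, draft, and format in order. The design choice worth noting is that each validator belongs to its stage, so the checks sit between steps rather than only at the end.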

Business use cases that justify the coordination

Multi-agent shines when the work is decomposable and the hand-offs are clean:

Account research packs: a planner spawns scoped researchers (CRM history, public filings, recent news), a synthesizer merges them into one brief with a "confidence and gaps" section, and a seller still owns the outreach.
RFP and questionnaire responses: a retriever pulls candidate answers from your approved answer library, a drafter assembles, a critic flags any claim not backed by a cited source, and a human signs.
Competitive and market monitoring: parallel workers watch different sources on a schedule, an aggregator clusters and dedupes, and the result lands in a draft queue—never auto-published. (See always-on agents for the trigger side of this.)
Engineering (mechanical migrations): a planner splits a large refactor by module, workers draft per-module diffs, a reviewer agent runs the PR review checklist, and a human approves each PR before merge.

The unifying trait: inputs, output shape, and acceptance criteria are specifiable, even if the criteria are checklists rather than unit tests. That is exactly the work agents are unusually good at.

New failure modes you do not get with one agent

Coordination is the feature and the liability. Budget for these:

Error propagation: a small mistake in step one becomes a confident, fully-formatted wrong answer by step four. Put validation between stages, not just at the end.
Orchestrator over-reach: a planner that holds every credential and can call every tool is a single point of catastrophic failure—the classic confused deputy. Scope the orchestrator's authority to routing, not acting.
Loops and runaway cost: agents that can re-delegate to each other can spin. Enforce hard step budgets, depth limits, and per-run spend caps.
Reviewer collusion: if your critic shares the drafter's prompt, model, and context, it will share its blind spots. Give the critic a different rubric and, where it matters, a different model.
Lost provenance: when five agents touch an artifact, "who decided this?" gets murky. Tag every intermediate output with the producing agent, its inputs, and the template version.
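The step, depth, and spend limits mentioned above can be enforced with a small guard that every agent call must pass through. This is a sketch under stated assumptions: `RunBudget`, `BudgetExceeded`, and the default limits are hypothetical, and how you estimate `cost_usd` per call depends on your provider.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run hits one of its hard limits."""


@dataclass
class RunBudget:
    max_steps: int = 20          # total agent calls allowed in one run
    max_depth: int = 3           # how deep re-delegation may nest
    max_spend_usd: float = 2.0   # per-run spend cap
    steps: int = 0
    spend_usd: float = 0.0

    def charge(self, depth: int, cost_usd: float) -> None:
        """Record one agent call, raising before any limit is crossed."""
        if depth > self.max_depth:
            raise BudgetExceeded(f"depth {depth} exceeds limit {self.max_depth}")
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached")
        if self.spend_usd + cost_usd > self.max_spend_usd:
            raise BudgetExceeded(f"spend cap ${self.max_spend_usd:.2f} reached")
        self.steps += 1
        self.spend_usd += cost_usd
```

One budget object is shared across the whole run and passed to every worker, so two agents re-delegating to each other draw down the same counters and hit the same ceiling.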

Security: every agent is an identity, every hand-off is a trust boundary

Multi-agent systems multiply the things that already make single agents risky. The discipline from "Agent identity and access" and "Guardrails for agent-assisted coding" does not just carry over; it becomes load-bearing.

Distinct identities, least privilege per role. Do not run a fleet on one shared key. The researcher gets read-only access to the knowledge base; the publisher gets scoped write access to one system and nothing else. If a worker is compromised, its blast radius is its own narrow scope—not the whole pipeline.
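One way to picture per-role least privilege is a deny-by-default scope table that is checked before any tool call. The role names follow the examples above; the scope strings (`kb:read`, `tickets:write`) and the `authorize` helper are hypothetical. In practice this check would sit in front of separately issued credentials, not replace them.

```python
from types import MappingProxyType

# Hypothetical role -> granted scopes. Read-only mapping so it cannot be
# widened at runtime by accident.
ROLE_SCOPES = MappingProxyType({
    "researcher": frozenset({"kb:read"}),
    "publisher": frozenset({"tickets:write"}),
    "orchestrator": frozenset(),  # routes work, holds no tool access itself
})


def authorize(role: str, scope: str) -> None:
    """Raise PermissionError unless the role was explicitly granted the scope."""
    allowed = ROLE_SCOPES.get(role, frozenset())  # unknown roles get nothing
    if scope not in allowed:
        raise PermissionError(f"{role!r} is not granted {scope!r}")
```

With this table, `authorize("researcher", "kb:read")` passes silently, while a researcher asking for `tickets:write` is refused, which keeps a compromised worker inside its own narrow scope.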

Treat inter-agent messages as untrusted input. The output of one agent is the input to the next, which means prompt injection now spreads laterally: a poisoned web page read by a researcher can carry instructions that hijack the drafter downstream. Apply the defenses from "Prompt injection: the agent attack surface" at every hand-off: pass structured, validated payloads between agents rather than free-form text that the next agent will dutifully obey.
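A structured hand-off can be as simple as a fixed record type plus a strict parser that rejects anything unexpected. The sketch below is illustrative only: `ResearchNote`, its fields, and the limits are assumptions, not a schema from this post. Validation narrows what can cross the boundary; it does not by itself neutralize injected text, so the receiving agent should still treat `text` as data to quote, not instructions to follow.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"finding", "gap"}   # hypothetical note types
MAX_TEXT_CHARS = 2000                # hypothetical size limit


@dataclass(frozen=True)
class ResearchNote:
    kind: str
    text: str
    source_url: str


def parse_note(raw: dict) -> ResearchNote:
    """Validate a worker's output before the next agent sees it."""
    expected = {"kind", "text", "source_url"}
    if set(raw) != expected:
        # Reject both missing and extra fields.
        raise ValueError(f"unexpected fields: {sorted(set(raw) ^ expected)}")
    if not all(isinstance(raw[key], str) for key in expected):
        raise ValueError("all fields must be strings")
    if raw["kind"] not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {raw['kind']!r}")
    if len(raw["text"]) > MAX_TEXT_CHARS:
        raise ValueError("text too long")
    if not raw["source_url"].startswith("https://"):
        raise ValueError("source_url must be https")
    return ResearchNote(**raw)
```

The drafter then receives only `ResearchNote` objects, never the researcher's raw output.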

Keep the orchestrator dumb about secrets. The planner should know what needs to happen and which worker does it—not hold the credentials to do everything itself. Privilege lives with the specialist, gated and audited.

Human gates on irreversible actions, always. No chain of agents should send customer email, move money, change production config, or grant access without an explicit human approval. Multi-agent makes it tempting to let the pipeline "finish the job." Don't.
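As a rough sketch, the gate can live in the one function that is allowed to execute actions, so no agent can route around it. The action names, the `Action` type, and `execute` are hypothetical; the assumption is that `approved_by` is set only by a real human approval step, never by an agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names for the irreversible actions listed above.
IRREVERSIBLE = {"send_email", "move_money", "change_prod_config", "grant_access"}


@dataclass
class Action:
    name: str
    payload: dict
    approved_by: Optional[str] = None  # filled in by a human approval step


def execute(action: Action, handler: Callable[[dict], str]) -> str:
    """Run the handler, refusing irreversible actions that lack approval."""
    if action.name in IRREVERSIBLE and not action.approved_by:
        raise PermissionError(f"{action.name} requires explicit human approval")
    return handler(action.payload)
```

Reversible actions pass straight through, while anything on the irreversible list stops until a named person has approved it.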

Forensics that survive a multi-actor run. Log per-agent: inputs (or hashes for sensitive data), tools called, outputs, and the run/trace id that ties the team together. After an incident you must be able to answer "which agent, reading what, decided to do that"—without storing more sensitive content than your retention policy allows.
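A per-agent log line along these lines might look like the following sketch. The field names and the `log_record` helper are assumptions; here both inputs and outputs are stored as SHA-256 hashes, which is the conservative end of "inputs (or hashes for sensitive data)". Store raw content instead wherever your retention policy permits it.

```python
import hashlib
import json
import time
import uuid


def new_trace_id() -> str:
    """One id per run, shared by every agent that takes part in it."""
    return uuid.uuid4().hex


def log_record(trace_id, agent, inputs, tools_called, output, template_version):
    """Build one JSON log line; content is recorded as a hash only."""
    def digest(value) -> str:
        # Canonical JSON so the same content always yields the same hash.
        data = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    return json.dumps({
        "trace_id": trace_id,
        "agent": agent,
        "template_version": template_version,
        "input_sha256": digest(inputs),
        "tools_called": list(tools_called),
        "output_sha256": digest(output),
        "ts": time.time(),
    })
```

Because every record carries the same `trace_id`, filtering on it reconstructs which agent, reading what, produced each intermediate output.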

Get security involved when you design the first orchestration, not when an auditor finds an undocumented agent mesh holding broad credentials. Done early, multi-agent systems are defensible leverage. Done late, they are shadow IT with an org chart.

A pragmatic path to your first multi-agent workflow

You do not need a platform team to start:

Ship the single-agent version first. Only split it when a specific signal above forces your hand. Measure the single-agent baseline so you can prove the upgrade was worth it—use the ROI model.
Split along permission boundaries, not vibes. If two steps need different access, they are two agents.
Add a critic before you add more workers. Independent verification buys more quality than raw parallelism.
Put a human at the plan-approval and final-send gates, and a kill switch on the whole run.
Instrument provenance from day one, so your fifth workflow is auditable instead of mysterious.

Why this is where agent-driven work is heading

Software already got here. We stopped writing monoliths and started composing small services with clear contracts, their own permissions, and observability between them. Agent-driven work is following the same arc: from one generalist assistant to composed teams of specialized agents with scoped identities, validated hand-offs, and humans on the risk boundary.

In a few years, "do you use an AI assistant?" will sound as quaint as "do you have a website?" The serious questions will be the ones mature engineering orgs already ask about distributed systems: what is the contract between components, who holds which privilege, and how do we trace a failure across the whole mesh? Teams that bring that discipline to agent orchestration now will make multi-agent work ordinary. Teams that bolt agents onto each other without identities, gates, or provenance will relearn—loudly—every lesson distributed systems already taught us.

If you want help designing agent workflows—single or orchestrated—under one coherent security and accountability model, tell us about your constraints or read the consulting offer.