Pillar IV: Data, AIOps, Infrastructure · § 11

Multi-agent architecture

A single LLM call does not reliably answer “What did we spend with vendor X last quarter, broken down by category, and is it within budget?” The question needs query planning, structured data access, policy retrieval, aggregation, and a final safety check. Multi-agent architecture is what we use to compose those skills without trying to cram them all into one prompt.

The AI assistant was the forcing function. The product does not answer real customer questions with a single retrieval step. It needs to plan, query, retrieve, aggregate, and verify, and each of those is a different prompt with a different model class. Once we accepted this, the architecture question became how to coordinate those agents safely, not whether to use them.

A representative query, “Total travel expenses for customer X in Q3, broken down by category, exceeding budget?”, decomposes into:

  • Planner agent. Parses the question, identifies the data sources needed.
  • SQL agent. Generates and executes a query against the OLAP database.
  • Policy retrieval agent. Looks up the tenant’s budget rules from the RAG corpus.
  • Aggregator agent. Combines results, formats the output, attaches citations.
  • Guardrail agent. Verifies the output does not leak cross-tenant data or hallucinate.

Each agent is a separate LLM call with a focused prompt. Coordination happens through a state graph the framework manages.
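The decomposition above can be sketched as a small state graph. This is a minimal illustration, not the framework we run in production: the `State` fields, the node names, and the `edges` function are assumptions made for the example, and each agent is a stub where a real focused LLM call would sit.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Shared workflow state passed from node to node (hypothetical shape)."""
    question: str
    tenant_id: str
    plan: list[str] = field(default_factory=list)
    results: dict[str, str] = field(default_factory=dict)
    answer: str = ""
    approved: bool = False
    trace: list[str] = field(default_factory=list)


# Each function stands in for one focused LLM call.
def planner(state: State) -> State:
    state.plan = ["sql", "policy"]  # specialists this question needs
    return state


def sql_agent(state: State) -> State:
    state.results["sql"] = "travel 1200, meals 300"  # stubbed OLAP result
    return state


def policy_agent(state: State) -> State:
    state.results["policy"] = "Q3 travel budget 1000"  # stubbed RAG result
    return state


def aggregator(state: State) -> State:
    parts = [f"{name}: {value}" for name, value in sorted(state.results.items())]
    state.answer = " | ".join(parts)
    return state


def guardrail(state: State) -> State:
    # Placeholder check; a real guardrail verifies tenant scope and grounding.
    state.approved = bool(state.answer) and bool(state.tenant_id)
    return state


NODES = {
    "planner": planner,
    "sql": sql_agent,
    "policy": policy_agent,
    "aggregator": aggregator,
    "guardrail": guardrail,
}


def edges(node: str, state: State) -> list[str]:
    """Return the nodes to schedule after `node` has run."""
    if node == "planner":
        return [*state.plan, "aggregator"]
    if node == "aggregator":
        return ["guardrail"]
    return []


def run_workflow(state: State) -> State:
    queue = ["planner"]
    while queue:
        name = queue.pop(0)
        state = NODES[name](state)
        state.trace.append(name)  # one trace entry per agent call
        queue.extend(edges(name, state))
    return state
```

Calling `run_workflow(State(question="...", tenant_id="t1"))` visits the planner, the planned specialists, the aggregator, and the guardrail in that order, and `state.trace` records the path taken.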

Figure: Multi-agent workflow. Planner routes, specialists run, aggregator merges, guardrail clears.

  • Input. User query.
  • Step 1 · Orchestration. Planner agent decomposes the query and routes to specialist agents.
  • Specialists. SQL agent (structured queries) and RAG agent (document retrieval).
  • MCP layer. Read-only OLAP database and read-only vector database.
  • Step 2 · Synthesis. Aggregator agent merges results from specialist agents.
  • Step 3 · Safety. Guardrail agent verifies no cross-tenant leak or hallucination.
  • Output. Response to user.
  • Legend. A2A carries agent-to-agent messages. MCP carries agent-to-tool access.

Within a workflow, agents do not call each other through arbitrary HTTP. They communicate through the Agent-to-Agent Protocol (A2A), an open standard for structured inter-agent messaging. A2A defines the wire format, identity model, and capability advertisement that let one agent invoke another with explicit message types, validated schemas, and traceable provenance.

  • Identity. Every agent has a signed identity. The receiver verifies the caller before processing a message.
  • Capability advertisement. Each agent publishes the message types it accepts. Callers cannot invoke methods that are not advertised.
  • Schema validation. Every A2A message is validated against the receiver’s schema before the agent prompt sees the payload.
  • Trace continuity. The A2A envelope carries the parent trace ID, so a workflow’s full call chain is reconstructable in the observability layer.
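The four properties above can be illustrated with a receiver-side check. The sketch below is not the A2A wire format or any A2A SDK: the envelope fields, the `AGENT_KEYS` and `ADVERTISED` tables, and the use of a shared-secret HMAC in place of a real signed identity are all assumptions chosen to keep the example self-contained.

```python
import hashlib
import hmac
import json

# Hypothetical registry of caller keys (stand-in for a signed identity).
AGENT_KEYS = {"planner": b"planner-demo-key"}

# Hypothetical capability advertisement: message type -> required payload fields.
ADVERTISED = {"sql.query_request": {"tenant_id": str, "question": str}}


def sign(sender: str, message_type: str, payload: dict, trace_id: str, key: bytes) -> str:
    body = json.dumps(
        {"sender": sender, "message_type": message_type,
         "payload": payload, "trace_id": trace_id},
        sort_keys=True,
    ).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def make_envelope(sender: str, message_type: str, payload: dict, trace_id: str) -> dict:
    signature = sign(sender, message_type, payload, trace_id, AGENT_KEYS[sender])
    return {"sender": sender, "message_type": message_type,
            "payload": payload, "trace_id": trace_id, "signature": signature}


def receive(envelope: dict) -> dict:
    """Run identity, capability, and schema checks before any prompt sees the payload."""
    # 1. Identity: the caller must be known and the signature must match.
    key = AGENT_KEYS.get(envelope.get("sender"))
    if key is None:
        raise PermissionError("unknown sender")
    expected = sign(envelope["sender"], envelope["message_type"],
                    envelope["payload"], envelope["trace_id"], key)
    if not hmac.compare_digest(expected, envelope["signature"]):
        raise PermissionError("signature mismatch")

    # 2. Capability advertisement: only advertised message types are accepted.
    schema = ADVERTISED.get(envelope["message_type"])
    if schema is None:
        raise ValueError("message type not advertised")

    # 3. Schema validation: exact field set, expected types.
    payload = envelope["payload"]
    if set(payload) != set(schema):
        raise ValueError("payload fields do not match schema")
    if not all(isinstance(payload[name], kind) for name, kind in schema.items()):
        raise ValueError("payload field has wrong type")

    # 4. Trace continuity: the parent trace ID travels with the validated payload.
    return {"trace_id": envelope["trace_id"], "payload": payload}
```

The ordering matters more than the mechanics: rejection happens before the receiving agent's prompt is built, so a malformed or unauthorized message never becomes model input.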

A2A is the inter-agent equivalent of what MCP (§12) is for agent-to-tool access. Together they bound how an agent can act. MCP controls what tools an agent reaches. A2A controls which other agents it talks to and how.

We evaluate and use multiple multi-agent orchestration frameworks depending on the workflow shape. Graph-based state management suits well-structured workflows. Conversation-based orchestration suits more exploratory tasks. The specific framework is an implementation detail. A2A and MCP are protocol-level commitments. The governance principles in this section apply regardless of which framework executes the graph.

The approach has clear benefits:

  • Specialization. Each agent has a focused prompt and a smaller context window, which reduces the failure rate per call.
  • Composability. Complex workflows are built from atomic agents reused across features.
  • Robustness. A single agent failure does not fail the whole workflow. We retry or fall back at the agent level.
  • Auditability. Each agent produces its own trace, which makes post-hoc debugging dramatically easier than dissecting one monolithic call.

It also introduces risks:

  • Recursion risk. Agent A calls B, B calls C, and C calls A, which loops forever unless something stops it. §13 covers the hard caps preventing this.
  • State complexity. Workflows with shared state are harder to debug than stateless calls.
  • Cost explosion. Many LLM calls per query means cost grows faster than in single-call architectures.
  • Larger attack surface. Every agent is a separate prompt injection target (Pillar V §3, §9).
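Agent-level retry, fallback, and a hop cap against recursion can be sketched together. The limits and names below (`MAX_HOPS`, `MAX_ATTEMPTS`, `call_agent`) are hypothetical and are not the actual caps described in §13; a real implementation would also add backoff between attempts, omitted here for brevity.

```python
from typing import Callable, Optional

Agent = Callable[[str, int], str]  # (payload, hops) -> result

MAX_HOPS = 8       # hypothetical hard cap on call-chain depth
MAX_ATTEMPTS = 3   # hypothetical retry budget per agent call


class HopLimitExceeded(RuntimeError):
    """Raised when a call chain grows past MAX_HOPS, e.g. A -> B -> C -> A."""


def call_agent(agent: Agent, payload: str, hops: int,
               fallback: Optional[Agent] = None) -> str:
    if hops >= MAX_HOPS:
        raise HopLimitExceeded(f"call chain exceeded {MAX_HOPS} hops")

    last_error: Optional[Exception] = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return agent(payload, hops + 1)
        except HopLimitExceeded:
            raise  # never retry a recursion-cap violation
        except Exception as exc:  # transient failure: try again
            last_error = exc

    if fallback is not None:
        return fallback(payload, hops + 1)
    raise RuntimeError("agent failed after retries") from last_error


# A deliberately cyclic trio to show the cap firing.
def agent_a(payload: str, hops: int) -> str:
    return call_agent(agent_b, payload, hops)


def agent_b(payload: str, hops: int) -> str:
    return call_agent(agent_c, payload, hops)


def agent_c(payload: str, hops: int) -> str:
    return call_agent(agent_a, payload, hops)
```

With these definitions, `call_agent(agent_a, "q", 0)` raises `HopLimitExceeded` instead of looping, while a transient failure in a single agent is absorbed by the retry loop or the fallback without failing the workflow.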

When a user asks “How much did we spend on vendor X last month?”, the workflow is:

  1. Planner identifies this as analytical and routes to the SQL pipeline.
  2. SQL agent builds the query through the MCP layer (§12) and runs it against a read-only OLAP slot.
  3. Result formatter structures the output and attaches citations linking back to source transactions.
  4. Guardrail agent verifies the response stays within the user’s tenant.
  5. The user sees the answer with expandable citations.
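Step 4 in the list above can be backed by a deterministic check that runs alongside the guardrail prompt. The sketch below assumes each citation carries the tenant ID of its source transaction; the `Citation` and `DraftAnswer` types and the `guardrail_check` function are hypothetical names for illustration, not the production guardrail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    transaction_id: str
    tenant_id: str  # assumed to be copied from the source row


@dataclass(frozen=True)
class DraftAnswer:
    text: str
    citations: tuple[Citation, ...]


def guardrail_check(draft: DraftAnswer, user_tenant_id: str) -> tuple[bool, str]:
    """Return (approved, reason) for a draft answer before it reaches the user."""
    # An answer with no citations cannot be traced to source transactions.
    if not draft.citations:
        return False, "no citations attached"

    # Any citation owned by another tenant is treated as a cross-tenant leak.
    foreign = [c.transaction_id for c in draft.citations
               if c.tenant_id != user_tenant_id]
    if foreign:
        return False, f"cross-tenant citations: {foreign}"

    return True, "ok"
```

A check like this does not detect hallucinated figures on its own; it only ensures that whatever the answer cites belongs to the requesting tenant, which is the part that can be verified without a model call.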

Every step is logged through the observability layer, which is what makes the trace usable for both debugging and audit.