Pillar IV: Data, AIOps, Infrastructure · § 11

Multi-agent architecture

A single LLM call does not reliably answer “What did we spend with vendor X last quarter, broken down by category, and is it within budget?” The question needs query planning, structured data access, policy retrieval, aggregation, and a final safety check. Multi-agent architecture is what we use to compose those skills without trying to cram them all into one prompt.

Context

The AI assistant was the forcing function. The product does not answer real customer questions with a single retrieval step. It needs to plan, query, retrieve, aggregate, and verify, and each of those is a different prompt with a different model class. Once we accepted this, the architecture question became how to coordinate those agents safely, not whether to use them.

Why multi-agent fits B2B finance

A representative query, “Total travel expenses for customer X in Q3, broken down by category, exceeding budget?”, decomposes into five agents:

Planner agent. Parses the question, identifies the data sources needed.
SQL agent. Generates and executes a query against the OLAP database.
Policy retrieval agent. Looks up the tenant’s budget rules from the RAG corpus.
Aggregator agent. Combines results, formats the output, attaches citations.
Guardrail agent. Verifies the output does not leak cross-tenant data or hallucinate.

Each agent is a separate LLM call with a focused prompt. Coordination happens through a state graph the framework manages.
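The decomposition above can be sketched as a tiny state graph. This is a minimal illustration, not the framework we run: the `WorkflowState` shape, the agent function names, and the hard-coded figures are all assumptions, and each function stands in for what would be an LLM call in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class WorkflowState:
    """Shared state handed from node to node (hypothetical shape)."""
    query: str
    tenant_id: str
    plan: List[str] = field(default_factory=list)
    results: Dict[str, dict] = field(default_factory=dict)
    answer: Optional[str] = None
    cleared: bool = False


Agent = Callable[[WorkflowState], WorkflowState]


def planner(state: WorkflowState) -> WorkflowState:
    # A real planner is an LLM call; here the plan is hard-coded.
    state.plan = ["sql", "policy"]
    return state


def sql_agent(state: WorkflowState) -> WorkflowState:
    # Stand-in rows for a query against the OLAP database.
    state.results["sql"] = {"travel": 1200.0, "meals": 300.0}
    return state


def policy_agent(state: WorkflowState) -> WorkflowState:
    # Stand-in for a budget rule retrieved from the RAG corpus.
    state.results["policy"] = {"budget": 2000.0}
    return state


def aggregator(state: WorkflowState) -> WorkflowState:
    total = sum(state.results["sql"].values())
    budget = state.results["policy"]["budget"]
    status = "within" if total <= budget else "over"
    state.answer = f"Total {total:.2f}, {status} budget {budget:.2f}"
    return state


def guardrail(state: WorkflowState) -> WorkflowState:
    # Placeholder check; a real guardrail inspects the answer itself.
    state.cleared = state.answer is not None and bool(state.tenant_id)
    return state


SPECIALISTS: Dict[str, Agent] = {"sql": sql_agent, "policy": policy_agent}


def run_workflow(query: str, tenant_id: str) -> WorkflowState:
    state = planner(WorkflowState(query=query, tenant_id=tenant_id))
    for name in state.plan:  # the planner's output selects the specialists
        state = SPECIALISTS[name](state)
    return guardrail(aggregator(state))
```

The point of the sketch is the shape: every agent takes and returns the same state object, so the graph runner, not the agents, decides what runs next.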

[Figure: multi-agent workflow diagram]

Input. User query.
Step 1 · Orchestration. Planner agent decomposes the query and routes to specialist agents.
Specialists. SQL agent (structured queries) and RAG agent (document retrieval).
MCP layer. Read-only access to the OLAP database and the vector database.
Step 2 · Synthesis. Aggregator agent merges results from specialist agents.
Step 3 · Safety. Guardrail agent verifies no cross-tenant leak or hallucination.
Output. Response to user.

Legend. A2A: agent-to-agent messages. MCP: agent-to-tool access.

Multi-agent workflow. Planner routes, specialists run, aggregator merges, guardrail clears.

Agent-to-Agent communication (A2A)

Within a workflow, agents do not call each other through arbitrary HTTP. They communicate through the Agent-to-Agent Protocol (A2A), an open standard for structured inter-agent messaging. A2A defines the wire format, identity model, and capability advertisement that let one agent invoke another with explicit message types, validated schemas, and traceable provenance.

Identity. Every agent has a signed identity. The receiver verifies the caller before processing a message.
Capability advertisement. Each agent publishes the message types it accepts. Callers cannot invoke methods that are not advertised.
Schema validation. Every A2A message is validated against the receiver’s schema before the agent prompt sees the payload.
Trace continuity. The A2A envelope carries the parent trace ID, so a workflow’s full call chain is reconstructable in the observability layer.
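The four checks above can be illustrated on the receiving side. The sketch below is not the A2A wire format: the `Envelope` fields, the `Receiver` class, and the use of an HMAC shared secret as a stand-in for a signed identity are all assumptions made to keep the example self-contained.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message envelope, not the real A2A format."""
    sender: str
    message_type: str
    payload: dict
    parent_trace_id: str
    signature: str


def _digest(key: bytes, sender: str, message_type: str,
            payload: dict, parent_trace_id: str) -> str:
    body = json.dumps([sender, message_type, payload, parent_trace_id],
                      sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def sign(key: bytes, sender: str, message_type: str,
         payload: dict, parent_trace_id: str) -> Envelope:
    sig = _digest(key, sender, message_type, payload, parent_trace_id)
    return Envelope(sender, message_type, payload, parent_trace_id, sig)


class Receiver:
    def __init__(self, trusted_keys: Dict[str, bytes],
                 schemas: Dict[str, Dict[str, type]]):
        self.trusted_keys = trusted_keys  # sender name -> shared key
        self.schemas = schemas            # advertised type -> field types

    def accept(self, env: Envelope) -> dict:
        # 1. Identity: verify the caller before anything else.
        key = self.trusted_keys.get(env.sender)
        if key is None:
            raise PermissionError("unknown sender")
        expected = _digest(key, env.sender, env.message_type,
                           env.payload, env.parent_trace_id)
        if not hmac.compare_digest(expected, env.signature):
            raise PermissionError("bad signature")
        # 2. Capability: only advertised message types are callable.
        schema = self.schemas.get(env.message_type)
        if schema is None:
            raise LookupError("message type not advertised")
        # 3. Schema: validate before the agent prompt sees the payload.
        if set(env.payload) != set(schema):
            raise ValueError("payload fields do not match schema")
        for name, expected_type in schema.items():
            if not isinstance(env.payload[name], expected_type):
                raise ValueError(f"field {name!r} has the wrong type")
        # 4. Trace continuity: carry the parent trace ID forward.
        return {"trace_id": env.parent_trace_id, "payload": env.payload}
```

The order matters in this sketch: a message that fails the identity check is rejected before its type or payload is even inspected.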

A2A is the inter-agent equivalent of what MCP (§12) is for agent-to-tool access. Together they bound how an agent can act. MCP controls what tools an agent reaches. A2A controls which other agents it talks to and how.

Framework choice is implementation detail

We evaluate and use multiple multi-agent orchestration frameworks depending on the workflow shape. Graph-based state management suits well-structured workflows. Conversation-based orchestration suits more exploratory tasks. The specific framework is an implementation detail. A2A and MCP are protocol-level commitments. The governance principles in this section apply regardless of which framework executes the graph.

What multi-agent gives us

Specialization. Each agent has a focused prompt and a smaller context, which reduces the failure rate per call.
Composability. Complex workflows are built from atomic agents reused across features.
Robustness. A single agent failure does not fail the whole workflow. We retry or fall back at the agent level.
Auditability. Each agent produces its own trace, which makes post-hoc debugging dramatically easier than dissecting one monolithic call.
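Agent-level retry and fallback, as described under Robustness, might look like the following. This is a sketch under assumptions: `AgentError`, `call_with_fallback`, and the default of three attempts are illustrative names and values, not our production settings.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class AgentError(Exception):
    """Raised by an agent call that failed in a retryable way."""


def call_with_fallback(primary: Callable[[], T],
                       fallback: Optional[Callable[[], T]] = None,
                       max_attempts: int = 3) -> T:
    """Retry one agent, then fall back, without failing the workflow."""
    last_error: Optional[AgentError] = None
    for _ in range(max_attempts):
        try:
            return primary()
        except AgentError as exc:
            last_error = exc  # remember the failure and try again
    if fallback is not None:
        return fallback()
    raise AgentError(f"agent failed after {max_attempts} attempts") from last_error
```

Only `AgentError` is retried here; any other exception propagates immediately, on the assumption that a programming error should not be retried.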

What multi-agent costs us

Recursion risk. Agent A calls B, B calls C, and C calls A again, which produces an infinite loop. §13 covers the hard caps that prevent this.
State complexity. Workflows with shared state are harder to debug than stateless calls.
Cost explosion. Each query fans out into several LLM calls, so cost per query is a multiple of a single-call architecture’s and grows with every agent added to the workflow.
Larger attack surface. Every agent is a separate prompt injection target (Pillar V §3, §9).
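The recursion and cost risks share one kind of mitigation: a per-workflow guard that refuses cycles and caps depth and total calls. The sketch below is illustrative only. `CallGuard`, `WorkflowLimitError`, and the limit values are assumptions, and the actual caps are the ones defined in §13.

```python
from contextlib import contextmanager
from typing import Iterator, List


class WorkflowLimitError(RuntimeError):
    """Raised when a workflow would loop or exceed its caps."""


class CallGuard:
    def __init__(self, max_depth: int = 4, max_calls: int = 12):
        self.max_depth = max_depth   # illustrative value
        self.max_calls = max_calls   # illustrative value
        self.stack: List[str] = []   # agents currently on the call chain
        self.calls = 0               # total agent calls in this workflow

    @contextmanager
    def enter(self, agent: str) -> Iterator[None]:
        if agent in self.stack:
            chain = " -> ".join(self.stack + [agent])
            raise WorkflowLimitError(f"cycle: {chain}")
        if len(self.stack) >= self.max_depth:
            raise WorkflowLimitError("max depth exceeded")
        if self.calls >= self.max_calls:
            raise WorkflowLimitError("call budget exhausted")
        self.stack.append(agent)
        self.calls += 1
        try:
            yield
        finally:
            self.stack.pop()  # unwind even when the agent raises
```

Wrapping each agent invocation in `with guard.enter("sql_agent"):` makes the A, B, C, A loop fail fast with the offending chain in the error message, and the call counter bounds how many LLM calls a single query can trigger.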

A concrete example, the AI assistant

When a user asks “How much did we spend on vendor X last month?”, the workflow runs as follows:

Planner identifies this as analytical and routes to the SQL pipeline.
SQL Agent builds the query through the MCP layer (§12) and runs it against a read-only OLAP slot.
Result formatter structures the output and attaches citations linking back to source transactions.
Guardrail agent verifies the response stays within the user’s tenant.
The user sees the answer with expandable citations.

Every step is logged through the observability layer, which is what makes the trace usable for both debugging and audit.
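As a rough illustration of the last two ideas, the sketch below checks that every citation belongs to the user’s tenant and records one trace entry per step. All names here (`Citation`, `Answer`, `foreign_citations`, `run_logged`, `answer_question`) and the row shape are hypothetical, and the in-memory list stands in for the real observability layer.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Citation:
    transaction_id: str
    tenant_id: str


@dataclass
class Answer:
    text: str
    citations: List[Citation]


def foreign_citations(answer: Answer, user_tenant_id: str) -> List[Citation]:
    """Return the citations that fall outside the user's tenant."""
    return [c for c in answer.citations if c.tenant_id != user_tenant_id]


def run_logged(trace: List[dict], trace_id: str, step: str,
               fn: Callable, *args):
    """Run one step and append a trace entry for it."""
    started = time.monotonic()
    result = fn(*args)
    trace.append({"trace_id": trace_id, "step": step,
                  "elapsed_s": time.monotonic() - started})
    return result


def format_result(rows: List[Tuple[str, str, float]]) -> Answer:
    # rows are (transaction_id, tenant_id, amount), a stand-in for SQL output
    total = sum(amount for _, _, amount in rows)
    cites = [Citation(txn, tenant) for txn, tenant, _ in rows]
    return Answer(f"Total spend: {total:.2f}", cites)


def answer_question(user_tenant_id: str, rows: List[Tuple[str, str, float]],
                    trace: List[dict], trace_id: str = "trace-1") -> Answer:
    answer = run_logged(trace, trace_id, "format", format_result, rows)
    leaks = run_logged(trace, trace_id, "guardrail",
                       foreign_citations, answer, user_tenant_id)
    if leaks:
        raise PermissionError(f"{len(leaks)} citation(s) outside tenant")
    return answer
```

Because the guardrail step is logged before it raises, a blocked response still leaves a complete trace for audit.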