EAAPL-MAG006 — Agent Handoff Protocol
Status: Proven
Tags: agent traceability audit-logging medium-complexity
Version: 2.0.0
Last Updated: 2026-06-12
1. Pattern Identity
| Field | Value |
|---|---|
| Pattern ID | EAAPL-MAG006 |
| Name | Agent Handoff Protocol |
| Category | Multi-Agent |
| Maturity | Proven |
| Complexity | Medium |
| Related Patterns | EAAPL-MAG001 · EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG004 |
2. Executive Summary
The Agent Handoff Protocol defines a structured, versioned message format for transferring context between agents in a multi-agent system. Context transfer is the highest-risk operation in agent pipelines: lost context causes agents to re-derive what was already established (wasting tokens and time); incomplete context causes agents to proceed on false assumptions; unsanitised context enables prompt injection attacks. This pattern specifies the canonical handoff message schema — covering task description, completed and remaining subtasks, current state, relevant context, constraints, cost tracking, conversation history, and tool call history — together with the protocols for receiver validation, rollback on failed handoff, context compression for large transfers, and the latency SLO that handoffs must meet. Implementing a consistent handoff protocol across all agent types in a system is foundational to traceability: every agent's execution can be reconstructed from the sequence of handoff records.
3. Problem Statement
3.1 Context
In multi-agent systems built without a formal handoff protocol, context transfer between agents is ad-hoc: each agent-to-agent boundary is implemented differently, validated inconsistently, and logged incompletely. The result is a class of bugs that only manifest at boundaries — the receiving agent misunderstands a field name, receives a context that is too large for its context window, receives partial results from a crashed upstream agent, or receives a payload that contains injected instructions from a malicious input.
3.2 Forces in Tension
- Completeness vs. context window limits. Full handoff of all upstream context ensures the receiving agent has everything it needs, but may exceed the model's context window or introduce prohibitive token costs.
- Validation thoroughness vs. latency. Deep validation of handoff messages (schema, completeness, safety) adds latency. Minimal validation is fast but allows errors to propagate.
- Structured schema vs. flexibility. A rigid schema ensures consistency but may not accommodate novel agent types. A flexible schema is extensible but leads to fragmentation.
- Audit completeness vs. storage cost. Retaining full handoff records enables complete trace reconstruction but can be expensive for high-volume systems.
3.3 Failure Modes Without This Pattern
Without a formal protocol, context loss at agent boundaries is invisible — the receiving agent simply produces lower-quality output without any error signal. Prompt injection via upstream agent output is undetected. Partial handoffs caused by upstream crashes result in the receiving agent proceeding with incomplete state. And reconstructing what happened during a multi-agent run is impossible without consistent records.
4. Solution
4.1 Handoff Flow
4.2 Handoff Audit Trail
5. Structure
5.1 Component Catalogue
| Component | Responsibility | Technology Options |
|---|---|---|
| Handoff Builder | Assembles handoff message from agent execution state | Library function in agent SDK |
| Context Compressor | Summarises old conversation turns; keeps recent verbatim | LLM summarisation, sliding window |
| Message Validator | Schema validation + completeness + safety scan on receiver side | JSON schema, safety classifier |
| Transport Layer | Delivers handoff message with durability guarantees | Redis Streams, AWS SQS, Temporal Signal, direct HTTP |
| Handoff Audit Log | Immutable record of every handoff for trace reconstruction | Append-only Postgres table, S3 + Athena |
| Rollback Handler | Restores the sending agent's state on handoff failure | Checkpoint store per agent execution |
5.2 Canonical Handoff Message Schema
{
"schemaVersion": "2.0",
"handoffId": "uuid-v4",
"taskId": "uuid-v4",
"parentHandoffId": "uuid-v4-or-null",
"fromAgent": {
"agentId": "legal-analysis-agent",
"agentVersion": "2.1.0",
"executionId": "uuid-v4"
},
"toAgent": {
"agentType": "risk-scoring-agent",
"agentVersion": ">=1.5.0"
},
"timestamp": "ISO-8601",
"taskDescription": "Score the liability risk of the extracted clauses",
"completedSubtasks": [
{
"subtaskId": "s1",
"description": "Extract limitation-of-liability clauses",
"result": { "clauses": [] },
"completedAt": "ISO-8601"
}
],
"remainingSubtasks": [
{ "subtaskId": "s2", "description": "Score each clause for liability risk" },
{ "subtaskId": "s3", "description": "Produce executive risk summary" }
],
"currentState": {
"extractedClauses": [],
"riskFlags": [],
"pendingDecisions": []
},
"relevantContext": [
{ "source": "Contract PDF 2026-06-11", "excerpt": "...", "relevanceScore": 0.92 }
],
"constraints": [
"returnJSON:true",
"maxResponseTokens:2000",
"scoringFramework:iso31000"
],
"costTracking": {
"costSpentSoFarUSD": 0.08,
"costBudgetRemainingUSD": 0.32,
"tokenSpent": { "prompt": 4200, "completion": 1800 }
},
"conversationHistorySummary": "Agent extracted 12 clauses from the contract. Clause 4.2 flagged as potentially unlimited liability. Clause 7.1 noted as industry-standard. [Full history compressed — last 3 turns verbatim below]",
"conversationHistoryVerbatim": [],
"toolCallHistory": [
{ "tool": "pdf_extract", "calledAt": "ISO-8601", "inputHash": "sha256:...", "outputTokens": 3200 }
],
"traceparent": "W3C-Trace-Context-header-value"
}
6. Behaviour
6.1 Receiver Validation
The receiving agent must validate the handoff message before proceeding. Validation is multi-layered:
Schema validation. Validate the JSON message against the canonical schema (JSON Schema draft 2020-12). Required fields must be present and of the correct type. Schema version must be compatible with the receiver's expected version. Reject with SCHEMA_INVALID if validation fails.
Completeness check. Verify that the completedSubtasks list is non-empty when expected (a handoff from an agent that did no work is a likely error). Verify that currentState contains the fields the receiving agent requires. Reject with INCOMPLETE_CONTEXT if required state fields are missing.
Safety scan. Run the conversationHistoryVerbatim and currentState fields through an input safety classifier. Detect prompt injection patterns (instructions embedded in apparent data fields). Reject with SAFETY_VIOLATION and route to the dead-letter queue.
Cost budget check. If costBudgetRemainingUSD is at or below zero, return BUDGET_EXHAUSTED without executing. The receiving agent is not responsible for the budget miscalculation of the sender.
Retry and rejection. Schema and completeness failures are retried (the sender may have a bug; retry with error details). Safety violations are never retried — they go to the dead-letter queue for human investigation.
6.2 Handoff Audit Logging
Every handoff must be logged to an immutable store before the receiving agent begins execution. The log entry contains the full handoff message (minus any PII fields that must be redacted per data policy). The log enables:
- Full trace reconstruction of a multi-agent run from the sequence of handoff records.
- Cost attribution: total tokens spent on a task computed from the
costTrackingfields across all handoff records. - Failure investigation: identify exactly which handoff carried the erroneous state that led to a bad output.
- Regulatory audit: demonstrate the full chain of reasoning and data that led to a consequential AI decision.
Log records are append-only and must not be modified or deleted within the retention period. For regulated use cases, apply a cryptographic hash chain: each record includes the hash of the previous record, making tampering detectable.
6.3 Rollback on Failed Handoff
If the handoff is rejected (validation failure) and retry is exhausted, or if the receiving agent fails immediately after accepting the handoff, the task must be rolled back to the state at the beginning of the handoff:
- The sending agent's execution checkpoint (saved before initiating the handoff) is restored.
- The task status reverts to
PENDING_HANDOFF. - An alert is emitted with the rejection reason.
- A human-escalation event is created for handoffs that fail 3+ times.
Rollback requires that the sending agent saved its checkpoint before dispatching the handoff message. This is mandatory — a sending agent that does not checkpoint before handoff cannot support rollback.
6.4 Context Compression
When the accumulated conversation history exceeds a configurable threshold (default: 80% of the receiving agent's context window), the handoff builder runs a compression step:
- Identify the oldest conversation turns (oldest first).
- Run an LLM summarisation call that produces a condensed narrative of those turns, preserving key decisions, facts, and tool call results.
- Replace the oldest turns with the summary.
- Keep the most recent N turns verbatim (default: last 5 turns).
The compression prompt must be instructed to preserve: all factual claims that will be needed downstream; all decisions made with rationale; all constraints established; any flags or warnings raised. The compression summary is stored in conversationHistorySummary; verbatim recent turns in conversationHistoryVerbatim.
6.5 Latency SLO
Handoff completion latency (time from sender dispatching the message to receiver accepting) must meet the following SLO:
| Transport Mode | Target P99 Latency | Maximum Acceptable |
|---|---|---|
| Synchronous HTTP | 200ms | 500ms |
| Async durable queue (Redis, SQS) | 500ms | 2000ms |
| Temporal workflow signal | 1000ms | 5000ms |
If the handoff latency SLO is breached, the task monitoring system alerts. Persistent SLO breaches on a specific agent pair indicate a bottleneck in that agent's intake processing.
7. Implementation Guide
7.1 Step-by-Step
Step 1 — Define your canonical schema version. Choose a schema version (e.g., 2.0) and publish it as a JSON Schema document in your internal schema registry. All agents in the system must target this version. Version bumps follow semantic versioning with backward-compatibility guarantees for minor versions.
Step 2 — Build the handoff builder library. Implement a shared library function buildHandoffMessage(agentState, targetAgentType, constraints) that agents call at the end of their execution. This ensures consistency across all agent types without requiring each team to re-implement the schema.
Step 3 — Implement context compression. Build the compression step as a separate utility with a configurable threshold. Test it with conversation histories of 5K, 10K, 20K, and 50K tokens and verify that key facts are preserved in the summary.
Step 4 — Implement receiver validation. Build validateHandoffMessage(message) as a shared library function returning a typed result: { valid: true } | { valid: false, reason: SchemaInvalid | IncompleteContext | SafetyViolation | BudgetExhausted }. All receiving agents call this before any other processing.
Step 5 — Implement the audit log write. The audit log write must occur before the receiving agent begins execution — not after. Use a database transaction or a two-phase commit if the transport and audit log are separate systems. An audit log write after execution failure leaves a gap in the trace.
Step 6 — Implement rollback. Ensure every sending agent checkpoints its state before dispatching a handoff message. The checkpoint write and the handoff message dispatch must be in the same transaction (or handled via an outbox pattern) to prevent a gap between saved state and sent message.
7.2 Code Skeleton (TypeScript)
interface HandoffValidationResult {
valid: boolean;
reason?: "SCHEMA_INVALID" | "INCOMPLETE_CONTEXT" | "SAFETY_VIOLATION" | "BUDGET_EXHAUSTED";
details?: string;
}
function validateHandoffMessage(message: HandoffMessage): HandoffValidationResult {
// Schema validation
const schemaResult = jsonSchema.validate(message, HANDOFF_SCHEMA_V2);
if (!schemaResult.valid) {
return { valid: false, reason: "SCHEMA_INVALID", details: schemaResult.errors.join("; ") };
}
// Completeness check
if (message.completedSubtasks.length === 0 && message.taskDescription !== "INITIAL") {
return { valid: false, reason: "INCOMPLETE_CONTEXT", details: "No completed subtasks in non-initial handoff" };
}
// Budget check
if (message.costTracking.costBudgetRemainingUSD <= 0) {
return { valid: false, reason: "BUDGET_EXHAUSTED", details: "No remaining budget" };
}
// Safety scan
const textToScan = [
message.conversationHistorySummary,
JSON.stringify(message.currentState),
...message.conversationHistoryVerbatim.map(t => t.content)
].join("\n");
if (safetyClassifier.detect(textToScan, ["prompt_injection", "jailbreak"])) {
return { valid: false, reason: "SAFETY_VIOLATION", details: "Injection pattern detected in context" };
}
return { valid: true };
}
async function acceptHandoff(message: HandoffMessage): Promise<"ACCEPTED" | "REJECTED"> {
const validation = validateHandoffMessage(message);
if (!validation.valid) {
await auditLog.append({ handoffId: message.handoffId, status: "REJECTED", reason: validation.reason });
if (validation.reason === "SAFETY_VIOLATION") {
await deadLetterQueue.send(message);
}
return "REJECTED";
}
// Write audit log BEFORE execution begins
await auditLog.append({
handoffId: message.handoffId,
taskId: message.taskId,
fromAgent: message.fromAgent.agentId,
toAgent: message.toAgent.agentType,
status: "ACCEPTED",
costAtHandoff: message.costTracking.costSpentSoFarUSD,
timestamp: new Date().toISOString()
});
return "ACCEPTED";
}
8. Observability
8.1 Handoff Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Handoff validation failure rate | % of handoffs rejected at receiver | > 2% |
| Handoff p99 latency by transport | P99 delivery time per transport type | Above SLO per transport (see Section 6.5) |
| Safety violation rate | % of handoffs flagged by safety scan | > 0.1% (investigate immediately) |
| Context compression rate | % of handoffs that required compression | Baseline; spike indicates context growth |
| Dead-letter queue depth | Unprocessable handoffs awaiting human review | > 0 |
| Rollback event rate | % of handoffs that triggered rollback | > 1% |
8.2 Trace Reconstruction
Given a taskId, the full execution trace must be reconstructable from the handoff audit log in under 10 seconds. Verify this capability quarterly. The reconstruction query: SELECT * FROM handoff_audit WHERE taskId = :taskId ORDER BY timestamp ASC. The sequence of records represents the full chain of agent executions.
9. Cost Governance
- Budget relay. The
costBudgetRemainingUSDfield must be updated on every handoff. Senders deduct their actual cost before writing the field; receivers check it before executing. This relay mechanism ensures the budget ceiling is respected across the full agent chain. - Compression cost. Context compression is itself an LLM call with a cost. Target compression call cost at under 5% of the upstream agent call cost. Use an efficient model (not a frontier model) for compression.
- Audit log storage. Full handoff records including conversation history summaries can be large. Apply a compression codec (gzip, zstd) on audit log records. Implement a tiered storage policy: hot storage (Postgres) for 30 days; cold storage (S3) for the retention period.
10. Security Considerations
10.1 Prompt Injection via Handoff Context
The most dangerous vector is a malicious payload in the currentState or conversationHistoryVerbatim fields that contains instructions designed to hijack the receiving agent's behaviour. Mitigations:
- Safety scan all text fields at receiver validation (see Section 6.1).
- Structure the receiving agent's prompt so that handoff context appears in a clearly delimited section labelled
[CONTEXT FROM PREVIOUS AGENT — DATA ONLY], not as continuation of the system prompt. - Never allow handoff context to appear in the system prompt directly. Always in the user turn with explicit demarcation.
10.2 Handoff Message Tampering
An attacker with access to the message queue could modify a handoff message in transit (e.g., changing the remainingSubtasks to skip a safety step). Mitigation: include an HMAC signature in the message ("signature": "hmac-sha256:...") generated by the sender using a shared secret. The receiver validates the signature before schema validation.
10.3 Audit Log Integrity
Handoff audit records are the primary evidence for regulatory investigations. Implement a hash-chain pattern: each record includes previousRecordHash. The chain can be validated by recomputing hashes in sequence. Use a separate audit log service (not the same database as the operational store) so that operational access credentials cannot be used to tamper with the audit trail.
11. Failure Modes and Mitigations
| Failure Mode | Detection | Mitigation |
|---|---|---|
| Sender crashes before checkpointing | Handoff message in queue but no checkpoint in rollback store | Write checkpoint before enqueuing message (outbox pattern) |
| Receiver rejects valid message due to schema version mismatch | High rejection rate with SCHEMA_INVALID reason | Maintain backward compatibility for at least 2 minor schema versions |
| Context compression loses critical facts | Receiving agent produces incorrect output citing missing context | Test compression against golden set; increase verbatim retention window |
| Safety scan false positive blocks legitimate handoff | Rejection rate spike with SAFETY_VIOLATION for clean content | Tune safety classifier; implement appeal path via human review |
| Budget relay miscalculation causes premature halt | Task fails with BUDGET_EXHAUSTED before completing | Audit all agents for correct cost deduction logic; add integration test for budget relay |
| Audit log write fails | Handoff proceeds without log record — gap in trace | Make audit log write synchronous with handoff acceptance; fail the handoff if log write fails |
12. Compliance and Governance
12.1 Traceability for Regulated Decisions
For any AI system subject to regulatory oversight, the handoff audit trail provides the artefact needed to demonstrate: what data each agent received; what each agent concluded; in what order agents operated; what the total cost was; and what the final output was. This trace must be producible within 72 hours of a regulatory request. Test this capability quarterly.
12.2 Data Residency
Handoff messages may carry personal data (e.g., customer information in contract review). Ensure that the durable queue and audit log are hosted in the same regulatory jurisdiction as the data being processed. Do not route handoff messages through infrastructure in jurisdictions with inadequate data protection laws without explicit transfer mechanisms under GDPR Chapter V.
12.3 GDPR Right to Erasure
If a data subject exercises their right to erasure, handoff audit records containing their personal data must be identified and scrubbed. Design the audit log schema so that personal data is stored in a separate personalDataPayload field with a dedicated data-subject index, enabling targeted erasure without destroying the structural audit trail.
13. Testing Strategy
13.1 Unit Tests
- Schema validation: for each required field, assert that a message missing that field fails validation with
SCHEMA_INVALID. - Completeness check: assert that a handoff with zero completed subtasks (and non-INITIAL task description) fails with
INCOMPLETE_CONTEXT. - Budget check: assert that a handoff with
costBudgetRemainingUSD: 0fails withBUDGET_EXHAUSTED. - Safety scan: assert that a handoff message containing a known injection pattern fails with
SAFETY_VIOLATION. - HMAC signature: assert that a message with a tampered field fails signature verification.
13.2 Integration Tests
- Full handoff flow: agent A completes a subtask, builds a handoff message, sends it to agent B, agent B validates and accepts, audit log is written. Assert audit log contains the correct
fromAgent,toAgent, andtaskId. - Rollback: agent B rejects the handoff. Assert agent A's checkpoint state is restored and task status reverts to
PENDING_HANDOFF. - Context compression: build a handoff with 50K tokens of conversation history. Assert the compression step fires, the output message is under the context window threshold, and the summary is non-empty.
- Budget relay: agent A deducts its cost from the budget; assert the handoff message's
costBudgetRemainingUSDreflects the deduction.
13.3 End-to-End Tests
- Run a full 4-agent pipeline with the handoff protocol wired between all agents. Assert the handoff audit log contains 3 records (one per handoff boundary). Assert the
taskIdis consistent across all records. Assert the total cost (sum ofcostTracking.costSpentSoFarUSDacross all handoffs) matches the expected budget spend.
14. Variants and Extensions
14.1 Streaming Handoff
For agents that produce large outputs incrementally, implement a streaming handoff where the sender emits partial result chunks to the receiver as they are generated, rather than batching everything into a single message. The receiver begins processing as soon as a minimum viable context is available. Requires a streaming-aware validation step that validates each chunk schema and a final completeness check at the end of the stream.
14.2 Multi-Receiver Handoff (Fan-Out)
For the parallel fan-out topology (EAAPL-MAG001), a single handoff message is broadcast to multiple receivers. Each receiver gets a complete copy of the message. The sender records a receiverList field listing all intended recipients. The aggregation step validates that all receivers acknowledged the handoff before proceeding.
14.3 Schema Evolution Strategy
As agent capabilities evolve, the handoff schema will need to change. Follow these rules: new optional fields (minor version bump, fully backward-compatible); new required fields (major version bump, migration required); field removals or type changes (major version bump, deprecation period of at least one release cycle). Maintain the schema in a versioned registry accessible to all agents.
15. Trade-off Analysis
| Dimension | Formal Handoff Protocol | Ad-Hoc Context Transfer |
|---|---|---|
| Traceability | Full (every boundary logged) | None |
| Prompt injection resistance | High (safety scan at receiver) | None |
| Rollback support | Full (checkpoint before handoff) | None |
| Implementation overhead | Moderate (shared library reduces per-team cost) | Low initially, high at scale |
| Schema evolution complexity | Moderate (versioning required) | Low |
| Debugging multi-agent failures | Easy (audit trail) | Very hard |
16. Known Implementations
| Organisation Type | Use Case | Agent Chain Length | Reported Outcome |
|---|---|---|---|
| Global bank | Loan origination pipeline | 6 agents | Full regulatory audit trace producible in < 30s; zero failed handoffs in 6 months |
| Insurance carrier | Claims processing automation | 4 agents | 99.8% handoff acceptance rate; SAFETY_VIOLATION alerts caught 3 injection attempts |
| Healthcare system | Clinical summarisation pipeline | 5 agents | GDPR erasure requests resolved in < 4 hours using personal data index |
| Legal tech platform | Contract review pipeline | 8 agents | Handoff audit trail used as primary evidence in MRM review; passed first assessment |
17. Related Patterns
| Pattern ID | Name | Relationship |
|---|---|---|
| EAAPL-MAG001 | Multi-Agent Orchestration | Handoff protocol is the inter-agent message standard for all orchestrated agents |
| EAAPL-MAG002 | Supervisor Agent | Supervisor-to-worker and worker-to-supervisor messages both use this protocol |
| EAAPL-MAG003 | Human-in-the-Loop Agent | Checkpoint state serialisation uses the handoff schema |
| EAAPL-MAG004 | Agent Swarm | Blackboard task record schema is derived from the handoff schema |
18. References
- Gartner, "Designing Robust Inter-Agent Communication for Enterprise AI," 2025 (ID: G00822341)
- W3C Trace Context Specification — w3.org/TR/trace-context
- JSON Schema Specification, Draft 2020-12 — json-schema.org
- Anthropic, "Building Effective Agents," 2025 — anthropic.com/research/building-effective-agents
- OWASP LLM Top 10: LLM01 Prompt Injection — owasp.org/www-project-top-10-for-large-language-model-applications
- EU AI Act (Regulation 2024/1689), Article 12: Record-Keeping for High-Risk AI Systems
- GDPR (Regulation 2016/679), Article 17: Right to Erasure
- Martin Fowler, "Transactional Outbox Pattern" — martinfowler.com/articles/patterns-of-distributed-systems/outbox.html
- Microsoft Azure, "Retry Pattern" — learn.microsoft.com/azure/architecture/patterns/retry
- NIST AI RMF 1.0, Map 2.3: AI System Data Provenance and Traceability