EAAPL-MAG005 — Debate Agent
Status: Emerging
Tags: agent llm model-risk high-complexity
Version: 2.0.0
Last Updated: 2026-06-12
1. Pattern Identity
| Field | Value |
|---|---|
| Pattern ID | EAAPL-MAG005 |
| Name | Debate Agent |
| Category | Multi-Agent |
| Maturity | Emerging |
| Complexity | High |
| Related Patterns | EAAPL-MAG001 · EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG006 |
2. Executive Summary
The Debate Agent pattern structures multiple AI agents into a formal argumentation process — proposer, critic, rebuttal, and moderator — to improve the quality and reliability of AI outputs on high-stakes, contested questions. Rather than relying on a single agent's reasoning, debate forces the system to surface and evaluate opposing considerations before synthesis. The moderator agent evaluates argument quality (not positions) and synthesises a final output with explicit uncertainty flags where arguments were not resolved. The pattern is empirically validated as improving output quality for tasks where a single LLM tends toward overconfident or sycophantically skewed answers: contract risk review, security threat modelling, architectural decision records, and investment thesis evaluation. The principal cost is 4 to 8 times the token spend of a single-agent approach, meaning cost governance is a first-class concern. Debate should be used selectively for decisions with genuine asymmetric downside — not as a default reasoning mode.
3. Problem Statement
3.1 Context
Large language models trained with reinforcement learning from human feedback (RLHF) are prone to confident, coherent-sounding outputs that are subtly wrong. They tend to agree with the framing presented in the prompt (sycophancy), over-commit to the first plausible hypothesis (confirmation bias), and under-represent low-probability but high-consequence risks. For decisions with large downside asymmetry — a contractual clause that will cost millions if missed, a security vulnerability that will be exploited if not identified — this systematic bias toward confident, friendly responses is dangerous.
3.2 Forces in Tension
- Output quality vs. cost. Debate produces measurably better outputs for high-stakes questions but at 4–8× the token cost of a single-agent call.
- Genuine challenge vs. sycophantic critic. A critic agent that is prompted naively will produce surface-level criticism while ultimately agreeing with the proposer — the most common failure mode of the debate pattern. The critic must be explicitly and forcefully prompted to disagree.
- Debate depth vs. convergence. More debate rounds improve quality but increase cost and latency. The optimal number of rounds is task-dependent.
- Moderator neutrality vs. bias. The moderator must evaluate argument quality objectively. If the moderator's training biases it toward one position, the debate outcome is predetermined.
3.3 Failure Modes Without This Pattern
Without debate, single-agent outputs for high-stakes questions systematically under-represent dissenting considerations and over-represent the most plausible (not necessarily correct) hypothesis. Risk assessments miss tail risks. Security analyses miss non-obvious attack vectors. Architectural reviews miss long-term maintenance costs. These omissions are not random — they are systematic biases of the underlying models.
4. Solution
4.1 Debate Architecture
4.2 Moderator Evaluation Flow
5. Structure
5.1 Component Catalogue
| Component | Responsibility | Technology Options |
|---|---|---|
| Framing Agent | Structures the question into a clear proposition for debate | LLM with framing prompt |
| Proposer Agent | Constructs the affirmative case with supporting evidence | LLM with proposer system prompt |
| Critic Agent | Constructs the most forceful opposing case — must genuinely challenge | LLM with adversarial critic prompt |
| Rebuttal Agent | Responds to critic arguments on behalf of the proposition | LLM with rebuttal prompt |
| Moderator Agent | Evaluates argument quality; synthesises; flags unresolved points | LLM with moderator prompt, ideally different model from proposer/critic |
| Debate State Store | Persists all rounds for audit and cost tracking | Postgres, Redis |
| Cost Monitor | Tracks token spend per debate run; enforces cost ceiling | Middleware on LLM client |
5.2 Debate Round Record Schema
```json
{
  "debateId": "uuid-v4",
  "roundNumber": 1,
  "question": "...",
  "proposerPosition": {
    "thesis": "...",
    "supporting_arguments": [],
    "evidence_cited": []
  },
  "criticPosition": {
    "thesis": "...",
    "counter_arguments": [],
    "evidence_cited": [],
    "challengeStrength": 7
  },
  "rebuttalPosition": {
    "thesis": "...",
    "rebuttals": [],
    "concessions": []
  },
  "moderatorEvaluation": {
    "argumentQualityScore": { "proposer": 8, "critic": 7, "rebuttal": 6 },
    "resolvedPoints": [],
    "unresolvedPoints": [],
    "recommendAnotherRound": false,
    "synthesis": "...",
    "confidenceLevel": "HIGH | MODERATE | LOW",
    "explicitUncertainties": []
  },
  "tokenCost": { "proposer": 1200, "critic": 1100, "rebuttal": 900, "moderator": 800 }
}
```
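The moderator and cost portions of the record can be mirrored as TypeScript types so that malformed rounds are caught before they reach the state store. This is a minimal sketch, not a prescribed typing: the `string[]` element types are an assumption (the schema shows only empty arrays), and `isConfidenceLevel` and `totalTokens` are hypothetical helper names.

```typescript
// Hypothetical typing of part of the round record; field names follow the schema above.
type ConfidenceLevel = "HIGH" | "MODERATE" | "LOW";

interface TokenCost {
  proposer: number;
  critic: number;
  rebuttal: number;
  moderator: number;
}

interface ModeratorEvaluation {
  argumentQualityScore: { proposer: number; critic: number; rebuttal: number };
  resolvedPoints: string[];        // assumption: points are stored as plain strings
  unresolvedPoints: string[];
  recommendAnotherRound: boolean;
  synthesis: string;
  confidenceLevel: ConfidenceLevel;
  explicitUncertainties: string[];
}

// Narrow an untrusted value (for example, parsed model output) to a ConfidenceLevel.
function isConfidenceLevel(value: unknown): value is ConfidenceLevel {
  return value === "HIGH" || value === "MODERATE" || value === "LOW";
}

// Sum the per-agent token counts for one round.
function totalTokens(cost: TokenCost): number {
  return cost.proposer + cost.critic + cost.rebuttal + cost.moderator;
}
```

Validating `confidenceLevel` at the boundary matters because the moderator's output is model-generated text and may not match the three allowed values exactly.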
6. Behaviour
6.1 Debate Structure
Round 1 — Proposition. The proposer agent receives the question and relevant context. Its system prompt instructs it to: state a clear thesis; enumerate supporting arguments in priority order; cite specific evidence from the context; acknowledge the strongest likely objections pre-emptively.
Round 1 — Critique. The critic agent receives the question, context, and the proposer's position. The critic's system prompt is the critical design element — see Section 6.2.
Round 1 — Rebuttal. The rebuttal agent receives the proposer's position and the critic's arguments. It must: address each critic argument specifically (not generically); concede points where the critic's argument is stronger; provide new evidence or reasoning for contested points.
Moderator Evaluation. The moderator receives all three positions and evaluates: argument quality (independent of which side made the argument); which points were resolved; which remain genuinely contested; whether another debate round would improve resolution. The moderator synthesises a final output that reflects the weight of evidence, explicitly flags unresolved points, and assigns a confidence level.
Configurable rounds. The default is 1 round (proposition + critique + rebuttal + moderation). For high-stakes decisions, allow 2 rounds: if the moderator recommends another round, a second cycle runs with the first round's record as input and is followed by a fresh moderator evaluation. Enforce a hard maximum of 3 rounds to prevent cost runaway.
6.2 Critic Design — The Most Important Prompt in the System
The lazy critic is the most common failure mode of the debate pattern. A naively prompted critic will identify minor issues, acknowledge the proposer's main points as "reasonable," and ultimately agree with the proposition — providing the appearance of scrutiny without its substance. This is worse than no debate because it produces false confidence.
The critic system prompt must:
- Explicitly forbid the critic from agreeing with the proposition's framing. The critic's job is to find the strongest possible case against it.
- Require the critic to assume the proposition is wrong and work backward from that assumption to find supporting arguments.
- Instruct the critic to prioritise tail risks and low-probability high-consequence objections that the proposer is likely to have underweighted.
- Forbid softening language ("while I acknowledge," "this is generally reasonable but..."). Every argument must be stated as a direct challenge.
- Require a challengeStrength self-assessment score from 1–10. Scores below 6 trigger an automatic retry with a stronger adversarial instruction.
Example critic system prompt excerpt:
```text
You are an adversarial critic. Your ONLY job is to find the strongest possible case
against the proposition. You must assume the proposition is WRONG.

Rules:
1. Never say "while I acknowledge" or soften your arguments.
2. State every objection as a direct, affirmative claim.
3. Focus especially on: tail risks the proposer ignores; evidence the proposer
   misinterprets; hidden assumptions in the proposer's argument; scenarios where
   the proposition causes harm even if its thesis is correct.
4. At the end, rate your challenge strength from 1-10. If you rate yourself below 6,
   your critique was too weak; rewrite it to be stronger.
```
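Prompt rules alone do not guarantee compliance, so the runner can also check the critic's output after the fact. The sketch below assumes the critic ends its reply with a line such as "Challenge strength: 7/10"; that format, the phrase list, and the name `checkCritique` are illustrative assumptions. Retrying on softening language is likewise an assumption that goes beyond the score-based retry described above.

```typescript
// Phrases the critic prompt forbids; an illustrative list to extend per deployment.
const SOFTENING_PHRASES = ["while i acknowledge", "this is generally reasonable"];

interface CriticCheck {
  softeningFound: string[];
  challengeStrength: number | null; // null when no valid self-rating could be parsed
  needsRetry: boolean;
}

function checkCritique(text: string, retryBelow = 6): CriticCheck {
  const lower = text.toLowerCase();
  const softeningFound = SOFTENING_PHRASES.filter((p) => lower.includes(p));

  // Assumed output format: "Challenge strength: <n>" or "Challenge strength: <n>/10".
  const match = /challenge strength:\s*(\d{1,2})/i.exec(text);
  const parsed = match ? Number(match[1]) : null;
  const challengeStrength =
    parsed !== null && parsed >= 1 && parsed <= 10 ? parsed : null;

  // Retry on a weak or missing self-rating, or on any forbidden softening phrase.
  const needsRetry =
    softeningFound.length > 0 ||
    challengeStrength === null ||
    challengeStrength < retryBelow;

  return { softeningFound, challengeStrength, needsRetry };
}
```

Treating a missing self-rating as a retry condition is a deliberately conservative choice: an unparseable score should not be allowed to bypass the lazy-critic check.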
6.3 Moderator Design — Evaluating Arguments Not Positions
The moderator's system prompt must explicitly forbid it from having a preferred outcome. It evaluates:
- Logical validity. Does the argument follow from its premises?
- Evidence quality. Is cited evidence from the context, or fabricated? Is it interpreted correctly?
- Completeness. Did the argument address all relevant aspects of the question?
- Internal consistency. Does the argument contradict itself?
The moderator must NOT evaluate which position "sounds more reasonable" or "aligns with best practices" — these are position judgments, not argument quality judgments.
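One way to keep the moderator's scoring tied to argument quality is to have it fill in a rubric with one score per criterion and to aggregate those scores in code. The sketch below is illustrative only: the 0 to 10 scale and the equal weighting are assumptions, and `argumentQuality` is a hypothetical helper name.

```typescript
// The four criteria listed above, each scored 0-10 by the moderator (assumed scale).
interface RubricScores {
  logicalValidity: number;
  evidenceQuality: number;
  completeness: number;
  internalConsistency: number;
}

// Equal weighting is an assumption; the pattern does not prescribe weights.
// The function takes no role or position argument, so the aggregate cannot
// depend on which side made the argument.
function argumentQuality(s: RubricScores): number {
  const values = [
    s.logicalValidity,
    s.evidenceQuality,
    s.completeness,
    s.internalConsistency,
  ];
  if (values.some((v) => v < 0 || v > 10)) {
    throw new RangeError("rubric scores must be in [0, 10]");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```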
6.4 Use Cases Where Debate Beats Single Agent
The debate pattern produces measurable quality improvement in the following categories:
- Contract risk review. Debating whether a specific clause creates liability. Single agents miss tail-risk interpretations that aggressive counsel would find.
- Security threat modelling. Debating whether a proposed architecture is secure. Single agents tend to accept reasonable-sounding security controls without stress-testing them.
- Architecture decision records (ADR). Debating whether a technology choice is appropriate for the context. Single agents underweight long-term maintenance costs and ecosystem risk.
- Investment thesis evaluation. Debating whether a market assumption is valid. Single agents exhibit confirmation bias toward the thesis presented.
- Regulatory compliance assessment. Debating whether a business practice complies with a regulation. Single agents tend toward optimistic interpretations.
6.5 Cost Governance — When to Use Debate
Given the 4–8× cost multiplier, debate should be triggered only when:
- The decision has asymmetric downside (a wrong answer costs significantly more than the debate overhead).
- A single-agent answer is available but carries observable uncertainty indicators (hedging language, low confidence score).
- The decision type is in the known high-risk category (contract, security, architecture, investment, compliance).
- The question is genuinely contested — not a factual lookup where debate adds no value.
Implement a gating function: shouldDebate(taskType, confidenceScore, estimatedDownside) → boolean. Default: debate when estimatedDownside > debateCostThreshold × 50.
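A possible shape for the gating function is sketched below. The signature and the 50× default come from the text; how the four trigger conditions combine is not specified, so the rule used here (asymmetric downside AND either a high-risk task type or a low confidence score) is an assumption, as are the `GateOptions` fields and their default values.

```typescript
type TaskType =
  | "contract"
  | "security"
  | "architecture"
  | "investment"
  | "compliance"
  | "other";

const HIGH_RISK: ReadonlySet<TaskType> = new Set<TaskType>([
  "contract",
  "security",
  "architecture",
  "investment",
  "compliance",
]);

interface GateOptions {
  debateCostThreshold: number; // same currency unit as estimatedDownside
  lowConfidenceBelow: number;  // assumed cut-off on a 0-1 confidence score
}

function shouldDebate(
  taskType: TaskType,
  confidenceScore: number,
  estimatedDownside: number,
  opts: GateOptions = { debateCostThreshold: 0.5, lowConfidenceBelow: 0.7 },
): boolean {
  // Default rule from the text: downside must exceed 50x the debate cost threshold.
  const asymmetric = estimatedDownside > opts.debateCostThreshold * 50;
  // Assumed combination: a high-risk category or an uncertain single-agent answer.
  const risky =
    HIGH_RISK.has(taskType) || confidenceScore < opts.lowConfidenceBelow;
  return asymmetric && risky;
}
```

With the assumed defaults, `shouldDebate("contract", 0.9, 1000)` returns `true`, while `shouldDebate("contract", 0.5, 10)` returns `false` because the downside does not exceed 25.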
7. Implementation Guide
7.1 Step-by-Step
Step 1 — Build the framing agent. The framing step converts a raw user question into a structured proposition that is suitable for debate. Output: a clear thesis statement, the scope of the question, the criteria for a good answer, and the key evidence documents.
Step 2 — Build the proposer agent. Straightforward: instruct the model to make the best case for the proposition using the provided evidence. Include an instruction to pre-emptively address the strongest objections — this forces the proposer to acknowledge weaknesses, making the overall debate more productive.
Step 3 — Build the adversarial critic agent. This is the highest-priority prompt engineering task. Follow the guidelines in Section 6.2. Run the critic against 20 known test cases before production deployment and manually verify that challenge strength scores average above 7.
Step 4 — Build the rebuttal agent. Instruct the rebuttal agent to address each critic argument by number, explicitly concede points where appropriate, and provide specific counter-evidence.
Step 5 — Build the moderator agent. Use a different model than the proposer and critic if possible — different model families have different biases, and a moderator from the same family may exhibit the same biases as the agents it is evaluating. The moderator prompt must include: evaluation rubric, instruction to flag unresolved points explicitly, and instruction to assign a confidence level to the synthesis.
Step 6 — Wire cost governance. Track token spend per round. If cumulative cost reaches the cost ceiling mid-debate, terminate and return the current state of moderation with a DEBATE_TRUNCATED flag.
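Step 6 can be sketched as a small monitor that accumulates spend and a loop that stops before the next agent call once the ceiling is reached. The names `CostMonitor` and `runWithinBudget` are hypothetical, and each step is assumed to resolve to its own cost in USD; a real client would read the cost from the LLM response.

```typescript
// Hypothetical cost monitor: accumulates spend and reports when the ceiling is hit.
class CostMonitor {
  private spentUSD = 0;
  private readonly ceilingUSD: number;

  constructor(ceilingUSD: number) {
    this.ceilingUSD = ceilingUSD;
  }

  record(costUSD: number): void {
    if (costUSD < 0) throw new RangeError("cost must be non-negative");
    this.spentUSD += costUSD;
  }

  get exhausted(): boolean {
    return this.spentUSD >= this.ceilingUSD;
  }

  get total(): number {
    return this.spentUSD;
  }
}

type DebateStatus = "COMPLETE" | "DEBATE_TRUNCATED";

// Run steps in order, stopping before the next step once the budget is exhausted.
async function runWithinBudget(
  steps: Array<() => Promise<number>>, // each step resolves to its cost in USD
  monitor: CostMonitor,
): Promise<{ status: DebateStatus; stepsRun: number }> {
  let stepsRun = 0;
  for (const step of steps) {
    if (monitor.exhausted) return { status: "DEBATE_TRUNCATED", stepsRun };
    monitor.record(await step());
    stepsRun++;
  }
  return { status: "COMPLETE", stepsRun };
}
```

Checking the budget before each step rather than after means a single call can overshoot the ceiling, but no further call is started once it has been reached.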
7.2 Code Skeleton (TypeScript)
```typescript
// Skeleton only: the agent clients (framingAgent, proposerAgent, criticAgent,
// rebuttalAgent, moderatorAgent), the DebateRound and DebateResult types, and
// buildResult are assumed to be defined elsewhere.
interface DebateConfig {
  maxRounds: number;               // requested rounds; clamped to HARD_MAX_ROUNDS below
  costCeilingUSD: number;          // budget for the whole debate run
  minChallengeStrength: number;    // declared for configuration; this skeleton acts only on requireCriticRetryBelow
  requireCriticRetryBelow: number; // critic self-ratings below this trigger one retry
}

// Section 6.1: hard maximum of 3 rounds, regardless of configuration.
const HARD_MAX_ROUNDS = 3;

async function runDebate(
  question: string,
  context: string,
  config: DebateConfig = { maxRounds: 2, costCeilingUSD: 0.50, minChallengeStrength: 6, requireCriticRetryBelow: 6 }
): Promise<DebateResult> {
  const debateId = crypto.randomUUID();
  let totalCost = 0;
  const rounds: DebateRound[] = [];
  const maxRounds = Math.min(config.maxRounds, HARD_MAX_ROUNDS);

  const proposition = await framingAgent.invoke({ question, context });

  for (let round = 1; round <= maxRounds; round++) {
    // The budget is checked before each round starts; completed rounds are returned as-is.
    if (totalCost >= config.costCeilingUSD) {
      return buildResult(rounds, "DEBATE_TRUNCATED");
    }

    const proposerOutput = await proposerAgent.invoke({ question, context, proposition, previousRounds: rounds });
    totalCost += proposerOutput.cost;

    let criticOutput = await criticAgent.invoke({ question, context, proposerOutput, previousRounds: rounds });
    totalCost += criticOutput.cost;

    // Retry the critic once if its self-rated challenge strength is too low.
    if (criticOutput.challengeStrength < config.requireCriticRetryBelow) {
      criticOutput = await criticAgent.invoke({
        question, context, proposerOutput, previousRounds: rounds,
        additionalInstruction: `Your previous critique was rated ${criticOutput.challengeStrength}/10.
Rewrite with a stronger adversarial stance. Assume the proposition is dangerously wrong.`
      });
      totalCost += criticOutput.cost;
    }

    const rebuttalOutput = await rebuttalAgent.invoke({ question, context, proposerOutput, criticOutput });
    totalCost += rebuttalOutput.cost;

    const moderatorOutput = await moderatorAgent.invoke({ question, proposerOutput, criticOutput, rebuttalOutput });
    totalCost += moderatorOutput.cost;

    // cost holds the cumulative spend up to and including this round.
    rounds.push({ round, proposerOutput, criticOutput, rebuttalOutput, moderatorOutput, cost: totalCost });

    if (!moderatorOutput.recommendAnotherRound) break;
  }

  return buildResult(rounds, "COMPLETE");
}
```
8. Observability
8.1 Debate Quality Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Critic challenge strength average | Mean self-reported challenge strength across all debates | < 6.5 (critic is too soft) |
| Moderator confidence distribution | % of debates returning LOW confidence synthesis | > 30% (questions are systematically too ambiguous) |
| Explicit uncertainties per debate | Average unresolved points flagged by moderator | Baseline by task type; deviation is signal |
| Debate cost per task type | Actual token cost vs. estimate | > 1.5× estimate (prompt runaway) |
| Single-agent agreement rate | % of debates where debate conclusion agrees with the first proposer output | Used for cost-value calibration |
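The numeric thresholds in the table can be evaluated directly over a batch of debate records. The sketch below is one way to do that; the `DebateSummary` shape, the function name `qualityAlerts`, and the alert codes are hypothetical, while the 6.5, 30% and 1.5× thresholds are taken from the table.

```typescript
interface DebateSummary {
  challengeStrength: number; // critic self-rating, 1-10
  confidenceLevel: "HIGH" | "MODERATE" | "LOW";
  actualCostUSD: number;
  estimatedCostUSD: number;
}

// Returns the alert codes (hypothetical names) raised by a batch of debates.
function qualityAlerts(debates: DebateSummary[]): string[] {
  if (debates.length === 0) return [];
  const n = debates.length;

  const avgStrength =
    debates.reduce((sum, d) => sum + d.challengeStrength, 0) / n;
  const lowShare =
    debates.filter((d) => d.confidenceLevel === "LOW").length / n;
  const actual = debates.reduce((sum, d) => sum + d.actualCostUSD, 0);
  const estimate = debates.reduce((sum, d) => sum + d.estimatedCostUSD, 0);

  const alerts: string[] = [];
  if (avgStrength < 6.5) alerts.push("CRITIC_TOO_SOFT");
  if (lowShare > 0.3) alerts.push("TOO_MANY_LOW_CONFIDENCE");
  if (estimate > 0 && actual > 1.5 * estimate) alerts.push("COST_OVERRUN");
  return alerts;
}
```

In practice the batch would be grouped by task type first, since the table tracks cost and uncertainty baselines per task type.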
8.2 Debate Audit Trail
Every debate run must produce a complete, immutable record: question, framing, all proposer/critic/rebuttal/moderator outputs for each round, token costs, final synthesis, confidence level, and explicit uncertainties. This record is the Model Risk Management (MRM) documentation for the decision.
9. Cost Governance
- Selective trigger. Never use debate as the default reasoning mode. Gate on task type and estimated decision value.
- Round ceiling. Hard maximum of 3 rounds enforced in the debate runner. The marginal quality improvement of round 4+ rarely justifies the cost.
- Model tiering for non-critical roles. Use efficient models for the rebuttal agent (less complex reasoning) and a faster model for moderation. Reserve frontier models for the proposer and critic where argument quality matters most.
- Cost ceiling with graceful truncation. If the debate cost ceiling is reached mid-debate, return the current moderation state with a DEBATE_TRUNCATED flag rather than failing silently.
- Empirical calibration. Track the correlation between debate use and downstream decision quality (measured by human expert review or outcome). Update the cost-value trigger threshold quarterly.
10. Security Considerations
10.1 Critic Agent Misuse
A deliberately weak critic system prompt could be used to manufacture the appearance of debate scrutiny without providing it — generating a false audit trail that claims rigorous review occurred. Detect by monitoring critic challenge strength scores. Any debate where the critic self-rates below 5 should be flagged for prompt review.
10.2 Moderator Bias Injection
If the moderator's system prompt is modified by an attacker to favour one position, the debate outcome is compromised regardless of argument quality. Protect the moderator prompt as a sensitive configuration artefact — hash it at deployment and verify the hash at runtime.
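The hash check can be sketched with Node's built-in `crypto` module. This is a minimal example that assumes a Node.js runtime and a SHA-256 digest recorded at deployment; `sha256Hex` and `verifyPromptHash` are hypothetical helper names, and where the expected digest is stored is left open.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hex-encoded SHA-256 digest of the prompt text.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Compare the runtime prompt against the digest recorded at deployment.
function verifyPromptHash(prompt: string, expectedHex: string): boolean {
  const actual = Buffer.from(sha256Hex(prompt), "hex");
  const expected = Buffer.from(expectedHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

The runner would call `verifyPromptHash` before each moderator invocation and refuse to run the debate if it returns `false`.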
10.3 Evidence Fabrication
Proposer and critic agents may cite evidence that does not exist in the provided context (hallucination). The moderator's evaluation must include a citation verification step: for each cited piece of evidence, verify that it appears in the source context. Flag any citation that cannot be located there as a critical validation failure.
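A deterministic pre-check can catch the simplest fabrications before the moderator runs. The sketch below assumes that each `evidence_cited` entry is a verbatim quote from the context; paraphrased citations would need a fuzzier match. The function names are hypothetical.

```typescript
// Collapse whitespace and case so that line wrapping does not cause false negatives.
function normalise(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Returns the cited passages that cannot be found in the source context.
function findUnverifiedCitations(evidenceCited: string[], context: string): string[] {
  const haystack = normalise(context);
  return evidenceCited.filter((quote) => {
    const needle = normalise(quote);
    // Empty citations are treated as unverified rather than trivially matching.
    return needle.length === 0 || !haystack.includes(needle);
  });
}
```

A non-empty result can be attached to the round record so that the moderator discards arguments relying on those citations.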
11. Failure Modes and Mitigations
| Failure Mode | Detection | Mitigation |
|---|---|---|
| Lazy critic produces surface-level critique | Challenge strength score below 6 | Automatic retry with stronger adversarial instruction |
| Moderator exhibits positional bias | Audit of moderator outputs over time shows systematic tilt | Use a different model family for moderator; add neutrality checks |
| Debate oscillates without resolution | Moderator recommends additional rounds repeatedly | Hard round ceiling; return with explicit unresolved points after max rounds |
| Cost runaway from round explosion | Cost monitoring alert | Hard cost ceiling with DEBATE_TRUNCATED graceful return |
| Evidence hallucination by proposer or critic | Moderator citation verification fails | Flag fabricated citations; discard the argument that relied on them |
| Single model bias across all roles | All agents reach same conclusion regardless of role | Use different model providers or model versions for proposer vs critic |
12. Compliance and Governance
12.1 Model Risk Management (SR 11-7)
For financial services applications, the debate pattern's audit trail can support the formal model validation documentation as evidence of effective challenge. The record of all proposer, critic, and moderator outputs demonstrates that the model's output was subjected to structured challenge before being used in a decision. Document the debate parameters (rounds, models used, prompt versions) as part of the model's technical documentation.
12.2 EU AI Act — Robustness (Article 15)
Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity. The debate pattern directly supports the robustness requirement by systematically challenging single-model outputs. Maintain empirical evidence of quality improvement from debate vs. single-agent baselines for each supported task type.
13. Testing Strategy
13.1 Unit Tests
- Critic retry logic: mock a critic that returns challenge strength 4; assert the runner triggers a retry with a stronger adversarial instruction.
- Cost ceiling: mock agents with known token costs; assert the runner terminates when the ceiling is reached and returns DEBATE_TRUNCATED.
- Round ceiling: configure maxRounds: 2 and mock a moderator that always recommends another round; assert debate terminates after 2 rounds.
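The critic retry test can be sketched without a test framework. The example below isolates the retry step in a hypothetical `critiqueWithRetry` wrapper and drives it with a mock critic that returns challenge strength 4 and then 8; all names are illustrative, and in a real suite the final boolean would be an assertion in the project's test runner.

```typescript
interface CriticOutput {
  challengeStrength: number;
  text: string;
}
type Critic = (additionalInstruction?: string) => Promise<CriticOutput>;

// Retry wrapper under test: one retry when the self-rating is below the threshold.
async function critiqueWithRetry(critic: Critic, retryBelow: number): Promise<CriticOutput> {
  const first = await critic();
  if (first.challengeStrength >= retryBelow) return first;
  return critic(
    `Your previous critique was rated ${first.challengeStrength}/10. Rewrite with a stronger adversarial stance.`,
  );
}

// Mock critic: returns the scripted scores in order and records each instruction it receives.
function makeMockCritic(scores: number[]): {
  critic: Critic;
  instructions: Array<string | undefined>;
} {
  const instructions: Array<string | undefined> = [];
  const critic: Critic = async (additionalInstruction) => {
    const score = scores[Math.min(instructions.length, scores.length - 1)];
    instructions.push(additionalInstruction);
    return { challengeStrength: score, text: "mock critique" };
  };
  return { critic, instructions };
}

// Resolves to true when the retry fired exactly once with a stronger instruction.
async function testCriticRetry(): Promise<boolean> {
  const { critic, instructions } = makeMockCritic([4, 8]);
  const result = await critiqueWithRetry(critic, 6);
  return (
    instructions.length === 2 &&
    instructions[0] === undefined &&
    (instructions[1] ?? "").includes("4/10") &&
    result.challengeStrength === 8
  );
}
```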
13.2 Integration Tests
- Full debate run on a known test question with canned proposer/critic/rebuttal/moderator responses; assert the output schema matches the debate result structure.
- Critic citation fabrication: inject a critic output that cites a source not in the context; assert the moderator flags the fabricated citation.
13.3 Quality Evaluation Tests (Human-Labelled Baseline)
For each supported task type, maintain a golden set of 20 questions with expert-labelled "correct" answers. Run the debate system against this set quarterly. Assert that debate outputs agree with expert labels at a higher rate than single-agent outputs. This is the empirical evidence that debate adds value.
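The quarterly assertion can be expressed as a comparison of two agreement rates over the golden set. The sketch assumes that answers can be reduced to a categorical label comparable with the expert label by exact match; free-text answers would need an expert or rubric-based grader instead. `GoldenCase`, `agreementRate` and `debateAddsValue` are hypothetical names.

```typescript
interface GoldenCase {
  expertLabel: string;
  debateAnswer: string;
  singleAgentAnswer: string;
}

// Fraction of cases where the selected answer matches the expert label.
function agreementRate(cases: GoldenCase[], pick: (c: GoldenCase) => string): number {
  if (cases.length === 0) throw new Error("golden set is empty");
  return cases.filter((c) => pick(c) === c.expertLabel).length / cases.length;
}

// True when debate agrees with the experts strictly more often than the single agent.
function debateAddsValue(cases: GoldenCase[]): boolean {
  return (
    agreementRate(cases, (c) => c.debateAnswer) >
    agreementRate(cases, (c) => c.singleAgentAnswer)
  );
}
```

With a golden set of only 20 questions, a difference of one or two cases is within noise, so the comparison is best read alongside the trend across quarters.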
13.4 End-to-End Tests
- Submit a contract clause for debate review; assert all four agent outputs are present in the result; assert the moderator output includes explicit confidence level and at least one uncertainty flag; assert total cost is within expected bounds.
14. Variants and Extensions
14.1 Multi-Position Debate
Rather than binary proposition/critique, enumerate three or more positions (e.g., "Adopt technology X," "Adopt technology Y," "Defer the decision") and have a separate proposer agent for each. The moderator evaluates all positions and synthesises a ranked recommendation. Higher cost but more complete for genuinely multi-option decisions.
14.2 Expert Debate (Persona-Prompted)
Assign each agent a specific expert persona (e.g., "You are a senior litigation partner at a global law firm," "You are a chief risk officer at a tier-1 bank"). Expert personas improve the domain-specificity of arguments but require careful persona design to avoid stereotyping.
14.3 Sequential Evidence Introduction
Rather than providing all context upfront, introduce new evidence at each debate round to simulate the information-gathering that occurs in real decisions. Tests how each agent updates its position in response to new information.
15. Trade-off Analysis
| Dimension | Debate Agent | Single Agent | Supervisor with Validation |
|---|---|---|---|
| Output quality on contested questions | Highest | Lowest | Moderate |
| Token cost | 4–8× single agent | 1× | 2–3× |
| Latency | 3–5× single agent | 1× | 2× |
| Auditability of reasoning | Highest (full debate record) | Low | Moderate |
| Sycophancy resistance | Highest | Lowest | Moderate |
| Use case suitability | High-stakes, contested decisions | Low-stakes, factual | Standard validation use cases |
16. Known Implementations
| Organisation Type | Use Case | Rounds Used | Reported Quality Improvement |
|---|---|---|---|
| Global law firm | Contract liability clause risk review | 2 rounds | 34% more risk flags identified vs single-agent; 89% expert agreement rate |
| Investment bank | Investment thesis challenge | 1 round | 28% increase in identified downside risks; MRM audit passed first review |
| Software company | Architecture Decision Record review | 1 round | 41% of debate-identified concerns confirmed by post-implementation retrospective |
| Cybersecurity consultancy | Threat modelling | 2 rounds | 52% more attack vectors identified vs single-agent threat model |
17. Related Patterns
| Pattern ID | Name | Relationship |
|---|---|---|
| EAAPL-MAG001 | Multi-Agent Orchestration | Debate can be embedded as a specialist agent type within an orchestration |
| EAAPL-MAG002 | Supervisor Agent | Supervisor can invoke debate between two workers on a contested subtask |
| EAAPL-MAG003 | Human-in-the-Loop Agent | Final human review checkpoint after debate synthesis for regulated use cases |
| EAAPL-MAG006 | Agent Handoff Protocol | Debate round records use handoff schema for inter-agent message passing |
18. References
- Liang, T. et al., "Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate," 2023. arxiv.org/abs/2305.19118
- Du, Y. et al., "Improving Factuality and Reasoning in Language Models through Multiagent Debate," 2023. arxiv.org/abs/2305.14325
- Irving, G. et al., "AI Safety via Debate," OpenAI, 2018. arxiv.org/abs/1805.00899
- Gartner, "Using Multi-Agent AI for High-Stakes Decision Review," 2025 (ID: G00823891)
- Anthropic, "Sycophancy in AI Systems," 2024. anthropic.com/research/sycophancy
- SR 11-7: Guidance on Model Risk Management. federalreserve.gov/supervisionreg/srletters/sr1107.htm
- EU AI Act (Regulation 2024/1689), Article 15: Accuracy, Robustness, and Cybersecurity
- Chan, C. et al., "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate," 2023. arxiv.org/abs/2308.07201
- Madaan, A. et al., "Self-Refine: Iterative Refinement with Self-Feedback," 2023. arxiv.org/abs/2303.17651
- NIST AI RMF 1.0, Measure 2.5: Bias, Fairness, and Explainability Testing