EAAPL-MAG005 — Debate Agent

Status: Emerging
Tags: agent, llm, model-risk, high-complexity
Version: 2.0.0
Last Updated: 2026-06-12

1. Pattern Identity

Field	Value
Pattern ID	EAAPL-MAG005
Name	Debate Agent
Category	Multi-Agent
Maturity	Emerging
Complexity	High
Related Patterns	EAAPL-MAG001 · EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG006

2. Executive Summary

The Debate Agent pattern structures multiple AI agents into a formal argumentation process — proposer, critic, rebuttal, and moderator — to improve the quality and reliability of AI outputs on high-stakes, contested questions. Rather than relying on a single agent's reasoning, debate forces the system to surface and evaluate opposing considerations before synthesis. The moderator agent evaluates argument quality (not positions) and synthesises a final output with explicit uncertainty flags where arguments were not resolved. The pattern is empirically validated as improving output quality for tasks where a single LLM tends toward overconfident or sycophantically skewed answers: contract risk review, security threat modelling, architectural decision records, and investment thesis evaluation. The principal cost is 4 to 8 times the token spend of a single-agent approach, meaning cost governance is a first-class concern. Debate should be used selectively for decisions with genuine asymmetric downside — not as a default reasoning mode.

3. Problem Statement

3.1 Context

Large language models trained with reinforcement learning from human feedback (RLHF) are prone to confident, coherent-sounding outputs that are subtly wrong. They tend to agree with the framing presented in the prompt (sycophancy), over-commit to the first plausible hypothesis (confirmation bias), and under-represent low-probability but high-consequence risks. For decisions with large downside asymmetry — a contractual clause that will cost millions if missed, a security vulnerability that will be exploited if not identified — this systematic bias toward confident, friendly responses is dangerous.

3.2 Forces in Tension

Output quality vs. cost. Debate produces measurably better outputs for high-stakes questions but at 4–8× the token cost of a single-agent call.
Genuine challenge vs. sycophantic critic. A critic agent that is prompted naively will produce surface-level criticism while ultimately agreeing with the proposer — the most common failure mode of the debate pattern. The critic must be explicitly and forcefully prompted to disagree.
Debate depth vs. convergence. More debate rounds improve quality but increase cost and latency. The optimal number of rounds is task-dependent.
Moderator neutrality vs. bias. The moderator must evaluate argument quality objectively. If the moderator's training biases it toward one position, the debate outcome is predetermined.

3.3 Failure Modes Without This Pattern

Without debate, single-agent outputs for high-stakes questions systematically under-represent dissenting considerations and over-represent the most plausible (not necessarily correct) hypothesis. Risk assessments miss tail risks. Security analyses miss non-obvious attack vectors. Architectural reviews miss long-term maintenance costs. These omissions are not random — they are systematic biases of the underlying models.

4. Solution

4.1 Debate Architecture

ARCHITECTURE DIAGRAM

flowchart TD
    subgraph Setup["Question Setup"]
        A[Question Input]
        B[Framing Agent]
    end
    subgraph Debate["Debate Rounds"]
        C[Proposer Agent]
        D[Critic Agent]
        E[Rebuttal Agent]
    end
    subgraph Synthesis["Synthesis"]
        F[Moderator Agent]
        G{Resolved?}
        H[Final Output]
        I[Flagged Uncertainties]
    end
    A --> B --> C
    C --> D --> E --> F
    F --> G
    G -->|yes| H
    G -->|no - add round| C
    G -->|unresolved| I --> H
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f3e8ff,stroke:#a855f7
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fee2e2,stroke:#ef4444

4.2 Moderator Evaluation Flow

ARCHITECTURE DIAGRAM

flowchart TD
    subgraph Inputs["Moderator Inputs"]
        A[Proposer Position]
        B[Critic Arguments]
        C[Rebuttal Arguments]
    end
    subgraph Evaluation["Moderator Evaluation"]
        D[Argument Quality Score]
        E[Agreement Points]
        F[Unresolved Points]
        G{More Rounds Needed?}
    end
    subgraph Output["Moderator Output"]
        H[Synthesis with Confidence]
        I[Explicit Uncertainty Flags]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    F --> G
    G -->|yes and under round limit| J[Trigger Next Round]
    G -->|no| H
    F --> I --> H
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#dbeafe,stroke:#3b82f6
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#fee2e2,stroke:#ef4444
    style G fill:#f3e8ff,stroke:#a855f7
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fee2e2,stroke:#ef4444
    style J fill:#f0fdf4,stroke:#22c55e

5. Structure

5.1 Component Catalogue

Component	Responsibility	Technology Options
Framing Agent	Structures the question into a clear proposition for debate	LLM with framing prompt
Proposer Agent	Constructs the affirmative case with supporting evidence	LLM with proposer system prompt
Critic Agent	Constructs the most forceful opposing case — must genuinely challenge	LLM with adversarial critic prompt
Rebuttal Agent	Responds to critic arguments on behalf of the proposition	LLM with rebuttal prompt
Moderator Agent	Evaluates argument quality; synthesises; flags unresolved points	LLM with moderator prompt, ideally different model from proposer/critic
Debate State Store	Persists all rounds for audit and cost tracking	Postgres, Redis
Cost Monitor	Tracks token spend per debate run; enforces cost ceiling	Middleware on LLM client

5.2 Debate Round Record Schema

{
  "debateId": "uuid-v4",
  "roundNumber": 1,
  "question": "...",
  "proposerPosition": {
    "thesis": "...",
    "supporting_arguments": [],
    "evidence_cited": []
  },
  "criticPosition": {
    "thesis": "...",
    "counter_arguments": [],
    "evidence_cited": [],
    "challengeStrength": 7
  },
  "rebuttalPosition": {
    "thesis": "...",
    "rebuttals": [],
    "concessions": []
  },
  "moderatorEvaluation": {
    "argumentQualityScore": { "proposer": 8, "critic": 7, "rebuttal": 6 },
    "resolvedPoints": [],
    "unresolvedPoints": [],
    "recommendAnotherRound": false,
    "synthesis": "...",
    "confidenceLevel": "HIGH | MODERATE | LOW",
    "explicitUncertainties": []
  },
  "tokenCost": { "proposer": 1200, "critic": 1100, "rebuttal": 900, "moderator": 800 }
}
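For teams implementing the record in TypeScript, the schema above could be typed roughly as follows. This is a sketch only: field names mirror the JSON example, while the `ConfidenceLevel` alias, the `string[]` element types (the example arrays are empty), and the `totalTokens` helper are assumptions made for illustration.

```typescript
// Sketch of the debate round record as TypeScript types.
// Field names follow the JSON example; element types are assumed to be strings.
type ConfidenceLevel = "HIGH" | "MODERATE" | "LOW";

interface DebateRoundRecord {
  debateId: string;
  roundNumber: number;
  question: string;
  proposerPosition: {
    thesis: string;
    supporting_arguments: string[];
    evidence_cited: string[];
  };
  criticPosition: {
    thesis: string;
    counter_arguments: string[];
    evidence_cited: string[];
    challengeStrength: number; // critic self-assessment, 1-10
  };
  rebuttalPosition: { thesis: string; rebuttals: string[]; concessions: string[] };
  moderatorEvaluation: {
    argumentQualityScore: { proposer: number; critic: number; rebuttal: number };
    resolvedPoints: string[];
    unresolvedPoints: string[];
    recommendAnotherRound: boolean;
    synthesis: string;
    confidenceLevel: ConfidenceLevel;
    explicitUncertainties: string[];
  };
  tokenCost: { proposer: number; critic: number; rebuttal: number; moderator: number };
}

// Hypothetical helper: total tokens spent in one round across the four roles.
function totalTokens(record: DebateRoundRecord): number {
  const c = record.tokenCost;
  return c.proposer + c.critic + c.rebuttal + c.moderator;
}
```

Typing `confidenceLevel` as a union rather than a free string lets the compiler reject values outside the three levels the moderator is allowed to return.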

6. Behaviour

6.1 Debate Structure

Round 1 — Proposition. The proposer agent receives the question and relevant context. Its system prompt instructs it to: state a clear thesis; enumerate supporting arguments in priority order; cite specific evidence from the context; acknowledge the strongest likely objections pre-emptively.

Round 1 — Critique. The critic agent receives the question, context, and the proposer's position. The critic's system prompt is the critical design element — see Section 6.2.

Round 1 — Rebuttal. The rebuttal agent receives the proposer's position and the critic's arguments. It must: address each critic argument specifically (not generically); concede points where the critic's argument is stronger; provide new evidence or reasoning for contested points.

Moderator Evaluation. The moderator receives all three positions and evaluates: argument quality (independent of which side made the argument); which points were resolved; which remain genuinely contested; whether another debate round would improve resolution. The moderator synthesises a final output that reflects the weight of evidence, explicitly flags unresolved points, and assigns a confidence level.

Configurable rounds. The default is 1 round (proposition + critique + rebuttal + moderation). For high-stakes decisions, 2 rounds add a second cycle of critique and rebuttal before moderation. Enforce a hard maximum of 3 rounds to prevent cost runaway.

6.2 Critic Design — The Most Important Prompt in the System

The lazy critic is the most common failure mode of the debate pattern. A naively prompted critic will identify minor issues, acknowledge the proposer's main points as "reasonable," and ultimately agree with the proposition — providing the appearance of scrutiny without its substance. This is worse than no debate because it produces false confidence.

The critic system prompt must:

Explicitly forbid the critic from agreeing with the proposition's framing. The critic's job is to find the strongest possible case against it.
Require the critic to assume the proposition is wrong and work backward from that assumption to find supporting arguments.
Instruct the critic to prioritise tail risks and low-probability high-consequence objections that the proposer is likely to have underweighted.
Forbid softening language ("while I acknowledge," "this is generally reasonable but..."). Every argument must be stated as a direct challenge.
Require a challengeStrength self-assessment score from 1–10. Scores below 6 trigger an automatic retry with a stronger adversarial instruction.

Example critic system prompt excerpt:

You are an adversarial critic. Your ONLY job is to find the strongest possible case
against the proposition. You must assume the proposition is WRONG.

Rules:
1. Never say "while I acknowledge" or soften your arguments.
2. State every objection as a direct, affirmative claim.
3. Focus especially on: tail risks the proposer ignores; evidence the proposer
   misinterprets; hidden assumptions in the proposer's argument; scenarios where
   the proposition causes harm even if its thesis is correct.
4. At the end, rate your challenge strength from 1-10. If you rate yourself below 6,
   your critique was too weak — rewrite it to be stronger.
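The prompt rules above can be backed by a programmatic check on the critic's output. The sketch below assumes the critic ends its response with a line such as "Challenge strength: 7" (an output convention not specified in this pattern) and scans for the two softening phrases named in Section 6.2. The function and type names are hypothetical.

```typescript
// Hypothetical post-check on critic output. The "Challenge strength: N"
// convention and the phrase list are assumptions for illustration.
const SOFTENING_PHRASES: string[] = ["while i acknowledge", "this is generally reasonable"];

interface CriticCheck {
  softeningFound: string[];          // forbidden phrases present in the critique
  challengeStrength: number | null;  // parsed self-rating, or null if absent/invalid
  needsRetry: boolean;
}

function checkCriticOutput(text: string, retryBelow: number = 6): CriticCheck {
  const lower = text.toLowerCase();
  const softeningFound = SOFTENING_PHRASES.filter((phrase) => lower.includes(phrase));

  // Extract the self-rating and accept it only if it lies in 1-10.
  const match = /challenge strength:\s*(\d{1,2})/i.exec(text);
  const parsed = match ? parseInt(match[1], 10) : null;
  const challengeStrength = parsed !== null && parsed >= 1 && parsed <= 10 ? parsed : null;

  // Retry when the critic softened its language, gave no usable rating,
  // or rated itself below the retry threshold.
  const needsRetry =
    softeningFound.length > 0 || challengeStrength === null || challengeStrength < retryBelow;

  return { softeningFound, challengeStrength, needsRetry };
}
```

Treating a missing rating as a retry condition is a conservative choice: a critique that ignores the output format has probably ignored other instructions as well.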

6.3 Moderator Design — Evaluating Arguments Not Positions

The moderator's system prompt must explicitly forbid it from having a preferred outcome. It evaluates:

Logical validity. Does the argument follow from its premises?
Evidence quality. Is cited evidence from the context, or fabricated? Is it interpreted correctly?
Completeness. Did the argument address all relevant aspects of the question?
Internal consistency. Does the argument contradict itself?

The moderator must NOT evaluate which position "sounds more reasonable" or "aligns with best practices" — these are position judgments, not argument quality judgments.
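One way to keep the moderator anchored to the four criteria is to have it return a score per criterion and derive the overall argument quality score mechanically. The sketch below assumes a 1 to 10 scale and equal weighting; neither is prescribed by this pattern, and the names are illustrative.

```typescript
// Hypothetical rubric aggregation. The 1-10 scale and equal weights are assumptions.
interface RubricScores {
  logicalValidity: number;
  evidenceQuality: number;
  completeness: number;
  internalConsistency: number;
}

// Returns the unweighted mean of the four criteria, rounded to one decimal place.
function argumentQualityScore(scores: RubricScores): number {
  const values = [
    scores.logicalValidity,
    scores.evidenceQuality,
    scores.completeness,
    scores.internalConsistency,
  ];
  for (const value of values) {
    // Reject out-of-range and NaN values rather than silently averaging them.
    if (!(value >= 1 && value <= 10)) {
      throw new RangeError("Rubric scores must be between 1 and 10");
    }
  }
  const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
  return Math.round(mean * 10) / 10;
}
```

Because the rubric has no field for "which side made the argument", the aggregate cannot encode a position preference; any such bias would have to enter through the individual criterion scores, which are easier to audit.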

6.4 Use Cases Where Debate Beats Single Agent

The debate pattern produces measurable quality improvement in the following categories:

Contract risk review. Debating whether a specific clause creates liability. Single agents miss tail-risk interpretations that aggressive counsel would find.
Security threat modelling. Debating whether a proposed architecture is secure. Single agents tend to accept reasonable-sounding security controls without stress-testing them.
Architecture decision records (ADR). Debating whether a technology choice is appropriate for the context. Single agents underweight long-term maintenance costs and ecosystem risk.
Investment thesis evaluation. Debating whether a market assumption is valid. Single agents exhibit confirmation bias toward the thesis presented.
Regulatory compliance assessment. Debating whether a business practice complies with a regulation. Single agents tend toward optimistic interpretations.

6.5 Cost Governance — When to Use Debate

Given the 4–8× cost multiplier, debate should be triggered only when:

The decision has asymmetric downside (a wrong answer costs significantly more than the debate overhead).
A single-agent answer is available but carries observable uncertainty indicators (hedging language, low confidence score).
The decision type is in the known high-risk category (contract, security, architecture, investment, compliance).
The question is genuinely contested — not a factual lookup where debate adds no value.

Implement a gating function: shouldDebate(taskType, confidenceScore, estimatedDownside) → boolean. Default: debate when estimatedDownside > debateCostThreshold × 50.
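A minimal sketch of the gating function follows. It makes several assumptions that the pattern leaves open: `debateCostThreshold` is read as the expected cost of one debate run in USD, `confidenceScore` is a value between 0 and 1, the `confidenceFloor` of 0.7 is arbitrary, and the conditions are combined as "downside test passes AND (high-risk task type OR low confidence)". Adjust the combination logic to local policy.

```typescript
// Sketch of the debate gate. Threshold values and the way conditions are
// combined are assumptions, not part of the pattern definition.
type TaskType = "contract" | "security" | "architecture" | "investment" | "compliance" | "other";

interface GateConfig {
  debateCostThreshold: number; // assumed: expected cost of one debate run, USD
  downsideMultiplier: number;  // 50 in the default rule
  confidenceFloor: number;     // assumed: below this, the single-agent answer is "uncertain"
}

const DEFAULT_GATE: GateConfig = {
  debateCostThreshold: 0.5,
  downsideMultiplier: 50,
  confidenceFloor: 0.7,
};

const HIGH_RISK_TASKS: TaskType[] = ["contract", "security", "architecture", "investment", "compliance"];

function shouldDebate(
  taskType: TaskType,
  confidenceScore: number,
  estimatedDownside: number,
  config: GateConfig = DEFAULT_GATE
): boolean {
  // Default rule: the downside must exceed the cost threshold by the multiplier.
  const downsideJustifiesCost =
    estimatedDownside > config.debateCostThreshold * config.downsideMultiplier;
  if (!downsideJustifiesCost) return false;

  const highRiskTask = HIGH_RISK_TASKS.includes(taskType);
  const uncertainAnswer = confidenceScore < config.confidenceFloor;
  return highRiskTask || uncertainAnswer;
}
```

With the defaults above, a contract review with an estimated downside of 100 USD passes the gate (100 > 0.5 × 50), while the same task with a downside of 10 USD does not.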

7. Implementation Guide

7.1 Step-by-Step

Step 1 — Build the framing agent. The framing step converts a raw user question into a structured proposition that is suitable for debate. Output: a clear thesis statement, the scope of the question, the criteria for a good answer, and the key evidence documents.

Step 2 — Build the proposer agent. Straightforward: instruct the model to make the best case for the proposition using the provided evidence. Include an instruction to pre-emptively address the strongest objections — this forces the proposer to acknowledge weaknesses, making the overall debate more productive.

Step 3 — Build the adversarial critic agent. This is the highest-priority prompt engineering task. Follow the guidelines in Section 6.2. Run the critic against 20 known test cases before production deployment and manually verify that challenge strength scores average above 7.

Step 4 — Build the rebuttal agent. Instruct the rebuttal agent to address each critic argument by number, explicitly concede points where appropriate, and provide specific counter-evidence.

Step 5 — Build the moderator agent. Use a different model than the proposer and critic if possible — different model families have different biases, and a moderator from the same family may exhibit the same biases as the agents it is evaluating. The moderator prompt must include: evaluation rubric, instruction to flag unresolved points explicitly, and instruction to assign a confidence level to the synthesis.

Step 6 — Wire cost governance. Track token spend per round. If cumulative cost reaches the cost ceiling mid-debate, terminate and return the current state of moderation with a DEBATE_TRUNCATED flag.
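The cost governance in Step 6 can be isolated in a small tracker that the debate runner consults after each agent call. The sketch below is one possible shape; the class name, method names, and the choice to track cost in USD per round are assumptions.

```typescript
// Hypothetical cost tracker for a single debate run.
class CostMonitor {
  private readonly ceilingUSD: number;
  private spentUSD = 0;
  private readonly perRound: number[] = [];

  constructor(ceilingUSD: number) {
    if (!(ceilingUSD > 0)) throw new RangeError("ceilingUSD must be positive");
    this.ceilingUSD = ceilingUSD;
  }

  // Record the cost of one agent call against a 1-based round number.
  record(round: number, costUSD: number): void {
    this.spentUSD += costUSD;
    this.perRound[round - 1] = (this.perRound[round - 1] ?? 0) + costUSD;
  }

  get totalUSD(): number {
    return this.spentUSD;
  }

  costOfRound(round: number): number {
    return this.perRound[round - 1] ?? 0;
  }

  // True once cumulative spend has reached the ceiling.
  get ceilingReached(): boolean {
    return this.spentUSD >= this.ceilingUSD;
  }

  // Status flag the runner would attach to its result.
  status(): "COMPLETE" | "DEBATE_TRUNCATED" {
    return this.ceilingReached ? "DEBATE_TRUNCATED" : "COMPLETE";
  }
}
```

A runner would call `record` after every agent invocation and check `ceilingReached` before starting the next call, returning the latest moderation state with the `DEBATE_TRUNCATED` status when the ceiling is hit.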

7.2 Code Skeleton (TypeScript)

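// Minimal shapes assumed by this skeleton; adapt them to your agent framework.
interface AgentOutput { cost: number } // USD cost of the call
interface CriticOutput extends AgentOutput { challengeStrength: number }
interface ModeratorOutput extends AgentOutput { recommendAnotherRound: boolean }
interface Agent<T extends AgentOutput> { invoke(input: Record<string, unknown>): Promise<T> }

type DebateStatus = "COMPLETE" | "DEBATE_TRUNCATED";
interface DebateRound {
  round: number;
  proposerOutput: AgentOutput;
  criticOutput: CriticOutput;
  rebuttalOutput: AgentOutput;
  moderatorOutput: ModeratorOutput;
  cost: number; // cumulative spend at the end of this round
}
interface DebateResult { debateId: string; rounds: DebateRound[]; status: DebateStatus }

// Agents and the result builder are supplied by the host application.
declare const framingAgent: Agent<AgentOutput>;
declare const proposerAgent: Agent<AgentOutput>;
declare const criticAgent: Agent<CriticOutput>;
declare const rebuttalAgent: Agent<AgentOutput>;
declare const moderatorAgent: Agent<ModeratorOutput>;
declare function buildResult(debateId: string, rounds: DebateRound[], status: DebateStatus): DebateResult;

const HARD_MAX_ROUNDS = 3; // hard ceiling from Section 6.1

interface DebateConfig {
  maxRounds: number;
  costCeilingUSD: number;
  minChallengeStrength: number;    // not used by this skeleton
  requireCriticRetryBelow: number; // critic self-ratings below this trigger one retry
}

async function runDebate(
  question: string,
  context: string,
  config: DebateConfig = { maxRounds: 2, costCeilingUSD: 0.50, minChallengeStrength: 6, requireCriticRetryBelow: 6 }
): Promise<DebateResult> {
  const debateId = crypto.randomUUID();
  let totalCost = 0;
  const rounds: DebateRound[] = [];
  const roundLimit = Math.min(config.maxRounds, HARD_MAX_ROUNDS);

  const proposition = await framingAgent.invoke({ question, context });

  for (let round = 1; round <= roundLimit; round++) {
    // Stop before starting a new round once the cost ceiling has been reached.
    if (totalCost >= config.costCeilingUSD) {
      return buildResult(debateId, rounds, "DEBATE_TRUNCATED");
    }

    const proposerOutput = await proposerAgent.invoke({ question, context, proposition, previousRounds: rounds });
    totalCost += proposerOutput.cost;

    let criticOutput = await criticAgent.invoke({ question, context, proposerOutput, previousRounds: rounds });
    totalCost += criticOutput.cost;

    // Retry the critic once if its self-reported challenge strength is too low.
    if (criticOutput.challengeStrength < config.requireCriticRetryBelow) {
      criticOutput = await criticAgent.invoke({
        question, context, proposerOutput, previousRounds: rounds,
        additionalInstruction: `Your previous critique was rated ${criticOutput.challengeStrength}/10.
          Rewrite with a stronger adversarial stance. Assume the proposition is dangerously wrong.`
      });
      totalCost += criticOutput.cost;
    }

    const rebuttalOutput = await rebuttalAgent.invoke({ question, context, proposerOutput, criticOutput });
    totalCost += rebuttalOutput.cost;

    const moderatorOutput = await moderatorAgent.invoke({ question, proposerOutput, criticOutput, rebuttalOutput });
    totalCost += moderatorOutput.cost;

    rounds.push({ round, proposerOutput, criticOutput, rebuttalOutput, moderatorOutput, cost: totalCost });

    if (!moderatorOutput.recommendAnotherRound) break;
  }

  return buildResult(debateId, rounds, "COMPLETE");
}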

8. Observability

8.1 Debate Quality Metrics

Metric	Description	Alert Threshold
Critic challenge strength average	Mean self-reported challenge strength across all debates	< 6.5 (critic is too soft)
Moderator confidence distribution	% of debates returning LOW confidence synthesis	> 30% (questions are systematically too ambiguous)
Explicit uncertainties per debate	Average unresolved points flagged by moderator	Baseline by task type; deviation is signal
Debate cost per task type	Actual token cost vs. estimate	> 1.5× estimate (prompt runaway)
Single-agent agreement rate	% of debates where debate conclusion agrees with the first proposer output	Used for cost-value calibration
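The alert thresholds in the table above can be evaluated over a batch of completed debates. The sketch below assumes a flattened per-debate summary (`DebateSummary`, a hypothetical type) and compares cost in aggregate rather than per task type for brevity.

```typescript
// Hypothetical alert evaluation over a batch of debates, using the
// thresholds from the metrics table. Type and function names are assumptions.
type Confidence = "HIGH" | "MODERATE" | "LOW";

interface DebateSummary {
  challengeStrength: number; // critic self-rating, 1-10
  confidenceLevel: Confidence;
  actualCostUSD: number;
  estimatedCostUSD: number;
}

function qualityAlerts(debates: DebateSummary[]): string[] {
  if (debates.length === 0) return [];
  const alerts: string[] = [];
  const count = debates.length;

  // Mean self-reported challenge strength below 6.5 suggests a soft critic.
  const meanStrength = debates.reduce((sum, d) => sum + d.challengeStrength, 0) / count;
  if (meanStrength < 6.5) alerts.push("critic-too-soft");

  // More than 30% LOW-confidence syntheses suggests systematically ambiguous questions.
  const lowShare = debates.filter((d) => d.confidenceLevel === "LOW").length / count;
  if (lowShare > 0.3) alerts.push("too-many-low-confidence");

  // Actual spend above 1.5x the estimate suggests prompt runaway.
  const actual = debates.reduce((sum, d) => sum + d.actualCostUSD, 0);
  const estimated = debates.reduce((sum, d) => sum + d.estimatedCostUSD, 0);
  if (estimated > 0 && actual > 1.5 * estimated) alerts.push("cost-over-estimate");

  return alerts;
}
```

In practice the batch would be grouped by task type first, since the table treats cost and uncertainty baselines as task-specific.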

8.2 Debate Audit Trail

Every debate run must produce a complete, immutable record: question, framing, all proposer/critic/rebuttal/moderator outputs for each round, token costs, final synthesis, confidence level, and explicit uncertainties. This record is the Model Risk Management (MRM) documentation for the decision.

9. Cost Governance

Selective trigger. Never use debate as the default reasoning mode. Gate on task type and estimated decision value.
Round ceiling. Hard maximum of 3 rounds enforced in the debate runner. The marginal quality improvement of round 4+ rarely justifies the cost.
Model tiering for non-critical roles. Use efficient models for the rebuttal agent (less complex reasoning) and a faster model for moderation. Reserve frontier models for the proposer and critic where argument quality matters most.
Cost ceiling with graceful truncation. If the debate cost ceiling is reached mid-debate, return the current moderation state with a DEBATE_TRUNCATED flag rather than failing silently.
Empirical calibration. Track the correlation between debate use and downstream decision quality (measured by human expert review or outcome). Update the cost-value trigger threshold quarterly.

10. Security Considerations

10.1 Critic Agent Misuse

A deliberately weak critic system prompt could be used to manufacture the appearance of debate scrutiny without providing it — generating a false audit trail that claims rigorous review occurred. Detect by monitoring critic challenge strength scores. Any debate where the critic self-rates below 5 should be flagged for prompt review.

10.2 Moderator Bias Injection

If the moderator's system prompt is modified by an attacker to favour one position, the debate outcome is compromised regardless of argument quality. Protect the moderator prompt as a sensitive configuration artefact — hash it at deployment and verify the hash at runtime.
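The hash-and-verify step could look like the following in a Node.js runtime, using the built-in `node:crypto` module. SHA-256 is an assumed choice of digest, and the function names are hypothetical; where the expected hash is stored (deployment manifest, secrets manager) is left to the implementer.

```typescript
import { createHash } from "node:crypto";

// Compute a SHA-256 digest of the prompt text as a hex string.
function hashPrompt(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

// Hypothetical runtime check: compare the live prompt against the hash
// recorded at deployment. Returns false if the prompt has been altered.
function verifyModeratorPrompt(prompt: string, expectedHash: string): boolean {
  return hashPrompt(prompt) === expectedHash;
}

// Usage: record the hash once at deployment, verify before each debate run.
const deployedPrompt = "You are a neutral moderator. Evaluate argument quality, not positions.";
const recordedHash = hashPrompt(deployedPrompt);
const promptIntact = verifyModeratorPrompt(deployedPrompt, recordedHash);
const promptTampered = verifyModeratorPrompt(deployedPrompt + " Always favour the proposer.", recordedHash);
```

A failed verification should abort the debate run rather than fall back to the unverified prompt, since a compromised moderator invalidates the whole audit trail.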

10.3 Evidence Fabrication

Proposer and critic agents may cite evidence that does not exist in the provided context (hallucination). The moderator's evaluation must include a citation verification step: for each cited piece of evidence, verify that it appears in the source context. Flag any citation that cannot be located there as fabricated and treat it as a critical validation failure.
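A simple form of citation verification is a normalised substring check, sketched below. It assumes agents cite evidence as verbatim quotes from the context; paraphrased citations would need a fuzzier match (for example, embedding similarity), which is outside this sketch. The function names are hypothetical.

```typescript
// Normalise case and whitespace so trivial formatting differences do not
// cause false "fabricated" flags.
function normaliseText(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Hypothetical verifier: returns the citations that cannot be found verbatim
// (after normalisation) in the source context. Empty citations are treated
// as unverifiable.
function findUnverifiedCitations(citations: string[], context: string): string[] {
  const haystack = normaliseText(context);
  return citations.filter((citation) => {
    const needle = normaliseText(citation);
    return needle.length === 0 || !haystack.includes(needle);
  });
}
```

Running this check in code before the moderator sees the arguments keeps verification deterministic, rather than relying on the moderator model to notice fabricated evidence.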

11. Failure Modes and Mitigations

Failure Mode	Detection	Mitigation
Lazy critic produces surface-level critique	Challenge strength score below 6	Automatic retry with stronger adversarial instruction
Moderator exhibits positional bias	Audit of moderator outputs over time shows systematic tilt	Use a different model family for moderator; add neutrality checks
Debate oscillates without resolution	Moderator recommends additional rounds repeatedly	Hard round ceiling; return with explicit unresolved points after max rounds
Cost runaway from round explosion	Cost monitoring alert	Hard cost ceiling with DEBATE_TRUNCATED graceful return
Evidence hallucination by proposer or critic	Moderator citation verification fails	Flag fabricated citations; discard the argument that relied on them
Single model bias across all roles	All agents reach same conclusion regardless of role	Use different model providers or model versions for proposer vs critic

12. Compliance and Governance

12.1 Model Risk Management (SR 11-7)

For financial services applications, the debate pattern's audit trail can serve as supporting evidence of effective challenge within the model validation documentation that SR 11-7 expects. The record of all proposer, critic, rebuttal, and moderator outputs demonstrates that the model's output was subjected to structured challenge before being used in a decision. Document the debate parameters (rounds, models used, prompt versions) as part of the model's technical documentation.

12.2 EU AI Act — Robustness (Article 15)

Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity. The debate pattern directly supports the robustness requirement by systematically challenging single-model outputs. Maintain empirical evidence of quality improvement from debate vs. single-agent baselines for each supported task type.

13. Testing Strategy

13.1 Unit Tests

Critic retry logic: mock a critic that returns challenge strength 4; assert the runner triggers a retry with a stronger adversarial instruction.
Cost ceiling: mock agents with known token costs; assert the runner terminates when the ceiling is reached and returns DEBATE_TRUNCATED.
Round ceiling: configure maxRounds: 2 and mock a moderator that always recommends another round; assert debate terminates after 2 rounds.
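As an illustration of the round-ceiling test, the sketch below uses a simplified, synchronous stand-in for the runner's round loop (`countRounds`, hypothetical) and a mock moderator that always asks for another round. A real test would exercise the actual runner with mocked agents.

```typescript
// Simplified stand-in for the debate runner's round loop (hypothetical).
// Only the termination logic is modelled; agents are reduced to a moderator stub.
type ModeratorStub = () => { recommendAnotherRound: boolean };

const ROUND_CEILING = 3; // hard maximum from Section 6.1

function countRounds(maxRounds: number, moderator: ModeratorStub): number {
  const limit = Math.min(maxRounds, ROUND_CEILING);
  let completed = 0;
  for (let round = 1; round <= limit; round++) {
    const verdict = moderator();
    completed = round;
    if (!verdict.recommendAnotherRound) break;
  }
  return completed;
}

// Mock moderator that never settles.
const alwaysContinue: ModeratorStub = () => ({ recommendAnotherRound: true });

// With maxRounds: 2 the debate must stop after exactly 2 rounds.
const roundsRun = countRounds(2, alwaysContinue);
if (roundsRun !== 2) {
  throw new Error("Expected 2 rounds, got " + roundsRun);
}
```

The same stub also exercises the hard ceiling: requesting more than three rounds should still stop at three.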

13.2 Integration Tests

Full debate run on a known test question with canned proposer/critic/rebuttal/moderator responses; assert the output schema matches the debate result structure.
Critic citation fabrication: inject a critic output that cites a source not in the context; assert the moderator flags the fabricated citation.

13.3 Quality Evaluation Tests (Human-Labelled Baseline)

For each supported task type, maintain a golden set of 20 questions with expert-labelled "correct" answers. Run the debate system against this set quarterly. Assert that debate outputs agree with expert labels at a higher rate than single-agent outputs. This is the empirical evidence that debate adds value.
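The quarterly assertion can be expressed as a comparison of two agreement rates over the golden set. The sketch below assumes answers and expert labels are comparable categorical strings (for example "LIABLE" or "NOT_LIABLE"); free-text answers would need an expert or rubric-based grader instead of exact matching. All names are hypothetical.

```typescript
// Hypothetical golden-set evaluation with exact-match agreement.
interface GoldenCase {
  expertLabel: string;
  debateAnswer: string;
  singleAgentAnswer: string;
}

// Share of cases where the selected answer matches the expert label.
function agreementRate(cases: GoldenCase[], pick: (c: GoldenCase) => string): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => pick(c) === c.expertLabel).length;
  return hits / cases.length;
}

// True when debate agrees with experts strictly more often than the single agent.
function debateBeatsSingleAgent(cases: GoldenCase[]): boolean {
  const debateRate = agreementRate(cases, (c) => c.debateAnswer);
  const singleRate = agreementRate(cases, (c) => c.singleAgentAnswer);
  return debateRate > singleRate;
}
```

With only 20 questions per task type, small differences in agreement rate are within noise, so it is worth recording both rates over time rather than relying on the boolean alone.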

13.4 End-to-End Tests

Submit a contract clause for debate review; assert all four agent outputs are present in the result; assert the moderator output includes explicit confidence level and at least one uncertainty flag; assert total cost is within expected bounds.

14. Variants and Extensions

14.1 Multi-Position Debate

Rather than binary proposition/critique, enumerate three or more positions (e.g., "Adopt technology X," "Adopt technology Y," "Defer the decision") and have a separate proposer agent for each. The moderator evaluates all positions and synthesises a ranked recommendation. Higher cost but more complete for genuinely multi-option decisions.

14.2 Expert Debate (Persona-Prompted)

Assign each agent a specific expert persona (e.g., "You are a senior litigation partner at a global law firm," "You are a chief risk officer at a tier-1 bank"). Expert personas improve the domain-specificity of arguments but require careful persona design to avoid stereotyping.

14.3 Sequential Evidence Introduction

Rather than providing all context upfront, introduce new evidence at each debate round to simulate the information-gathering that occurs in real decisions. Tests how each agent updates its position in response to new information.

15. Trade-off Analysis

Dimension	Debate Agent	Single Agent	Supervisor with Validation
Output quality on contested questions	Highest	Lowest	Moderate
Token cost	4–8× single agent	1×	2–3×
Latency	3–5× single agent	1×	2×
Auditability of reasoning	Highest (full debate record)	Low	Moderate
Sycophancy resistance	Highest	Lowest	Moderate
Use case suitability	High-stakes, contested decisions	Low-stakes, factual	Standard validation use cases

16. Known Implementations

Organisation Type	Use Case	Rounds Used	Reported Quality Improvement
Global law firm	Contract liability clause risk review	2 rounds	34% more risk flags identified vs single-agent; 89% expert agreement rate
Investment bank	Investment thesis challenge	1 round	28% increase in identified downside risks; MRM audit passed first review
Software company	Architecture Decision Record review	1 round	41% of debate-identified concerns confirmed by post-implementation retrospective
Cybersecurity consultancy	Threat modelling	2 rounds	52% more attack vectors identified vs single-agent threat model

17. Related Patterns

Pattern ID	Name	Relationship
EAAPL-MAG001	Multi-Agent Orchestration	Debate can be embedded as a specialist agent type within an orchestration
EAAPL-MAG002	Supervisor Agent	Supervisor can invoke debate between two workers on a contested subtask
EAAPL-MAG003	Human-in-the-Loop Agent	Final human review checkpoint after debate synthesis for regulated use cases
EAAPL-MAG006	Agent Handoff Protocol	Debate round records use handoff schema for inter-agent message passing

18. References

Liang, T. et al., "Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate," 2023 — arxiv.org/abs/2305.19118
Du, Y. et al., "Improving Factuality and Reasoning in Language Models through Multiagent Debate," 2023 — arxiv.org/abs/2305.14325
Irving, G. et al., "AI Safety via Debate," OpenAI, 2018 — arxiv.org/abs/1805.00899
Gartner, "Using Multi-Agent AI for High-Stakes Decision Review," 2025 (ID: G00823891)
Anthropic, "Sycophancy in AI Systems," 2024 — anthropic.com/research/sycophancy
SR 11-7: Guidance on Model Risk Management — federalreserve.gov/supervisionreg/srletters/sr1107.htm
EU AI Act (Regulation 2024/1689), Article 15: Accuracy, Robustness, and Cybersecurity
Chan, C. et al., "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate," 2023 — arxiv.org/abs/2308.07201
Madaan, A. et al., "Self-Refine: Iterative Refinement with Self-Feedback," 2023 — arxiv.org/abs/2303.17651
NIST AI RMF 1.0, Measure 2.5: Bias, Fairness, and Explainability Testing
