[EAAPL-RSN003] Reasoning-then-Act

Category: Reasoning Models
Sub-category: Agentic Reasoning
Version: 1.0
Maturity: Emerging
Tags: reasoning-models, reasoning-then-act, agentic-ai, planning, chain-of-thought, tool-use, multi-step-tasks, claude, o3
Regulatory Relevance: EU AI Act Article 14 (Human Oversight), NIST AI RMF (Govern 1.6, Manage 4.1), ISO/IEC 42001 Clause 8.5, APRA CPS 234

1. Executive Summary

Reasoning-then-Act (R-then-A) is an agentic architecture pattern that separates the thinking phase from the acting phase in AI-driven workflows. The agent first invokes a reasoning model with extended thinking enabled to produce a structured plan — a sequence of tool calls, decision branches, and expected outcomes — before executing any real-world action. The plan is treated as an explicit, inspectable artefact that can be validated, logged, and (where required) approved by a human before execution begins. This is a deliberate inversion of the classic ReAct loop (interleaved Reason-Act-Observe cycles) which produces reasoning and action in the same pass with no separation point for governance insertion.

For enterprise architects and risk officers, Reasoning-then-Act is the pattern that makes agentic AI governable. ReAct-style agents are already widely deployed, but their interleaved reasoning is opaque: "what the agent decided to do" cannot easily be extracted from the stream of generated tokens. R-then-A externalises that decision as a typed plan object, making it auditable, replayable, and injectable into the human-in-the-loop approval workflows required by the EU AI Act and equivalent frameworks. The pattern applies wherever an AI agent must take consequential actions: executing financial transactions, modifying production infrastructure, sending communications on behalf of users, or orchestrating multi-step clinical workflows.

2. Problem Statement

Business Problem

AI agents that combine language model reasoning with tool execution are increasingly being deployed to automate consequential business processes. However, the interleaved Reason-Act pattern used by most frameworks (LangChain, AutoGPT, CrewAI) produces no stable artefact representing "what the agent decided to do." Auditors, risk managers, and compliance teams cannot reconstruct the decision logic for a completed agent run. Regulatory frameworks requiring human oversight of AI decisions have no insertion point in a ReAct loop.

Technical Problem

ReAct-style loops generate reasoning tokens and tool calls in a single pass. If the model's implicit plan is poor, the first wrong tool call may execute, with irreversible effect, before the error is detected. Error recovery then requires re-running the entire sequence from the beginning. There is no mechanism to validate the plan against business rules before execution, to halt execution if the plan violates a policy, or to substitute a human-provided correction for a single planning error without restarting the agent.

Symptoms of Absence

Agent runs produce unexpected tool calls that cannot be traced back to a specific reasoning step
Post-hoc agent audit logs contain interleaved reasoning and actions with no clear plan-execution boundary
Irreversible actions (emails sent, transactions posted, records deleted) occur before human review is possible
Compliance teams reject agentic AI deployments because they cannot satisfy oversight requirements
Debugging agent failures requires re-running the full agent to reproduce the failure mode

Cost of Inaction

Cost: Agent errors on consequential tasks (incorrect financial transactions, mis-routed communications) have direct financial liability; recovery is manual and expensive
Quality: Without a dedicated planning phase, agents on complex multi-step tasks tend to produce lower-quality execution sequences than agents that spend dedicated compute on planning first
Operational: No plan artefact means no replay capability; incident investigation requires full environment reconstruction

3. Context

When to Apply

AI agents executing irreversible or consequential actions (financial transactions, infrastructure changes, communications, database mutations)
Multi-step workflows where a single planning error cascades through subsequent steps
Regulated domains requiring documented evidence of AI decision logic (financial services, healthcare, legal)
Workflows with a human-in-the-loop requirement — the plan is the natural handoff artefact
Complex tool-use scenarios where the agent must coordinate 5+ tool calls in a defined sequence
Any agentic deployment covered by EU AI Act high-risk AI system classification

Australian Enterprise Examples

KPMG Australia's Tax Advisory AI agent uses Reasoning-then-Act as the architectural backbone for its private binding ruling (PBR) preparation workflow. The reasoning phase produces a structured chain-of-thought covering the relevant ITAA 1997 provisions, applicable ATO tax rulings, and the taxpayer's specific fact pattern; this chain-of-thought is the "plan" that the KPMG tax partner reviews before the act phase populates the PBR application template and files supporting documents. The separation is essential: KPMG's professional indemnity obligations require a qualified human to approve the tax position before any document leaves the firm, and the plan object is the natural review surface for that approval.

The Australian Taxation Office's (ATO) Private Wealth business line is piloting R-then-A for its compliance risk assessment workflow, where an agent must: retrieve tax lodgement history, cross-reference property registry transactions, evaluate related-party loan arrangements against Division 7A, and generate a risk score with supporting rationale. Each of these is a discrete plan step; the plan object allows ATO reviewers to validate the proposed analysis sequence before the agent issues any data requests to the taxpayer, preserving administrative law procedural fairness requirements.

Infrastructure Australia uses R-then-A for its automated infrastructure proposal assessment agent. The planning phase reasons over the assessment criteria in the Infrastructure Australia Act 2008 and produces a step-by-step assessment plan; the human review of the plan satisfies the accountability requirements for a Commonwealth statutory authority making infrastructure priority recommendations that influence multi-billion-dollar federal budget allocations.

When NOT to Apply

Simple single-tool calls where no planning is required (query → tool call → response)
Real-time conversational agents with < 500ms latency SLOs where a planning pass adds unacceptable delay
Creative or open-ended generation tasks without tool use
Low-stakes exploratory agents (research assistants, drafting helpers) where immediate action is not taken
Systems where the cost of a dedicated reasoning-model planning call per agent run is not justified by the risk profile

Prerequisites

A reasoning model with extended thinking capability (Claude 3.7, o3, Gemini 2.0 Flash Thinking)
A structured plan schema (JSON or typed object) defining the expected tool call sequence and decision branches
A plan validator that checks the plan against business rules and policy constraints before execution
An execution engine that follows the plan and can halt on policy violation
Logging infrastructure that persists the plan artefact linked to the execution trace

Industry Applicability

Industry	Use Case	Value	Adoption Level
Financial Services	Pre-trade compliance check and order routing agent	Plan allows compliance sign-off before orders touch market	Pilot
Healthcare	Clinical workflow orchestration (order entry, referral, prescription)	Plan reviewed by clinician before any system write	Pilot
Legal Technology	Contract negotiation redline agent	Lawyer reviews proposed change set before document update	Early Adopter
DevOps / Platform Engineering	Automated incident remediation agent	Plan shows proposed infra changes before kubectl or terraform executes	Growing
Government	Benefits determination and case management agent	Plan constitutes the decision record required by administrative law	Pilot

4. Architecture Overview

The Reasoning-then-Act pattern decomposes an agentic task into two sequential phases, each with distinct responsibilities and governance touchpoints. Phase 1 — Planning — invokes the reasoning model with extended thinking enabled. The model receives the task description, available tool schemas, relevant context (retrieved documents, user profile, current state), and an explicit instruction to produce a structured plan rather than execute actions. The thinking tokens represent the model's internal deliberation; the output is a typed Plan object containing an ordered list of steps, each step carrying: tool name, input parameters, expected output type, a rationale string, and a rollback instruction. Thinking tokens are logged but not exposed to the user or downstream systems.

Phase 2 — Execution — is performed by a deterministic Execution Engine, not by the language model. The engine iterates through the plan steps, invokes the specified tools, captures outputs, and evaluates whether each step's output matches the expected type. If a step fails, the engine does not attempt to improvise — it halts and raises an exception to the human-in-the-loop layer or the orchestrating system. This separation means tool execution logic is fully testable without language model involvement, and the model's role is explicitly limited to planning.

Between Phase 1 and Phase 2, a Policy Validator inspects the plan against a rule set: maximum number of tool calls, permitted tool names for the current user's permission scope, rate limits on external API calls, and explicit deny-list patterns (e.g., no DELETE operations on production tables). A plan failing validation is returned to the requester with an explanation — the reasoning model is not re-invoked. For high-risk task types, the validated plan is surfaced to a human approver before execution begins.

Observability spans both phases. The Plan artefact receives a UUID that is propagated through every execution step log, tool call log, and output log. Post-execution, the full trace is reconstructable from the plan UUID — a requirement for regulatory audit and incident investigation.

4a. API Reference

Anthropic Claude 3.7 Sonnet — Planning Phase Call

# Phase 1: invoke the reasoning model to produce a typed Plan JSON, not to execute actions
# PlanSchema, task_description, tool_catalogue_json and plan_schema_json are defined elsewhere
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

planning_response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,  # must exceed budget_tokens: it covers thinking plus the plan JSON
    thinking={"type": "enabled", "budget_tokens": 12000},  # Tier 2–3 for complex plans
    system="""You are a planning agent. Your output must be a JSON object matching the
PlanSchema exactly. Do NOT execute any actions. Do NOT call any tools. Only produce a plan.""",
    messages=[{
        "role": "user",
        "content": f"Task: {task_description}\n\nAvailable tools: {tool_catalogue_json}\n\nPlan schema: {plan_schema_json}"
    }]
)
# Extract the Plan JSON from the text block (thinking block is internal deliberation only)
plan_text = next(b.text for b in planning_response.content if b.type == "text")
plan = PlanSchema.model_validate_json(plan_text)
# Rough estimate of thinking tokens (about 4 characters per token); for billing, rely on
# planning_response.usage.output_tokens, which includes the thinking tokens
thinking_tokens_planning = sum(
    len(b.thinking) // 4 for b in planning_response.content if b.type == "thinking"
)
# Cost note: thinking is billed at the output-token rate. A fully consumed 12K budget on
# claude-3-7-sonnet at US$15/M output tokens is about US$0.18 per plan.
# The planning call is the primary cost driver; execution is deterministic and cheap by comparison.

OpenAI o3 — Planning with Structured Output

# o3 planning call with structured output to enforce the Plan JSON schema
# PlanSchema (a pydantic BaseModel subclass), task_description and tool_catalogue_json are defined elsewhere
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

planning_response = client.beta.chat.completions.parse(
    model="o3",
    reasoning_effort="high",  # higher effort generally improves plan quality, at higher cost and latency
    messages=[
        {"role": "system", "content": "Produce a structured plan. Do not execute actions."},
        {"role": "user", "content": f"Task: {task_description}\nTools: {tool_catalogue_json}"}
    ],
    response_format=PlanSchema,  # Pydantic model enforces valid JSON plan output
)
plan = planning_response.choices[0].message.parsed
# reasoning tokens used by the planning pass:
reasoning_tokens = planning_response.usage.completion_tokens_details.reasoning_tokens
# o3 at "high" effort: expect 8,000–20,000 reasoning tokens for a 5–10 step plan
# o3 pricing Jun 2026: $10/M input, $40/M output. Reasoning tokens are billed as output,
# so a 15K reasoning-token plan costs US$0.60, about AU$0.94

Phase 2 — Execution Engine (Provider-Agnostic)

# Phase 2: deterministic execution engine, no LLM involved
# PlanSchema, ExecutionResult and StepExecutionError are defined elsewhere
def execute_plan(plan: PlanSchema, tool_registry: dict) -> ExecutionResult:
    results = []
    for step in plan.steps:
        try:
            tool_fn = tool_registry[step.tool_name]  # KeyError if the tool is not registered
            # Idempotency key is plan.uuid + step.step_id; pass it to tools that accept one
            output = tool_fn(**step.parameters)
            # expected_output_type is a string in the Plan schema, so compare type names;
            # an explicit raise (not assert) keeps the check active under `python -O`
            if type(output).__name__ != step.expected_output_type:
                raise TypeError(f"Step {step.step_id} output type mismatch")
            results.append({"step_id": step.step_id, "status": "ok", "output": output})
        except Exception as e:
            # HALT: do not attempt to reason about failures; surface to human or orchestrator
            raise StepExecutionError(step_id=step.step_id, rationale=step.rationale, error=e) from e
    return ExecutionResult(plan_uuid=plan.uuid, steps=results)
# The execution engine never calls an LLM. It is fully deterministic and unit-testable
# without any model dependency, a key advantage over ReAct-style interleaved loops.

5. Architecture Diagram

flowchart TD
    subgraph Planning["Phase 1 Planning"]
        A[Task Input]
        B[Reasoning Model]
        C[Structured Plan Object]
    end
    subgraph Governance["Governance Layer"]
        D[Policy Validator]
        E{Human Approval Required?}
        F[Human Approver]
    end
    subgraph Execution["Phase 2 Execution"]
        G[Execution Engine]
        H[Tool Registry]
        I[Step Output Validator]
    end
    subgraph Audit["Audit Store"]
        J[Plan + Trace Log]
    end
    A --> B
    B --> C
    C --> D
    D --> E
    E -->|yes high-risk| F
    E -->|no auto-approve| G
    F --> G
    G --> H
    H --> I
    I --> G
    G --> J
    C --> J

6. Components

Component	Responsibility	Technology Examples
Planning Prompt Template	Instructs reasoning model to produce a typed Plan JSON; includes tool schemas and output format	Structured output prompt with JSON schema; Anthropic tool-use format; OpenAI function calling schema
Reasoning Model	Generates plan via extended thinking; thinking tokens are internal deliberation	Claude 3.7 (extended thinking), o3 (high reasoning_effort), Gemini 2.0 Flash Thinking
Plan Schema	Typed definition of a plan: steps, tool names, parameters, rationale, rollback	Pydantic model, Zod schema, JSON Schema
Policy Validator	Evaluates plan against business rules before execution; returns pass/fail with reason	Custom rule engine, OPA (Open Policy Agent), Cedar (Amazon Verified Permissions)
Execution Engine	Iterates plan steps; invokes tools; validates step outputs; halts on failure	LangGraph execution node, custom Python/TypeScript orchestrator, Temporal workflow
Tool Registry	Typed catalogue of available tools with input/output schemas and permission requirements	LangChain tool registry, MCP tool server, internal OpenAPI spec
Human Approval Interface	Surfaces plan to human approver with approve/reject/edit capability	Slack workflow, internal web UI, email approval chain

7. Implementation Steps

Step 1: Define the Plan Schema and Tool Catalogue

Before writing any model-invocation code, define the Plan JSON schema precisely. Each step must include: step_id (integer), tool_name (enum of permitted tools), parameters (tool-specific typed object), rationale (string, 1–3 sentences of the model's stated reason), expected_output_type (string), and rollback (string describing how to undo this step). Register all tools in a typed catalogue with their schemas. The planning prompt must include the full tool catalogue and the target Plan schema as a JSON schema block. Validate that the model produces valid Plan JSON on 20+ test inputs before proceeding.
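The schema described in this step could be sketched as follows. This is a minimal illustration using standard-library dataclasses as a stand-in for the Pydantic PlanSchema used in the API reference; the tool names in PERMITTED_TOOLS and the helper parse_plan are hypothetical and not part of the pattern definition.

```python
import json
from dataclasses import dataclass, field
from uuid import uuid4

# Hypothetical tool names; in practice this set comes from the tool catalogue.
PERMITTED_TOOLS = {"fetch_invoice", "post_journal_entry", "send_email"}


@dataclass(frozen=True)
class PlanStep:
    step_id: int
    tool_name: str
    parameters: dict
    rationale: str
    expected_output_type: str
    rollback: str

    def __post_init__(self):
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(self.step_id, int) or isinstance(self.step_id, bool):
            raise ValueError("step_id must be an integer")
        if self.tool_name not in PERMITTED_TOOLS:
            raise ValueError(f"unknown tool: {self.tool_name!r}")
        if not isinstance(self.parameters, dict):
            raise ValueError("parameters must be an object")
        if not isinstance(self.rationale, str) or not self.rationale.strip():
            raise ValueError("rationale must be a non-empty string")


@dataclass(frozen=True)
class Plan:
    steps: tuple
    uuid: str = field(default_factory=lambda: str(uuid4()))


def parse_plan(plan_json: str) -> Plan:
    """Parse model output into a Plan; raise ValueError on any schema violation."""
    try:
        raw = json.loads(plan_json)
        steps = tuple(PlanStep(**s) for s in raw["steps"])
    except (KeyError, TypeError) as exc:
        # missing "steps", missing or unexpected step fields, or a non-object payload
        raise ValueError(f"invalid plan: {exc}") from exc
    if [s.step_id for s in steps] != list(range(1, len(steps) + 1)):
        raise ValueError("step_id values must run 1..n in order")
    return Plan(steps=steps)
```

Running parse_plan over the 20+ test inputs mentioned above, and counting how many raise ValueError, gives a simple schema-conformance rate to track before moving on.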

Step 2: Build and Test the Policy Validator

Define business rules as declarative policy: maximum 10 steps per plan, only tools in the user's permission scope, no concurrent steps targeting the same resource, no more than 3 external API calls per plan. Implement the validator as a pure function that takes a plan and returns either pass or fail with a reason, with no side effects. Write 50+ unit tests covering boundary conditions. The validator is the primary governance control: if it is incomplete, the pattern's safety guarantee is compromised. Consider OPA for complex rule sets so that rules live in a policy file separate from application code.
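A minimal sketch of such a validator is shown here, assuming plan steps are plain dicts with a "tool_name" key. The names Verdict and validate_plan are illustrative, and the concurrency rule is omitted because the plan schema in this document has no field describing concurrent steps.

```python
from dataclasses import dataclass

MAX_STEPS = 10            # from the rule set in this step
MAX_EXTERNAL_CALLS = 3    # from the rule set in this step


@dataclass(frozen=True)
class Verdict:
    passed: bool
    reason: str = ""


def validate_plan(steps, permitted_tools, external_tools):
    """Pure function: inspect the plan and return a Verdict; no I/O, no mutation.

    steps           -- sequence of dicts, each with at least a "tool_name" key
    permitted_tools -- set of tool names in the caller's permission scope
    external_tools  -- set of tool names that call external APIs
    """
    if len(steps) > MAX_STEPS:
        return Verdict(False, f"plan has {len(steps)} steps; maximum is {MAX_STEPS}")
    for step in steps:
        if step["tool_name"] not in permitted_tools:
            return Verdict(False, f"tool {step['tool_name']!r} is outside the caller's permission scope")
    external = sum(1 for s in steps if s["tool_name"] in external_tools)
    if external > MAX_EXTERNAL_CALLS:
        return Verdict(False, f"{external} external API calls; maximum is {MAX_EXTERNAL_CALLS}")
    return Verdict(True)
```

Because the function has no side effects, each boundary condition (10 versus 11 steps, 3 versus 4 external calls) can be covered by a one-line unit test.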

Step 3: Implement the Execution Engine with Halt-on-Failure

Build the Execution Engine to be strictly plan-following — it reads the plan step-by-step, invokes the tool, checks the output type, and halts if the output does not match the expected type. It must not attempt to reason about failures or improvise remediation. Implement step-level idempotency (idempotency key per step-id + plan-uuid) so that if the engine restarts mid-plan, completed steps are not re-executed. Log every step with plan UUID, step ID, tool input, tool output, execution timestamp, and status.
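The step-level idempotency described above might look like the following sketch. It assumes steps are plain dicts and uses an in-memory dict as a stand-in for a durable completed-step store; run_with_idempotency is a hypothetical name.

```python
def run_with_idempotency(plan_uuid, steps, tool_registry, completed):
    """Execute steps in order, skipping any whose idempotency key is already recorded.

    completed -- dict mapping idempotency key to step output; stands in for a
                 durable store (an assumption of this sketch)
    """
    for step in steps:
        key = f"{plan_uuid}:{step['step_id']}"
        if key in completed:
            continue  # finished in an earlier run; do not re-execute
        # Any exception propagates: the engine halts and leaves `completed` intact
        output = tool_registry[step["tool_name"]](**step["parameters"])
        completed[key] = output  # record only after the tool call succeeds
    return [completed[f"{plan_uuid}:{s['step_id']}"] for s in steps]
```

Recording after success means a crash between the tool call and the write could still repeat one step on restart, so tools with irreversible effects should also accept and honour the same key on their side.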

Planning Budget Reference by Plan Complexity

The planning call's thinking budget directly determines plan quality. Use this table as a starting calibration; adjust based on step count and domain complexity observed in production.

Plan Type	Recommended Budget	Typical Thinking Tokens	Plan Steps (typical)	AU$ per Plan	Notes
Simple 2–3 step automation	4,096	2,500–3,800	2–3	AU$0.09–0.14	Data retrieval + transform + write; rarely needs deep reasoning
Standard 5–7 step workflow	8,192–12,000	6,000–10,000	5–7	AU$0.23–0.38	Typical enterprise process automation; most agentic use cases
Complex multi-branch analysis	16,384–24,000	12,000–20,000	7–12	AU$0.46–0.76	Legal analysis, financial audit, clinical protocol planning
Frontier: novel problem with >10 constraints	32,768–50,000	25,000–42,000	10–20	AU$0.96–1.61	ISDA negotiation strategy, complex litigation planning, multi-entity M&A workflow

Key rule: Plan steps correlate more strongly with required thinking tokens than query length. A 10-word task that requires a 15-step plan needs more reasoning budget than a 500-word task with a 3-step plan. Instrument step count in your usage logger and build the budget allocator to consider step count as a signal alongside complexity score.
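A budget allocator keyed on expected step count could be sketched as below. The tier boundaries are one reading of the calibration table (its step ranges overlap, so the upper end of each budget range is used), and the expected step count is assumed to come from historical usage logs for the task category; both are assumptions of this sketch.

```python
# (max expected steps, thinking budget in tokens), read off the calibration table
BUDGET_TIERS = [(3, 4_096), (7, 12_000), (12, 24_000)]
FRONTIER_BUDGET = 50_000  # upper end of the frontier row; also acts as the cap


def thinking_budget(expected_steps: int) -> int:
    """Map an expected plan step count to a thinking-token budget."""
    if expected_steps < 1:
        raise ValueError("expected_steps must be at least 1")
    for max_steps, budget in BUDGET_TIERS:
        if expected_steps <= max_steps:
            return budget
    return FRONTIER_BUDGET
```

In production the thresholds would be recalibrated from the logged ratio of thinking tokens actually consumed to steps actually planned.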

Step 4: Wire Human Approval and Deploy with Feature Flag

For the first deployment, set all plans to require human approval regardless of risk classification. Route the plan to a Slack message or internal dashboard with approve/reject buttons. Capture approval decisions in the audit log with approver identity and timestamp. After two weeks of approvals, analyse which plan types are consistently approved without modification — these are candidates for auto-approval. Implement the human approval bypass as a configuration flag per plan category, with the governance team as the approver of which categories can be auto-approved.
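The per-category bypass flag and the audit record could be sketched as follows. The category names, the field names in the audit entry, and both function names are hypothetical; the audit log is a plain list standing in for a persistent store.

```python
from datetime import datetime, timezone


def requires_human_approval(category, auto_approve_categories):
    """A plan skips human review only if its category is on the approved register."""
    return category not in auto_approve_categories


def record_decision(audit_log, plan_uuid, approver, approved, now=None):
    """Append an approval decision with approver identity and a UTC timestamp."""
    decided_at = now if now is not None else datetime.now(timezone.utc)
    entry = {
        "plan_uuid": plan_uuid,
        "approver": approver,
        "approved": approved,
        "decided_at": decided_at.isoformat(),
    }
    audit_log.append(entry)
    return entry
```

At first deployment the register passed as auto_approve_categories would be empty, so every plan is routed to a human; the governance team adds categories to it after reviewing the approval history.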

8. Security Considerations

OWASP LLM Top 10 Mapping

OWASP ID	Threat	Mitigation
LLM01 — Prompt Injection	Adversarial task input causes reasoning model to generate a plan that calls privileged tools or leaks data	Policy Validator denies plans containing tools not in the caller's permission scope; tool parameters sanitised before execution
LLM02 — Insecure Output Handling	Plan JSON from model contains malicious strings in parameter fields (SQL injection, path traversal)	Each tool's input schema includes strict validation; parameterised calls at tool layer; no string interpolation into system commands
LLM08 — Excessive Agency	Execution Engine performs more consequential actions than intended by the operator	Step count limit enforced by Policy Validator; irreversible tools (delete, send, post) require explicit human approval flag in plan
LLM04 — Model Denial of Service	Adversary submits complex tasks that generate extremely long plans, consuming reasoning tokens and execution time	Max plan steps enforced; per-user rate limit on planning calls; thinking token budget capped at Tier 3 maximum
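The LLM02 mitigation (strict input validation plus parameterised calls at the tool layer) can be illustrated with a small sketch. The customers table, the customer ID format, and the lookup_customer tool are hypothetical; sqlite3 is used only because it ships with Python.

```python
import re
import sqlite3

# Hypothetical ID format: two uppercase letters followed by six digits
CUSTOMER_ID = re.compile(r"[A-Z]{2}\d{6}")


def lookup_customer(conn: sqlite3.Connection, customer_id):
    """Tool-layer guard: validate the plan-supplied parameter, then bind it."""
    if not isinstance(customer_id, str) or not CUSTOMER_ID.fullmatch(customer_id):
        raise ValueError("customer_id does not match the tool's input schema")
    # The value is bound with a placeholder, never interpolated into the SQL text
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None
```

Either control alone would stop a classic injection string; applying both means a gap in the schema does not become a gap in the query.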

9. Governance Artefacts

Plan schema definition document (version-controlled; every schema change requires approval)
Policy rule set document with each rule, its business justification, and the compliance requirement it satisfies
Human approval log: every plan approved or rejected, with approver, timestamp, and any edits made
Execution trace log linked by plan UUID (90-day retention minimum)
Auto-approval category register: which plan categories bypass human review and on what evidence
Incident response procedure for plans that execute partially before a step failure

10. SLOs

SLO	Target	Measurement
Planning phase latency P95	< 20s	Time from task receipt to validated plan object
Policy validation latency P99	< 500ms	Policy Validator execution time
Plan approval turnaround (human)	< 4 hours business hours	Time from plan surfaced to approver to approval decision logged
Execution step success rate	> 99% on auto-approved plans	Failed steps / total steps per week
Plan audit log completeness	100% of plans with UUID linked to trace	Reconciliation query: plans with no linked execution trace
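The reconciliation query behind the audit-completeness SLO might be sketched as below. The table names plans and execution_trace, the status column, and the choice to check only plans marked as executed are assumptions of this example.

```python
import sqlite3

# Executed plans that have no execution trace row carrying their UUID
RECONCILIATION_SQL = """
SELECT p.plan_uuid
FROM plans AS p
LEFT JOIN execution_trace AS t ON t.plan_uuid = p.plan_uuid
WHERE p.status = 'executed' AND t.plan_uuid IS NULL
ORDER BY p.plan_uuid
"""


def plans_missing_trace(conn: sqlite3.Connection):
    """Return UUIDs of executed plans that have no linked trace."""
    return [row[0] for row in conn.execute(RECONCILIATION_SQL)]
```

An empty result means the SLO is met for the period checked; any returned UUID is an audit gap to investigate.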

11. Cost Model

Cost Driver	Estimate	Notes
Planning call: reasoning model (Tier 2)	US$0.12–0.24 per plan	8K–16K thinking tokens; Claude 3.7 bills thinking at the output-token rate of US$15/M
Planning call: output tokens (plan JSON)	US$0.0075–0.0375 per plan	Plan JSON typically 500–2,500 output tokens at US$15/M
Execution engine compute	$0.50–5.00 per 1K plans	Lambda / container runtime; scales with step count and tool latency
Policy validation	Near-zero for rule-based OPA; $0.01–0.05/1K for LLM-assisted validation	Prefer rule-based for cost and determinism
Human approval overhead	5–15 min per plan requiring review	Operational cost; target < 20% of production plans requiring human review after calibration
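The per-plan model cost can be worked through with a small helper. This is a sketch that assumes thinking tokens are billed at the provider's output-token rate; the rates are passed in by the caller, and the example figures in the usage note use the US$3/M input and US$15/M output rates as an assumption.

```python
def plan_cost_usd(thinking_tokens, plan_output_tokens, input_tokens,
                  input_rate_per_m, output_rate_per_m):
    """Model cost of one planning call in USD.

    Rates are USD per million tokens. Thinking tokens are added to the
    output tokens because they are assumed to be billed at the output rate.
    """
    billed_output = thinking_tokens + plan_output_tokens
    total = input_tokens * input_rate_per_m + billed_output * output_rate_per_m
    return total / 1_000_000
```

For example, plan_cost_usd(12_000, 1_000, 2_000, 3.0, 15.0) covers 12K thinking tokens, a 1K-token plan, and a 2K-token prompt: (2,000 × 3 + 13,000 × 15) / 1,000,000, or about US$0.20.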

12. Trade-off Analysis

Dimension	Benefit	Trade-off
Auditability	Plan artefact provides a stable, inspectable decision record	Adds one full reasoning model call per agent run; doubles minimum latency
Safety	Policy Validator and human approval prevent irreversible errors before execution	Policy maintenance burden; stale policies create false blocks on legitimate plans
Debugging	Failures are isolated to a specific plan step with full context	Two-phase architecture is more complex to instrument than a single ReAct loop
Model cost	Planning quality is typically better than interleaved ReAct on complex multi-step tasks	Every run pays for a reasoning-model planning call with a large thinking budget, even when a cheaper single-pass agent would have sufficed
Governance readiness	Direct mapping to EU AI Act Article 14 human oversight requirements	Human approval adds latency; auto-approval category management is ongoing governance work

13. Failure Modes

Failure	Trigger	Recovery
Model produces invalid Plan JSON	Model hallucinates tool names or violates schema constraints	JSON schema validation with error returned to requester; retry with schema-constrained output forcing
Policy Validator false block	Legitimate plan rejected due to overly restrictive rule	Override mechanism with governance approval; rule reviewed and updated; false block logged
Execution engine step failure mid-plan	External tool unavailable; unexpected output type	Engine halts; completed steps logged; rollback instructions from plan surfaced to operator; partial state documented
Human approver unresponsive	Approver on leave; notification not received	Escalation policy once the 4-hour approval SLO is breached; secondary approver list; auto-escalation to manager
Plan UUID collision in audit store	Astronomically unlikely with randomly generated UUID4; more plausibly caused by a reused or replayed identifier	Enforce a uniqueness constraint on plan_uuid in the audit store and regenerate on conflict; UUID7 (time-ordered) improves index locality but does not by itself rule out collisions
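For the mid-plan failure case, the rollback instructions surfaced to the operator could be assembled as in this sketch. It assumes steps are dicts carrying step_id and rollback fields as in the plan schema; rollback_worklist is a hypothetical helper, and it only lists instructions for a human to act on, it does not execute them.

```python
def rollback_worklist(steps, completed_step_ids):
    """Return (step_id, rollback instruction) pairs for completed steps, newest first."""
    done = [s for s in steps if s["step_id"] in completed_step_ids]
    # Undo in reverse order so later steps are unwound before the steps they built on
    return [(s["step_id"], s["rollback"]) for s in reversed(done)]
```

Steps that never ran are left out, so the operator sees only the state that actually needs unwinding.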

14. Regulatory Mapping

Regulation	Requirement	How Pattern Addresses It
EU AI Act Article 13 — Transparency	Reasoning chains must be explainable to competent authorities on demand for high-risk AI systems	Plan artefact contains explicit step rationales in plain language; thinking tokens retained in the governance log (not user-visible) for competent authority review on request; plan UUID enables full trace reconstruction
EU AI Act Article 14 — Human Oversight	High-risk AI systems must be designed for effective human oversight and ability to override	Plan artefact is the natural human review surface before irreversible actions execute; approval workflow provides the override and halt mechanism mandated by Article 14(4)
NIST AI RMF GOVERN 1.6	"Policies, processes, procedures, and practices across the organisation related to the mapping, measuring, and managing of AI risks are in place"	Plan schema, policy validator rule set, approval workflow procedure, and auto-approval category register collectively constitute the governance documentation required by GOVERN 1.6
NIST AI RMF Manage 4.1	Processes to respond to and recover from AI risks and incidents must be defined	Execution halt-on-failure + rollback instructions in plan steps constitute the incident recovery process; partial execution state is documented in step logs for manual remediation
ISO/IEC 42001 Clause 8.5	AI system output must be reviewed against intended purpose	Policy Validator checks plan against intended purpose before execution; human review provides final confirmation for high-risk plan categories
APRA CPS 230 §21	Critical operations must have defined RTOs/RPOs; operational disruptions must not breach critical operation SLAs	Planning phase latency P95 must be within the critical operation's defined pre-processing SLA; planning timeouts must trigger a controlled halt, not a partial plan execution that corrupts downstream state; circuit breaker to a degraded-mode (standard model) plan generation path satisfies the RTO requirement
APRA CPS 234	AI systems performing consequential financial actions require audit trail and information security controls	Plan UUID linked to full execution trace satisfies audit trail requirement; API keys held server-side; thinking tokens stripped from user-visible output satisfy information security controls

15. Reference Implementations

AWS

Implement Planning phase as an AWS Lambda calling Bedrock (Claude 3.7 with thinking feature). Plan stored in DynamoDB with TTL. Policy Validator as a second Lambda using OPA WebAssembly bundle. Human Approval via Amazon Connect or Slack + Lambda webhook. Execution Engine as AWS Step Functions state machine where each state invokes a tool Lambda. Full trace via AWS X-Ray with plan UUID as trace annotation.

Azure

Planning phase as an Azure Function calling Azure OpenAI o3. Plan stored in Azure Cosmos DB. Policy rules in OPA container or Azure Policy custom definitions. Human approval via Microsoft Teams Adaptive Card + Azure Logic App. Execution via Azure Durable Functions (entity functions for step state, orchestrator function for sequence). Observability via Application Insights with plan UUID as custom dimension.

On-Premises / Private Cloud

Deploy reasoning model on vLLM (DeepSeek-R1 for planning). Execution Engine built with Temporal workflows (each plan step is a Temporal activity with idempotency key). Policy Validator as OPA server with Rego rules in git. Human approval via internal intranet app reading from a PostgreSQL approval queue. All plan artefacts and execution traces in PostgreSQL with row-level security. Prometheus + Grafana for step success rates and plan latency.

16. Related Patterns

EAAPL-RSN001: Extended Thinking Gate — determines whether reasoning model is warranted for the planning phase
EAAPL-RSN002: Think Budget Allocation — sizes the thinking budget for the planning call
EAAPL-RSN005: Multi-Step Verification — can be applied to verify the execution outputs from this pattern
EAAPL-HIL001: Human-in-the-Loop Approval — the approval workflow is a specialised instance of this pattern
EAAPL-AGT004: Agentic Tool Use — the execution engine's tool invocation follows this pattern

17. Maturity Assessment

Dimension	Level (1–5)	Notes
Pattern stability	3	Core separation of planning and execution is well-understood; tool schema formats standardising around MCP
Tooling availability	2	LangGraph supports plan-then-execute mode; Temporal covers execution; no integrated planning-approval-execution platform
Reference implementations	2	DevOps and legal-tech pilots documented; financial services implementations emerging
Regulatory acceptance	4	Direct alignment with EU AI Act Article 14 and NIST AI RMF Manage 4.1 makes this pattern preferred by compliance teams

18. Revision History

Version	Date	Change
1.0	2026-06-14	Initial release
