[EAAPL-WRK005] Plan-and-Execute

Category: Agentic Workflows
Sub-category: Goal Decomposition Architecture
Version: 1.0
Maturity: Proven
Tags: planning, task-decomposition, dependency-graph, plan-revision, goal-decomposition
Regulatory Relevance: ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1, MANAGE 1.3)

1. Executive Summary

The Plan-and-Execute Pattern defines a two-phase agentic architecture: a dedicated Planner decomposes a complex goal into a dependency graph of sub-tasks before any execution begins, and a separate Executor then executes the sub-tasks in dependency order (parallelising independent sub-tasks). When execution reveals new information that invalidates the plan, the Planner is re-invoked to revise the plan. This upfront-planning approach contrasts with the ReAct loop (EAAPL-WRK001), which interleaves reasoning and execution; Plan-and-Execute is preferred when the task structure is complex enough to benefit from explicit dependency management and when parallel sub-task execution is valuable.

For CIO/CTO audiences: this is how a project manager operates. Before any work begins, the project manager creates a work breakdown structure — what needs to be done, in what order, what can be done in parallel. The team (the Executor) then works through this plan. If something unexpected happens during execution, the project manager revises the plan. The benefit over an ad-hoc approach (ReAct) is that the full task structure is visible upfront, enabling parallelism, better resource allocation, and a clear scope-of-work that can be reviewed and approved before execution starts — which is exactly what governance and compliance teams require.

2. Problem Statement

Business Problem

Complex enterprise tasks — due diligence on an acquisition target, regulatory gap analysis across 12 jurisdictions, incident root cause analysis — have inherent structure: some sub-tasks must precede others, some can run in parallel, and the total work is too large for a single LLM context. Without explicit planning, an agent attempts to execute in an ad-hoc, interleaved manner that is opaque, hard to parallelise, and difficult to audit.

Technical Problem

The ReAct loop interleaves reasoning and execution, which is efficient for simple tasks but inefficient for complex tasks with parallelisable sub-tasks and dependency constraints. Without an explicit dependency graph, independent sub-tasks are executed sequentially when they could run in parallel, and the agent has no global view of task completeness.

Symptoms of Absence

Complex multi-part tasks are executed sequentially even when sub-tasks are independent
No visibility into overall task progress until execution completes
Plan cannot be reviewed or approved before execution starts
Agent loses sight of the overall goal when focused on individual sub-task execution

Cost of Inaction

Efficiency: Unparallelised execution of independent sub-tasks multiplies latency unnecessarily
Governance: Cannot review or approve the task plan before execution; no pre-execution scope visibility
Quality: Ad-hoc execution is more likely to miss required sub-tasks or execute them in suboptimal order

3. Context

When to Apply

Task is complex enough to benefit from explicit sub-task decomposition (typically 4 or more sub-tasks)
Sub-tasks have explicit data dependencies that must be respected
Independent sub-tasks can be parallelised for latency reduction
Task plan should be reviewable (human-in-the-loop approval) before execution starts
The task structure may need revision based on execution findings

When NOT to Apply

Task is simple (3 steps or fewer): use Sequential Chain (EAAPL-WRK002) or ReAct (EAAPL-WRK001)
Task structure cannot be determined upfront because each step depends entirely on prior results (use ReAct)
Speed is the primary constraint and upfront planning adds unacceptable latency
Tasks are highly dynamic and plans become stale too quickly to be useful

Prerequisites

Planner LLM capable of producing structured dependency graphs (JSON DAG output)
Sub-task executor infrastructure (parallel execution support if parallelism is required)
Plan validation logic (detect circular dependencies, invalid sub-task specifications)
Plan revision trigger conditions defined

Industry Applicability

Industry	Plan-and-Execute Use Case	Sub-task Example
Financial Services	M&A due diligence	Financial review, legal review, regulatory review — parallel with synthesis
Legal	Regulatory gap analysis	Per-jurisdiction analysis in parallel; consolidated recommendation sequential
Technology	Large codebase refactoring	Dependency analysis → parallel module refactoring → integration test → deploy
Government	Policy impact assessment	Social impact, economic impact, legal review in parallel; synthesis sequential
Healthcare	Clinical trial design review	Protocol review, statistical plan review, ethics review in parallel

4. Architecture Overview

The Plan-and-Execute architecture separates cognitive planning from mechanical execution, enabling each to be optimised independently.

Planning Phase

The Planner receives the overall goal and produces a structured execution plan: a directed acyclic graph (DAG) of sub-tasks, where each node is a sub-task (with input requirements, prompt reference, and expected output schema) and each directed edge represents a data dependency ("sub-task B requires output of sub-task A"). The Planner uses a capable model (high reasoning quality is more important than speed here; the plan is produced once). The plan is validated for structural correctness (no cycles, all dependencies reachable, all referenced prompts exist in the registry) before execution begins.
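The structural checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the plan layout (`sub_tasks`, `depends_on`, `prompt_ref`, `output_schema`) and the `PROMPT_REGISTRY` contents are assumptions made for the example, and "reachable" is interpreted here as "every dependency names a sub-task defined in the same plan". It uses the standard-library `graphlib` module for cycle detection; networkx, listed in the Components table, would serve equally well.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical prompt registry; a real deployment would query its own store.
PROMPT_REGISTRY = {"jurisdiction_analysis", "cross_comparison", "remediation"}


def validate_plan(plan: dict) -> list[str]:
    """Return validation errors; an empty list means the plan passed."""
    errors = []
    tasks = plan.get("sub_tasks", {})
    for task_id, task in tasks.items():
        # Every dependency must name a sub-task defined in the same plan.
        for dep in task.get("depends_on", []):
            if dep not in tasks:
                errors.append(f"{task_id}: unknown dependency {dep!r}")
        # Every referenced prompt must exist in the registry.
        prompt_ref = task.get("prompt_ref")
        if prompt_ref not in PROMPT_REGISTRY:
            errors.append(f"{task_id}: prompt {prompt_ref!r} not in registry")
        if "output_schema" not in task:
            errors.append(f"{task_id}: missing output_schema")
    # Cycle check: prepare() raises CycleError if the graph is not acyclic.
    graph = {tid: set(t.get("depends_on", [])) for tid, t in tasks.items()}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        errors.append(f"cycle detected: {exc.args[1]}")
    return errors


valid_plan = {"sub_tasks": {
    "analyse_nsw": {"prompt_ref": "jurisdiction_analysis", "output_schema": {}, "depends_on": []},
    "compare": {"prompt_ref": "cross_comparison", "output_schema": {}, "depends_on": ["analyse_nsw"]},
}}
cyclic_plan = {"sub_tasks": {
    "a": {"prompt_ref": "remediation", "output_schema": {}, "depends_on": ["b"]},
    "b": {"prompt_ref": "remediation", "output_schema": {}, "depends_on": ["a"]},
}}
print(validate_plan(valid_plan))   # an empty list: the plan passed
print(validate_plan(cyclic_plan))  # a single "cycle detected" error
```

Returning a list of errors rather than raising on the first one lets the full list be fed back to the Planner when the plan is rejected.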

Plan Approval Gate (Optional)

For high-stakes or long-running tasks, an optional human approval gate presents the plan to a human reviewer before execution starts. The reviewer can approve, modify, or reject the plan. This gate is a direct implementation of human oversight requirements for agentic systems.

Execution Phase

The Executor traverses the plan DAG in topological order: sub-tasks with no unmet dependencies are eligible for immediate execution and are dispatched (potentially in parallel via Fan-Out, EAAPL-WRK003). As sub-tasks complete, their outputs are stored in the task state, and newly unblocked sub-tasks are dispatched. The Executor is a pure mechanical system: it does not reason; it only manages task scheduling and dependency tracking.
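A scheduler of this kind can be sketched with the standard library alone. The example below is an assumption-laden illustration: `run_plan`, the `dependencies` mapping, and the `execute` callback are hypothetical names, and a thread pool stands in for whatever executor infrastructure is actually used. It relies on `graphlib.TopologicalSorter`, whose `get_ready()` and `done()` methods map directly onto "dispatch eligible sub-tasks" and "unblock dependants".

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_plan(dependencies, execute, max_workers=4):
    """Run every sub-task in dependency order, in parallel where possible.

    dependencies maps sub-task id -> set of ids it depends on.
    execute(task_id, state) is a caller-supplied function that runs one
    sub-task and returns its output.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    state = {}        # task state: sub-task id -> output
    in_flight = {}    # future -> sub-task id
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Dispatch every sub-task whose dependencies are all complete.
            for task_id in sorter.get_ready():
                snapshot = dict(state)  # read-only view for the worker
                in_flight[pool.submit(execute, task_id, snapshot)] = task_id
            # Block until at least one running sub-task finishes.
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in finished:
                task_id = in_flight.pop(future)
                state[task_id] = future.result()
                sorter.done(task_id)  # unblocks dependants
    return state


deps = {
    "analyse_nsw": set(),
    "analyse_vic": set(),
    "compare": {"analyse_nsw", "analyse_vic"},
    "recommend": {"compare"},
}


def fake_execute(task_id, state):
    # Stand-in for a real sub-task; reports which upstream outputs it saw.
    return {"task": task_id, "inputs_seen": sorted(state)}


result = run_plan(deps, fake_execute)
print(result["compare"]["inputs_seen"])
```

The scheduler itself makes no decisions about content; all reasoning stays inside `execute`, which matches the description of the Executor as purely mechanical.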

Plan Revision

When a sub-task execution produces unexpected output (e.g., discovers that the target dataset does not exist, or that the regulatory scope is larger than anticipated), a revision trigger fires. The Planner is re-invoked with the current plan, completed sub-task outputs, and the revision trigger reason. The Planner produces a revised plan: it may add new sub-tasks, remove now-unnecessary sub-tasks, or change the dependencies between remaining sub-tasks. Revision is bounded by a maximum revision count to prevent unbounded loops.
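The bounded revision loop can be sketched as follows. The function and callback signatures are assumptions for illustration: `planner` and `executor` are hypothetical callables standing in for the real Planner and Executor, and the limit of 3 mirrors the default given in the Error Flow table.

```python
MAX_REVISIONS = 3  # default hard limit, as in the Error Flow table


def execute_with_revision(goal, planner, executor, max_revisions=MAX_REVISIONS):
    """Run a plan, re-invoking the planner when execution raises a trigger.

    planner(goal, current_plan, completed, reason) -> plan
    executor(plan, completed) -> (completed, trigger_reason or None)
    Both callables are hypothetical and supplied by the caller.
    """
    plan = planner(goal, None, {}, None)  # initial plan
    completed = {}
    revisions = 0
    while True:
        completed, reason = executor(plan, completed)
        if reason is None:
            return {"status": "complete", "outputs": completed, "revisions": revisions}
        if revisions >= max_revisions:
            # Limit exhausted: return partial results instead of looping again.
            return {"status": "partial", "outputs": completed, "revisions": revisions}
        revisions += 1
        plan = planner(goal, plan, completed, reason)


def stub_planner(goal, plan, completed, reason):
    # Stand-in planner: version number increases with each revision.
    return {"version": 0 if plan is None else plan["version"] + 1}


def always_surprised(plan, completed):
    # Stand-in executor that triggers a revision on every run.
    return {**completed, f"v{plan['version']}": "done"}, "scope larger than expected"


print(execute_with_revision("demo goal", stub_planner, always_surprised)["status"])
```

Passing the outputs already completed back into the executor avoids re-running sub-tasks that the revised plan still contains.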

Plan Completion

The plan is complete when all sub-tasks in the DAG have been executed (or explicitly skipped per a revision decision). The final output is assembled by the synthesis sub-task(s) defined in the plan.

5. Architecture Diagram

flowchart TD
    subgraph Planning["Planning Phase"]
        A[Goal Input]
        B[Planner LLM]
        C{Plan Valid?}
        D[Plan Validation]
        E{Human Approval Gate}
    end
    subgraph Execution["Execution Phase"]
        F[DAG Scheduler]
        G[Sub-task A]
        H[Sub-task B]
        I[Sub-task C]
        J[Sub-task D]
        K[(Task State)]
    end
    subgraph Revision["Plan Revision"]
        L{Revision Trigger?}
        M[Re-invoke Planner]
    end
    subgraph Output["Output"]
        N[Final Result]
        O[Execution Trace]
    end
    A --> B
    B --> D
    D --> C
    C -->|invalid| B
    C -->|valid| E
    E -->|approved| F
    E -->|skipped| F
    F --> G & H
    G --> I
    G --> K
    H --> K
    I --> K
    K --> J
    J --> L
    L -->|yes| M
    M --> F
    L -->|no| N
    F --> O

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Planner	AI Component	Decomposes goal into structured DAG plan; revises on trigger	GPT-4o, Claude 3.5 Sonnet (high reasoning quality)	Critical
Plan Validator	Logic Component	Validates DAG structure: acyclicity, reachability, schema compliance	Custom Python (networkx); JSON Schema	Critical
Human Approval Gate	Integration	Presents plan for human review; blocks execution on rejection	Slack workflow; web UI; email approval	High (for regulated tasks)
DAG Scheduler (Executor)	Orchestration	Topological traversal; dispatches eligible sub-tasks; tracks dependencies	Custom Python; LangGraph; Temporal; AWS Step Functions	Critical
Sub-task Executor	AI Component	Executes individual sub-tasks (may be ReAct loops, sequential chains, etc.)	Any EAAPL-WRK pattern	Critical
Task State Store	State	Accumulates sub-task outputs; tracks completion status	Redis; PostgreSQL; DynamoDB	Critical
Revision Trigger Detector	Logic	Monitors sub-task outputs for conditions requiring plan revision	Custom; LLM-based anomaly detection on output	High
Execution Audit Logger	Governance	Records full plan + execution trace	S3; PostgreSQL; Splunk	High

7. Data Flow

Step	Actor	Action	Output
1	Caller	Submits goal: "Conduct regulatory gap analysis for our lending product across NSW, VIC, QLD, SA, WA"	Goal with scope parameters
2	Planner	Produces DAG: 5 parallel jurisdiction analyses → 1 cross-jurisdiction comparison → 1 remediation recommendation	Plan JSON with 7 sub-tasks and 6 dependency edges (5 into the comparison, 1 into the recommendation)
3	Plan Validator	Validates DAG: acyclic, all schemas present, all prompts in registry	PASS
4	Human Approval Gate	Presents plan to compliance lead	Approved
5	DAG Scheduler	Dispatches 5 jurisdiction sub-tasks concurrently (no dependencies)	5 parallel invocations
6	Sub-task Executors (×5)	Each analyses one jurisdiction's lending regulations	`[{jurisdiction: "NSW", gaps: [...], risk: "high"}, ...]`
7	DAG Scheduler	Receives all 5 results; unlocks cross-jurisdiction comparison sub-task	Task state updated
8	Sub-task Executor	Executes cross-jurisdiction comparison	`{common_gaps: [...], jurisdiction_specific: [...]}`
9	Revision Trigger Detector	WA analysis found additional ASIC obligations not in original scope — triggers revision	Revision triggered
10	Planner	Adds 1 new sub-task: "ASIC obligation mapping for WA"	Revised plan
11	DAG Scheduler	Executes new sub-task; then executes remediation recommendation	Final output assembled
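For concreteness, the plan produced in step 2 might look like the structure built below. The field names (`prompt_ref`, `inputs`, `depends_on`) and sub-task identifiers are assumptions made for this sketch, not a schema defined by the pattern. The helper groups sub-tasks into waves, where every sub-task in a wave has all of its dependencies satisfied and can be dispatched concurrently.

```python
import json
from graphlib import TopologicalSorter

JURISDICTIONS = ["NSW", "VIC", "QLD", "SA", "WA"]


def build_plan(jurisdictions):
    """Build an illustrative plan: parallel analyses, then comparison, then recommendation."""
    sub_tasks = {
        f"analyse_{j.lower()}": {
            "prompt_ref": "jurisdiction_analysis",  # hypothetical prompt name
            "inputs": {"jurisdiction": j},
            "depends_on": [],
        }
        for j in jurisdictions
    }
    # The comparison depends on every jurisdiction analysis defined so far.
    sub_tasks["compare"] = {
        "prompt_ref": "cross_comparison",
        "inputs": {},
        "depends_on": sorted(sub_tasks),
    }
    sub_tasks["recommend"] = {
        "prompt_ref": "remediation",
        "inputs": {},
        "depends_on": ["compare"],
    }
    return {"goal": "Regulatory gap analysis for lending product", "sub_tasks": sub_tasks}


def execution_waves(plan):
    """Group sub-tasks into waves that can each run in parallel."""
    graph = {tid: set(t["depends_on"]) for tid, t in plan["sub_tasks"].items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = build_plan(JURISDICTIONS)
edge_count = sum(len(t["depends_on"]) for t in plan["sub_tasks"].values())
print(json.dumps(plan["sub_tasks"]["compare"], indent=2))
print(len(plan["sub_tasks"]), "sub-tasks,", edge_count, "dependency edges")
for number, wave in enumerate(execution_waves(plan), start=1):
    print(f"wave {number}: {wave}")
```

The first wave corresponds to step 5 of the table, the second to step 8, and the third to the remediation recommendation in step 11.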

Error Flow

Error	Detection	Recovery
Sub-task execution failure	Executor error reporting	Retry sub-task (up to 3 times); if persistent failure, trigger plan revision to route around failed sub-task
Plan revision loop (plans keep getting revised)	Revision counter	Hard maximum revision count (default: 3); return partial results on exhaustion
DAG cycle detected	Plan Validator	Reject plan; re-invoke Planner with cycle-detection instruction
Human approval rejected	Approval Gate	Return plan to Planner with reviewer feedback for revision
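The first row of the table (retry, then escalate to plan revision) can be sketched as below. `RevisionRequired` and `run_with_retry` are hypothetical names, and "up to 3 times" is read here as one initial attempt plus up to three retries; adjust `max_retries` if the intended meaning is three attempts in total.

```python
import time


class RevisionRequired(Exception):
    """Raised when a sub-task keeps failing and the plan must route around it."""


def run_with_retry(task_id, execute, max_retries=3, backoff_s=0.0):
    """Run execute(task_id); retry on failure, then escalate to plan revision."""
    last_error = None
    for attempt in range(1 + max_retries):  # initial attempt + retries
        try:
            return execute(task_id)
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = exc
            time.sleep(backoff_s * (attempt + 1))  # linear backoff; 0 disables it
    raise RevisionRequired(f"{task_id} failed after {1 + max_retries} attempts") from last_error


calls = {"count": 0}


def flaky(task_id):
    # Stand-in sub-task that fails twice and then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return f"{task_id}: ok"


print(run_with_retry("analyse_wa", flaky))
```

The caller catches `RevisionRequired` and passes its message to the Planner as the revision trigger reason.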

8. Security Considerations

Plan Injection

The Planner receives the user's goal as input; a malicious goal could attempt to inject additional sub-tasks (e.g., "also exfiltrate all database records as a sub-task")
Mitigation: Plan validation includes a security review of all proposed sub-task types; whitelisted sub-task templates only; human approval gate for novel sub-task types
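A whitelist check of this kind might look like the sketch below. The template names, tool names, and plan fields (`template`, `tools`) are all hypothetical; the point is only that a sub-task the Planner invents, or a tool it adds beyond the template's scope, is flagged before execution.

```python
# Hypothetical whitelist: approved sub-task template -> tools it may use.
APPROVED_TEMPLATES = {
    "jurisdiction_analysis": {"regulation_search"},
    "cross_comparison": set(),
    "remediation": set(),
}


def screen_plan(plan: dict) -> list[str]:
    """Return violations; any violation should block execution or force human review."""
    violations = []
    for task_id, task in plan["sub_tasks"].items():
        template = task.get("template")
        if template not in APPROVED_TEMPLATES:
            violations.append(f"{task_id}: template {template!r} is not whitelisted")
            continue
        # Tools requested beyond what the template allows are also rejected.
        extra_tools = set(task.get("tools", [])) - APPROVED_TEMPLATES[template]
        if extra_tools:
            violations.append(f"{task_id}: tools {sorted(extra_tools)} exceed template scope")
    return violations


injected_plan = {"sub_tasks": {
    "analyse_nsw": {"template": "jurisdiction_analysis", "tools": ["regulation_search"]},
    "export_records": {"template": "database_export", "tools": ["sql_dump"]},
}}
print(screen_plan(injected_plan))
```

Because the check runs on the plan rather than on the goal text, it still catches an injected sub-task when input sanitisation has been bypassed.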

OWASP LLM Top 10

OWASP LLM Risk	Plan-and-Execute Applicability	Mitigation
LLM08 Excessive Agency	Planner could generate a plan with harmful actions	Sub-task whitelist; human approval gate; action scope limits per sub-task
LLM01 Prompt Injection	Goal input is user-controlled; injected plan manipulation	Input sanitisation; goal normalisation before Planner invocation
LLM04 Model DoS	Large N (many sub-tasks) causes resource exhaustion	Maximum sub-task limit per plan; total plan cost estimate before execution
LLM09 Overreliance	Auto-approved plans executed without human review	Human approval gate for high-stakes tasks; revision audit trail
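The resource-exhaustion mitigation (a sub-task cap plus a pre-execution cost estimate) can be sketched as an admission check. Everything numeric here is an assumption: the per-sub-task `est_tokens` field, the flat price per 1,000 tokens, and the default limits, which simply reuse the upper end of the "Complex" row in the cost table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    max_sub_tasks: int = 15       # illustrative cap
    max_cost_usd: float = 3.50    # illustrative budget


def estimate_cost_usd(sub_tasks: dict, usd_per_1k_tokens: float) -> float:
    """Sum the planner's token estimates and price them at a flat rate."""
    total_tokens = sum(task["est_tokens"] for task in sub_tasks.values())
    return total_tokens / 1000 * usd_per_1k_tokens


def admit_plan(sub_tasks: dict, limits: PlanLimits = PlanLimits(),
               usd_per_1k_tokens: float = 0.01) -> tuple[bool, str]:
    """Decide before execution whether the plan fits the size and cost limits."""
    if len(sub_tasks) > limits.max_sub_tasks:
        return False, f"{len(sub_tasks)} sub-tasks exceeds limit of {limits.max_sub_tasks}"
    cost = estimate_cost_usd(sub_tasks, usd_per_1k_tokens)
    if cost > limits.max_cost_usd:
        return False, f"estimated cost ${cost:.2f} exceeds budget ${limits.max_cost_usd:.2f}"
    return True, f"admitted at estimated cost ${cost:.2f}"


small_plan = {f"task_{i}": {"est_tokens": 10_000} for i in range(3)}
huge_plan = {f"task_{i}": {"est_tokens": 10_000} for i in range(40)}
print(admit_plan(small_plan))
print(admit_plan(huge_plan))
```

Rejecting the plan at admission time is cheaper than discovering the overrun mid-execution, and the rejection reason can be returned to the Planner as feedback.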

9. Governance Considerations

Plan as Governance Artefact

For regulated or high-stakes tasks, the plan itself (before execution) is a governance artefact: it documents the intended scope of work and must be retained alongside execution results
Plan revisions are change events that must be logged with the triggering reason

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Plan Archive	Compliance	Per task (regulated)	Original plan + all revisions + execution trace
Sub-task Whitelist	AI Governance Board	Quarterly	Documents approved sub-task types; prevents plan injection
Plan Approval Policy	AI Governance Board	Quarterly	Documents which task types require human plan approval
Revision Frequency Report	AI Operations	Monthly	Tracks revision rate; high revision rate indicates poor planning quality

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Plan generation latency p95	≤ 15s	1-hour rolling	> 30s triggers P2
Plan revision rate	≤ 15% of tasks	24-hour rolling	> 25% triggers P3; review planning prompt quality
Sub-task completion rate (within plan)	≥ 97%	24-hour rolling	< 93% triggers P2
Overall task completion rate	≥ 95%	24-hour rolling	< 90% triggers P2
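Two of these alert rules can be evaluated with a few lines of standard-library Python. The function names and input shapes are assumptions; the thresholds (30s and 25%) are the alert values from the table, and windowing of the samples is left to the caller.

```python
from statistics import quantiles


def p95(samples: list[float]) -> float:
    """95th percentile; quantiles(n=20) returns 19 cut points, the last is p95."""
    return quantiles(samples, n=20)[-1]


def slo_alerts(plan_latencies_s: list[float], tasks_total: int, tasks_revised: int) -> list[str]:
    """Return the alerts raised by the latency and revision-rate rules."""
    alerts = []
    # quantiles() needs at least two samples.
    if len(plan_latencies_s) >= 2 and p95(plan_latencies_s) > 30:
        alerts.append("P2: plan generation latency p95 above 30s")
    if tasks_total and tasks_revised / tasks_total > 0.25:
        alerts.append("P3: plan revision rate above 25%")
    return alerts


healthy = slo_alerts([10.0] * 20, tasks_total=100, tasks_revised=10)
degraded = slo_alerts([10.0] * 19 + [100.0], tasks_total=100, tasks_revised=30)
print(healthy)
print(degraded)
```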

Monitoring

Average sub-tasks per plan (trend): a rising average may indicate scope creep in planning
Plan revision trigger reason distribution: identifies the most common execution surprises
Sub-task execution latency vs. planned estimate: Planner accuracy metric

11. Cost Considerations

Task Complexity	Sub-tasks	Planning Cost	Execution Cost	Total
Simple (over-engineered for this pattern)	2–3	$0.05–0.10	$0.05–0.15	$0.10–0.25
Moderate	4–7	$0.10–0.20	$0.20–0.60	$0.30–0.80
Complex	8–15	$0.20–0.50	$0.60–3.00	$0.80–3.50
With parallel execution	Any	Same	Latency reduced; cost same	Parallelism reduces wall-clock, not token cost

Optimisations

Use a capable model for planning (quality matters); use efficient models for sub-task execution
Parallelise all independent sub-tasks (Fan-Out, EAAPL-WRK003) to reduce wall-clock time
Cache plan templates for recurring task types to avoid re-planning identical goal structures

12. Trade-Off Analysis

Option	Structure	Parallelism	Adaptability	Governance	Best For
A: Plan-and-Execute with revision (Recommended)	High	High	Medium	High	Complex, multi-day tasks
B: ReAct loop (EAAPL-WRK001)	Low	None	Very High	Medium	Dynamic, iterative tasks
C: Sequential Chain (EAAPL-WRK002)	High	Low	Low	Very High	Fixed, known pipelines
D: Plan-and-Execute without revision	High	High	Low	Very High	Stable, well-understood task types

Architectural Tensions

Tension	Left Pole	Right Pole	Balance
Plan upfront vs. Adapt during execution	Full plan before execution (reviewable, auditable)	No plan; fully adaptive (flexible, no overhead)	Plan with bounded revision (max 3 revisions)
Parallelism vs. Coordination overhead	Maximum parallelism (lowest latency)	Sequential execution (simplest coordination)	Parallelise where DAG allows; sequential for tightly coupled sub-tasks
Plan granularity	Many small sub-tasks (precise control)	Few large sub-tasks (lower overhead)	5–10 sub-tasks as a practical range for most tasks

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Planner produces over-ambitious plan (too many sub-tasks)	Medium	Medium — cost overrun; long execution	Sub-task count limit; pre-execution cost estimate	Enforce max sub-task limit; reject plan exceeding limit
Plan revision loop (execution keeps triggering revisions)	Low–Medium	High — indefinite execution	Revision counter	Hard max revision count; return best partial result
Sub-task scope creep (sub-task executor expands beyond its defined scope)	Medium	Medium — inconsistent outputs	Sub-task output schema validation	Enforce output schemas strictly; reject out-of-scope output fields
Circular dependency in generated plan	Low	Critical — DAG scheduler deadlocks	Plan Validator cycle detection	Reject plan; re-plan with explicit instruction to avoid cycles
Human approval bottleneck for time-sensitive tasks	Medium	Medium — delays execution	Approval timeout	Configurable approval timeout with auto-proceed or auto-reject policy
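The approval-timeout recovery in the last row can be sketched with a blocking queue. The `await_approval` helper, the decision strings, and the use of `queue.Queue` as the channel from the approval UI are assumptions for illustration; a production gate would more likely use a workflow engine's callback mechanism.

```python
import queue


def await_approval(decisions: queue.Queue, timeout_s: float, on_timeout: str = "reject") -> str:
    """Wait for a reviewer decision; apply the configured policy on timeout.

    decisions receives "approved" or "rejected" from the approval channel.
    on_timeout is "proceed" (auto-approve) or "reject" (auto-reject).
    """
    if on_timeout not in ("proceed", "reject"):
        raise ValueError("on_timeout must be 'proceed' or 'reject'")
    try:
        return decisions.get(timeout=timeout_s)
    except queue.Empty:
        # No reviewer responded in time: fall back to the policy.
        return "approved" if on_timeout == "proceed" else "rejected"


inbox = queue.Queue()
inbox.put("approved")
print(await_approval(inbox, timeout_s=0.1))                      # reviewer answered
print(await_approval(queue.Queue(), timeout_s=0.1))              # timed out, auto-reject
print(await_approval(queue.Queue(), 0.1, on_timeout="proceed"))  # timed out, auto-proceed
```

Defaulting to auto-reject is the conservative choice for regulated tasks; auto-proceed is better reserved for task types that the approval policy already classes as low risk.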

14. Regulatory Considerations

NIST AI RMF

GOVERN 1.1: The explicit plan and plan approval gate implement the governance requirement for human oversight of autonomous agentic workflows.
MANAGE 1.3: The plan audit record and revision log implement risk management documentation requirements.

ISO 42001

§8.4: The plan DAG is an operational specification; sub-task whitelist is a control; both must be version-controlled.

Australian Context

APRA CPS 230: For material business process automation, the execution plan must be a retainable operational record.
Privacy Act 1988: PII handling in sub-tasks must be scoped to minimum necessary access; the plan defines the data access scope per sub-task.

15. Reference Implementations

AWS

Component	Service
Planner	Amazon Bedrock (Claude 3.5 Sonnet) with structured output for DAG JSON
DAG Scheduler	AWS Step Functions with dynamic task fan-out
Sub-task Executors	AWS Lambda per sub-task type
Task State	Amazon DynamoDB
Human Approval Gate	AWS Step Functions human-in-the-loop callback token pattern

Azure

Component	Service
Planner	Azure OpenAI (GPT-4o) with structured output
DAG Scheduler + Executor	Azure Durable Functions (fan-out orchestration)
Human Approval Gate	Azure Logic Apps approval workflow
Task State	Azure Cosmos DB

On-Premises

Component	Technology
Planner + Executor	LangGraph (Plan-and-Execute agent type); Temporal workflow
DAG Validation	networkx (Python) for DAG structure validation
Task State	PostgreSQL with JSONB plan and state columns

16. Related Patterns

Pattern	ID	Relationship Type	Notes
ReAct Agent Loop	EAAPL-WRK001	Alternative	ReAct interleaves planning and execution; Plan-and-Execute separates them
Parallel Fan-Out/Fan-In	EAAPL-WRK003	Integrates With	Fan-out executes independent sub-tasks in the plan concurrently
Dynamic Sub-agent Spawning	EAAPL-WRK009	Related	Dynamic spawning creates agents at runtime; Plan-and-Execute defines sub-tasks upfront
Human Escalation	EAAPL-HITL001	Integrates With	Human approval gate for plan review before execution
Workflow State Machine	EAAPL-WRK012	Complementary	State machine can govern plan phases (planning, executing, revising, complete)

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Evidence
Research Foundation	4	Plan-and-Solve prompting (Wang et al., 2023); LLMCompiler; widely cited in agent research
Production Deployment	3	Deployed in research tools, code generation, complex analysis; growing enterprise adoption
Framework Support	4	LangGraph Plan-and-Execute agent; AutoGen planner; custom implementations common
DAG Tooling	3	JSON DAG output from LLMs reliable; execution infrastructure maturing
Observability	3	Plan visualisation and execution tracing available in LangSmith; generally maturing

18. Revision History

Version	Date	Author	Changes
1.0	2025-06-13	Architecture Board	Initial publication in Agentic Workflows category
