[EAAPL-WRK005] Plan-and-Execute
Category: Agentic Workflows
Sub-category: Goal Decomposition Architecture
Version: 1.0
Maturity: Proven
Tags: planning, task-decomposition, dependency-graph, plan-revision, goal-decomposition
Regulatory Relevance: ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1, MANAGE 1.3)
1. Executive Summary
The Plan-and-Execute Pattern defines a two-phase agentic architecture: a dedicated Planner decomposes a complex goal into a dependency graph of sub-tasks before any execution begins, and a separate Executor then executes the sub-tasks in dependency order (parallelising independent sub-tasks). When execution reveals new information that invalidates the plan, the Planner is re-invoked to revise the plan. This upfront-planning approach contrasts with the ReAct loop (EAAPL-WRK001), which interleaves reasoning and execution; Plan-and-Execute is preferred when the task structure is complex enough to benefit from explicit dependency management and when parallel sub-task execution is valuable.
For CIO/CTO audiences: this is how a project manager operates. Before any work begins, the project manager creates a work breakdown structure — what needs to be done, in what order, what can be done in parallel. The team (the Executor) then works through this plan. If something unexpected happens during execution, the project manager revises the plan. The benefit over an ad-hoc approach (ReAct) is that the full task structure is visible upfront, enabling parallelism, better resource allocation, and a clear scope-of-work that can be reviewed and approved before execution starts — which is exactly what governance and compliance teams require.
2. Problem Statement
Business Problem
Complex enterprise tasks — due diligence on an acquisition target, regulatory gap analysis across 12 jurisdictions, incident root cause analysis — have inherent structure: some sub-tasks must precede others, some can run in parallel, and the total work is too large for a single LLM context. Without explicit planning, an agent attempts to execute in an ad-hoc, interleaved manner that is opaque, hard to parallelise, and difficult to audit.
Technical Problem
The ReAct loop interleaves reasoning and execution, which is efficient for simple tasks but inefficient for complex tasks with parallelisable sub-tasks and dependency constraints. Without an explicit dependency graph, independent sub-tasks are executed sequentially when they could run in parallel, and the agent has no global view of task completeness.
Symptoms of Absence
- Complex multi-part tasks are executed sequentially even when sub-tasks are independent
- No visibility into overall task progress until execution completes
- Plan cannot be reviewed or approved before execution starts
- Agent loses sight of the overall goal when focused on individual sub-task execution
Cost of Inaction
- Efficiency: Unparallelised execution of independent sub-tasks multiplies latency unnecessarily
- Governance: Cannot review or approve the task plan before execution; no pre-execution scope visibility
- Quality: Ad-hoc execution is more likely to miss required sub-tasks or execute them in suboptimal order
3. Context
When to Apply
- Task is complex enough to benefit from explicit sub-task decomposition (typically 4 or more sub-tasks)
- Sub-tasks have explicit data dependencies that must be respected
- Independent sub-tasks can be parallelised for latency reduction
- Task plan should be reviewable (human-in-the-loop approval) before execution starts
- The task structure may need revision based on execution findings
When NOT to Apply
- Task is simple (3 or fewer steps): use Sequential Chain (EAAPL-WRK002) or ReAct (EAAPL-WRK001)
- Task structure cannot be determined upfront because each step depends entirely on prior results (use ReAct)
- Speed is the primary constraint and upfront planning adds unacceptable latency
- Tasks are highly dynamic and plans become stale too quickly to be useful
Prerequisites
- Planner LLM capable of producing structured dependency graphs (JSON DAG output)
- Sub-task executor infrastructure (parallel execution support if parallelism is required)
- Plan validation logic (detect circular dependencies, invalid sub-task specifications)
- Plan revision trigger conditions defined
Industry Applicability
| Industry | Plan-and-Execute Use Case | Sub-task Example |
| --- | --- | --- |
| Financial Services | M&A due diligence | Financial review, legal review, regulatory review (parallel, with synthesis) |
| Legal | Regulatory gap analysis | Per-jurisdiction analysis in parallel; consolidated recommendation sequential |
| Technology | Large codebase refactoring | Dependency analysis → parallel module refactoring → integration test → deploy |
| Government | Policy impact assessment | Social impact, economic impact, legal review in parallel; synthesis sequential |
| Healthcare | Clinical trial design review | Protocol review, statistical plan review, ethics review in parallel |
4. Architecture Overview
The Plan-and-Execute architecture separates cognitive planning from mechanical execution, enabling each to be optimised independently.
Planning Phase
The Planner receives the overall goal and produces a structured execution plan: a directed acyclic graph (DAG) of sub-tasks, where each node is a sub-task (with input requirements, prompt reference, and expected output schema) and each directed edge represents a data dependency ("sub-task B requires output of sub-task A"). The Planner uses a capable model (reasoning quality matters more than speed here, because the plan is produced once per task and regenerated only when a revision trigger fires). The plan is validated for structural correctness (no cycles, all dependencies reachable, all referenced prompts exist in the registry) before execution begins.
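The structural checks described above can be sketched in a few lines. This is a minimal illustration rather than a reference validator: the plan shape (`sub_tasks`, `prompt_ref`, `depends_on`), the contents of `PROMPT_REGISTRY`, and the name `validate_plan` are assumptions made for the example. It uses the standard-library `graphlib` module for cycle detection; networkx would serve equally well.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical registry of prompt references the Planner is allowed to use.
PROMPT_REGISTRY = {"jurisdiction_analysis", "comparison", "recommendation"}


def validate_plan(plan: dict, registry: set) -> list:
    """Return a list of structural problems; an empty list means the plan passes."""
    errors = []
    tasks = plan["sub_tasks"]
    for task_id, task in tasks.items():
        if task["prompt_ref"] not in registry:
            errors.append(f"{task_id}: unknown prompt '{task['prompt_ref']}'")
        for dep in task["depends_on"]:
            if dep not in tasks:
                errors.append(f"{task_id}: depends on undefined sub-task '{dep}'")
    try:
        # prepare() raises CycleError if the dependency graph contains a cycle.
        graph = {task_id: task["depends_on"] for task_id, task in tasks.items()}
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        errors.append(f"cycle detected: {exc.args[1]}")
    return errors
```

Returning a list of problems instead of raising on the first one lets the caller pass every finding back to the Planner in a single re-planning request.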
Plan Approval Gate (Optional)
For high-stakes or long-running tasks, an optional human approval gate presents the plan to a human reviewer before execution starts. The reviewer can approve, modify, or reject the plan. This gate is a direct implementation of human oversight requirements for agentic systems.
Execution Phase
The Executor traverses the plan DAG in topological order: sub-tasks with no unmet dependencies are eligible for immediate execution and are dispatched (potentially in parallel via Fan-Out, EAAPL-WRK003). As sub-tasks complete, their outputs are stored in the task state, and newly unblocked sub-tasks are dispatched. The Executor is a pure mechanical system: it does not reason; it only manages task scheduling and dependency tracking.
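The scheduling loop described above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: `run_plan`, the `execute(task_id, state)` callable, and the worker count of 8 are hypothetical, and a production scheduler (LangGraph, Temporal, Step Functions) would add persistence, retries, and timeouts.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_plan(dependencies: dict, execute) -> dict:
    """Run sub-tasks in dependency order, dispatching ready ones in parallel.

    dependencies maps a sub-task id to the ids it depends on.
    execute(task_id, state) is a hypothetical callable that runs one sub-task.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    state = {}        # task state: sub-task id -> output
    in_flight = {}    # future -> sub-task id
    with ThreadPoolExecutor(max_workers=8) as pool:
        while sorter.is_active():
            # get_ready() yields every sub-task whose dependencies are all done.
            for task_id in sorter.get_ready():
                snapshot = dict(state)  # workers read a copy, never the live dict
                in_flight[pool.submit(execute, task_id, snapshot)] = task_id
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in finished:
                task_id = in_flight.pop(future)
                state[task_id] = future.result()
                sorter.done(task_id)  # unblocks dependent sub-tasks
    return state
```

The loop itself makes no decisions about content: it only tracks which sub-tasks are unblocked, which matches the description of the Executor as a purely mechanical component.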
Plan Revision
When a sub-task execution produces unexpected output (e.g., discovers that the target dataset does not exist, or that the regulatory scope is larger than anticipated), a revision trigger fires. The Planner is re-invoked with the current plan, completed sub-task outputs, and the revision trigger reason. The Planner produces a revised plan: it may add new sub-tasks, remove now-unnecessary sub-tasks, or change the dependencies between remaining sub-tasks. Revision is bounded by a maximum revision count to prevent unbounded loops.
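A bounded revision loop of the kind described above might look like the following sketch. The names `execute_with_revision`, `run_once`, and `replan` are hypothetical stand-ins for the Executor and the Planner, and the default of 3 revisions follows the default quoted in the Error Flow table.

```python
from dataclasses import dataclass, field

MAX_REVISIONS = 3  # default revision budget quoted in the Error Flow table


@dataclass
class RunResult:
    outputs: dict
    revisions: list = field(default_factory=list)  # logged trigger reasons
    complete: bool = True


def execute_with_revision(plan, run_once, replan, max_revisions=MAX_REVISIONS):
    """Execute a plan, re-planning on each trigger until the budget is spent.

    run_once(plan, outputs) returns (outputs, trigger_reason_or_None).
    replan(plan, outputs, reason) returns the revised plan.
    Both callables are assumptions made for this sketch.
    """
    result = RunResult(outputs={})
    while True:
        result.outputs, reason = run_once(plan, result.outputs)
        if reason is None:
            return result  # plan finished without a further trigger
        if len(result.revisions) >= max_revisions:
            result.complete = False  # budget exhausted: return partial results
            return result
        result.revisions.append(reason)  # each revision is a logged change event
        plan = replan(plan, result.outputs, reason)
```

Keeping the trigger reasons on the result object gives the audit logger a ready-made revision history.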
Plan Completion
The plan is complete when all sub-tasks in the DAG have been executed (or explicitly skipped per a revision decision). The final output is assembled by the synthesis sub-task(s) defined in the plan.
5. Architecture Diagram
flowchart TD
subgraph Planning["Planning Phase"]
A[Goal Input]
B[Planner LLM]
C{Plan Valid?}
D[Plan Validation]
E{Human Approval Gate}
end
subgraph Execution["Execution Phase"]
F[DAG Scheduler]
G[Sub-task A]
H[Sub-task B]
I[Sub-task C]
J[Sub-task D]
K[(Task State)]
end
subgraph Revision["Plan Revision"]
L{Revision Trigger?}
M[Re-invoke Planner]
end
subgraph Output["Output"]
N[Final Result]
O[Execution Trace]
end
A --> B
B --> D
D --> C
C -->|invalid| B
C -->|valid| E
E -->|approved| F
E -->|skipped| F
E -->|rejected| B
F --> G & H
G --> I
G --> K
H --> K
I --> K
K --> J
J --> L
L -->|yes| M
M --> F
L -->|no| N
F --> O
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Planner | AI Component | Decomposes goal into structured DAG plan; revises on trigger | GPT-4o, Claude 3.5 Sonnet (high reasoning quality) | Critical |
| Plan Validator | Logic Component | Validates DAG structure: acyclicity, reachability, schema compliance | Custom Python (networkx); JSON Schema | Critical |
| Human Approval Gate | Integration | Presents plan for human review; blocks execution on rejection | Slack workflow; web UI; email approval | High (for regulated tasks) |
| DAG Scheduler (Executor) | Orchestration | Topological traversal; dispatches eligible sub-tasks; tracks dependencies | Custom Python; LangGraph; Temporal; AWS Step Functions | Critical |
| Sub-task Executor | AI Component | Executes individual sub-tasks (may be ReAct loops, sequential chains, etc.) | Any EAAPL-WRK pattern | Critical |
| Task State Store | State | Accumulates sub-task outputs; tracks completion status | Redis; PostgreSQL; DynamoDB | Critical |
| Revision Trigger Detector | Logic | Monitors sub-task outputs for conditions requiring plan revision | Custom; LLM-based anomaly detection on output | High |
| Execution Audit Logger | Governance | Records full plan + execution trace | S3; PostgreSQL; Splunk | High |
7. Data Flow
| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | Caller | Submits goal: "Conduct regulatory gap analysis for our lending product across NSW, VIC, QLD, SA, WA" | Goal with scope parameters |
| 2 | Planner | Produces DAG: 5 parallel jurisdiction analyses → 1 cross-jurisdiction comparison → 1 remediation recommendation | Plan JSON with 7 sub-tasks and 6 dependency edges |
| 3 | Plan Validator | Validates DAG: acyclic, all schemas present, all prompts in registry | PASS |
| 4 | Human Approval Gate | Presents plan to compliance lead | Approved |
| 5 | DAG Scheduler | Dispatches 5 jurisdiction sub-tasks concurrently (no dependencies) | 5 parallel invocations |
| 6 | Sub-task Executors (×5) | Each analyses one jurisdiction's lending regulations | [{jurisdiction: "NSW", gaps: [...], risk: "high"}, ...] |
| 7 | DAG Scheduler | Receives all 5 results; unlocks cross-jurisdiction comparison sub-task | Task state updated |
| 8 | Sub-task Executor | Executes cross-jurisdiction comparison | {common_gaps: [...], jurisdiction_specific: [...]} |
| 9 | Revision Trigger Detector | WA analysis found additional ASIC obligations not in original scope, which triggers revision | Revision triggered |
| 10 | Planner | Adds 1 new sub-task: "ASIC obligation mapping for WA" | Revised plan |
| 11 | DAG Scheduler | Executes new sub-task; then executes remediation recommendation | Final output assembled |
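For illustration, the plan produced in step 2 might be represented as follows. The field names (`goal`, `sub_tasks`, `prompt_ref`, `depends_on`) and the sub-task identifiers are assumptions for this sketch, not a schema defined by the pattern. Each `depends_on` entry counts as one dependency edge.

```python
import json

JURISDICTIONS = ["NSW", "VIC", "QLD", "SA", "WA"]

# Hypothetical plan document: five independent analyses feed one comparison,
# which in turn feeds one remediation recommendation.
plan = {
    "goal": "Regulatory gap analysis for lending product",
    "sub_tasks": {
        **{
            f"analyse_{j}": {"prompt_ref": "jurisdiction_analysis", "depends_on": []}
            for j in JURISDICTIONS
        },
        "compare": {
            "prompt_ref": "cross_jurisdiction_comparison",
            "depends_on": [f"analyse_{j}" for j in JURISDICTIONS],
        },
        "recommend": {
            "prompt_ref": "remediation_recommendation",
            "depends_on": ["compare"],
        },
    },
}

# One edge per depends_on entry; sub-tasks with no dependencies can start at once.
edge_count = sum(len(task["depends_on"]) for task in plan["sub_tasks"].values())
ready_first = [tid for tid, task in plan["sub_tasks"].items() if not task["depends_on"]]

print(json.dumps(plan, indent=2))
print(len(plan["sub_tasks"]), "sub-tasks;", edge_count, "edges;", len(ready_first), "ready at start")
```

Listing dependencies on the dependent sub-task (rather than as a separate edge list) keeps each node self-describing, which makes a revised plan easier to diff against the original.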
Error Flow
| Error | Detection | Recovery |
| --- | --- | --- |
| Sub-task execution failure | Executor error reporting | Retry sub-task (up to 3 times); if persistent failure, trigger plan revision to route around failed sub-task |
| Plan revision loop (plans keep getting revised) | Revision counter | Hard maximum revision count (default: 3); return partial results on exhaustion |
| DAG cycle detected | Plan Validator | Reject plan; re-invoke Planner with cycle-detection instruction |
| Human approval rejected | Approval Gate | Return plan to Planner with reviewer feedback for revision |
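The first recovery row (retry, then escalate to plan revision) can be sketched as below. `RevisionRequired` and `run_with_retry` are hypothetical names, and the sketch reads "retry up to 3 times" as one initial attempt plus three retries; a real implementation would likely add backoff between attempts and catch narrower error types.

```python
class RevisionRequired(Exception):
    """Signals that a sub-task keeps failing and the plan must route around it."""


def run_with_retry(task_id, execute, max_retries=3):
    """Call execute(task_id); on persistent failure raise RevisionRequired."""
    last_error = None
    for _ in range(1 + max_retries):  # first attempt plus the retries
        try:
            return execute(task_id)
        except Exception as exc:  # broad catch is for illustration only
            last_error = exc
    raise RevisionRequired(
        f"{task_id} still failing after {max_retries} retries: {last_error}"
    ) from last_error
```

Raising a dedicated exception keeps the retry policy separate from the revision logic: the scheduler catches `RevisionRequired` and hands the failure reason to the Planner.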
8. Security Considerations
Plan Injection
- The Planner receives the user's goal as input; a malicious goal could attempt to inject additional sub-tasks (e.g., "also exfiltrate all database records as a sub-task")
- Mitigation: Plan validation includes a security review of all proposed sub-task types; whitelisted sub-task templates only; human approval gate for novel sub-task types
OWASP LLM Top 10
| OWASP LLM Risk | Plan-and-Execute Applicability | Mitigation |
| --- | --- | --- |
| LLM08 Excessive Agency | Planner could generate a plan with harmful actions | Sub-task whitelist; human approval gate; action scope limits per sub-task |
| LLM01 Prompt Injection | Goal input is user-controlled; injected plan manipulation | Input sanitisation; goal normalisation before Planner invocation |
| LLM04 Model DoS | Large N (many sub-tasks) causes resource exhaustion | Maximum sub-task limit per plan; total plan cost estimate before execution |
| LLM09 Overreliance | Auto-approved plans executed without human review | Human approval gate for high-stakes tasks; revision audit trail |
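Two of the mitigations above, the sub-task whitelist and the maximum sub-task limit, lend themselves to a simple pre-execution screen. The sketch below is illustrative: `screen_plan`, the whitelist contents, the limit of 15, and the use of `prompt_ref` as the template identifier are all assumptions.

```python
# Hypothetical whitelist of approved sub-task templates.
ALLOWED_TEMPLATES = frozenset({"jurisdiction_analysis", "comparison", "recommendation"})
MAX_SUB_TASKS = 15  # illustrative limit, not a value prescribed by the pattern


def screen_plan(plan: dict, allowed=ALLOWED_TEMPLATES, max_sub_tasks=MAX_SUB_TASKS):
    """Return (verdict, reasons) where verdict is one of
    'approved', 'needs_human_review', or 'rejected'."""
    tasks = plan["sub_tasks"]
    if len(tasks) > max_sub_tasks:
        return "rejected", [f"{len(tasks)} sub-tasks exceeds limit of {max_sub_tasks}"]
    novel = sorted(tid for tid, task in tasks.items() if task["prompt_ref"] not in allowed)
    if novel:
        # Novel sub-task types are routed to the human approval gate, not executed.
        return "needs_human_review", [f"non-whitelisted template in {tid}" for tid in novel]
    return "approved", []
```

The screen inspects only the plan structure, so it runs before any sub-task is dispatched and does not depend on the Planner's own judgement of what is safe.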
9. Governance Considerations
Plan as Governance Artefact
- For regulated or high-stakes tasks, the plan itself (before execution) is a governance artefact: it documents the intended scope of work and must be retained alongside execution results
- Plan revisions are change events that must be logged with the triggering reason
Governance Artefacts
| Artefact | Owner | Frequency | Purpose |
| --- | --- | --- | --- |
| Plan Archive | Compliance | Per task (regulated) | Original plan + all revisions + execution trace |
| Sub-task Whitelist | AI Governance Board | Quarterly | Documents approved sub-task types; prevents plan injection |
| Plan Approval Policy | AI Governance Board | Quarterly | Documents which task types require human plan approval |
| Revision Frequency Report | AI Operations | Monthly | Tracks revision rate; high revision rate indicates poor planning quality |
10. Operational Considerations
SLOs
| SLO | Target | Window | Alert |
| --- | --- | --- | --- |
| Plan generation latency p95 | ≤ 15s | 1-hour rolling | > 30s triggers P2 |
| Plan revision rate | ≤ 15% of tasks | 24-hour rolling | > 25% triggers P3; review planning prompt quality |
| Sub-task completion rate (within plan) | ≥ 97% | 24-hour rolling | < 93% triggers P2 |
| Overall task completion rate | ≥ 95% | 24-hour rolling | < 90% triggers P2 |
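The thresholds in the SLO table can be encoded as data and evaluated uniformly. In this sketch the metric names, the `Slo` dataclass, and the intermediate `"warn"` band (target missed but alert threshold not yet crossed) are assumptions; the numeric thresholds and severities come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float
    alert: float
    severity: str
    higher_is_better: bool


# Thresholds from the SLO table; rates are expressed as fractions.
SLOS = [
    Slo("plan_generation_latency_p95_s", 15, 30, "P2", False),
    Slo("plan_revision_rate", 0.15, 0.25, "P3", False),
    Slo("sub_task_completion_rate", 0.97, 0.93, "P2", True),
    Slo("overall_task_completion_rate", 0.95, 0.90, "P2", True),
]


def evaluate(slo: Slo, value: float) -> str:
    """Return the alert severity, 'warn' if only the target is missed, else 'ok'."""
    breached = value < slo.alert if slo.higher_is_better else value > slo.alert
    if breached:
        return slo.severity
    met = value >= slo.target if slo.higher_is_better else value <= slo.target
    return "ok" if met else "warn"
```

For example, a p95 plan generation latency of 20 s misses the 15 s target without crossing the 30 s alert threshold, so `evaluate(SLOS[0], 20)` returns `"warn"`.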
Monitoring
- Average sub-tasks per plan trending: increasing may indicate scope creep in planning
- Plan revision trigger reason distribution: identifies the most common execution surprises
- Sub-task execution latency vs. planned estimate: Planner accuracy metric
11. Cost Considerations
| Task Complexity | Sub-tasks | Planning Cost | Execution Cost | Total |
| --- | --- | --- | --- | --- |
| Simple (over-engineered for this pattern) | 2–3 | $0.05–0.10 | $0.05–0.15 | $0.10–0.25 |
| Moderate | 4–7 | $0.10–0.20 | $0.20–0.60 | $0.30–0.80 |
| Complex | 8–15 | $0.20–0.50 | $0.60–3.00 | $0.80–3.50 |
| With parallel execution | Any | Same | Latency reduced; cost same | Parallelism reduces wall-clock, not token cost |
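The last row can be made concrete by comparing the sequential duration of a plan with its critical-path duration. The sketch below is illustrative: `latency_summary` is a hypothetical helper, and the per-sub-task durations in the usage note are invented for the example, not measurements.

```python
from graphlib import TopologicalSorter


def latency_summary(dependencies: dict, seconds: dict) -> tuple:
    """Return (sequential_seconds, critical_path_seconds) for a plan DAG.

    dependencies maps a sub-task id to the ids it depends on.
    seconds maps a sub-task id to its estimated duration.
    """
    finish = {}
    # static_order() yields each sub-task after all of its dependencies.
    for task in TopologicalSorter(dependencies).static_order():
        start = max((finish[dep] for dep in dependencies.get(task, ())), default=0.0)
        finish[task] = start + seconds[task]
    return sum(seconds.values()), max(finish.values())


jurisdictions = ["NSW", "VIC", "QLD", "SA", "WA"]
deps = {j: [] for j in jurisdictions}
deps["compare"] = list(jurisdictions)
deps["recommend"] = ["compare"]
# Hypothetical durations in seconds.
durations = {**{j: 40 for j in jurisdictions}, "compare": 30, "recommend": 20}
sequential, critical_path = latency_summary(deps, durations)
```

With these assumed durations, sequential execution takes 250 s while the critical path is 90 s. The same seven sub-tasks run in both cases, so the token cost is unchanged; only the wall-clock time differs.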
Optimisations
- Use a capable model for planning (quality matters); use efficient models for sub-task execution
- Parallelise all independent sub-tasks (Fan-Out, EAAPL-WRK003) to reduce wall-clock time
- Cache plan templates for recurring task types to avoid re-planning identical goal structures
12. Trade-Off Analysis
| Option | Structure | Parallelism | Adaptability | Governance | Best For |
| --- | --- | --- | --- | --- | --- |
| A: Plan-and-Execute with revision (Recommended) | High | High | Medium | High | Complex, multi-day tasks |
| B: ReAct loop (EAAPL-WRK001) | Low | None | Very High | Medium | Dynamic, iterative tasks |
| C: Sequential Chain (EAAPL-WRK002) | High | Low | Low | Very High | Fixed, known pipelines |
| D: Plan-and-Execute without revision | High | High | Low | Very High | Stable, well-understood task types |
Architectural Tensions
| Tension | Left Pole | Right Pole | Balance |
| --- | --- | --- | --- |
| Plan upfront vs. Adapt during execution | Full plan before execution (reviewable, auditable) | No plan; fully adaptive (flexible, no overhead) | Plan with bounded revision (max 3 revisions) |
| Parallelism vs. Coordination overhead | Maximum parallelism (lowest latency) | Sequential execution (simplest coordination) | Parallelise where DAG allows; sequential for tightly coupled sub-tasks |
| Plan granularity | Many small sub-tasks (precise control) | Few large sub-tasks (lower overhead) | 5–10 sub-tasks as a practical range for most tasks |
13. Failure Modes
| Failure Mode | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Planner produces over-ambitious plan (too many sub-tasks) | Medium | Medium: cost overrun; long execution | Sub-task count limit; pre-execution cost estimate | Enforce max sub-task limit; reject plan exceeding limit |
| Plan revision loop (execution keeps triggering revisions) | Low–Medium | High: indefinite execution | Revision counter | Hard max revision count; return best partial result |
| Sub-task scope creep (sub-task executor expands beyond its defined scope) | Medium | Medium: inconsistent outputs | Sub-task output schema validation | Enforce output schemas strictly; reject out-of-scope output fields |
| Circular dependency in generated plan | Low | Critical: DAG scheduler deadlocks | Plan Validator cycle detection | Reject plan; re-plan with explicit instruction to avoid cycles |
| Human approval bottleneck for time-sensitive tasks | Medium | Medium: delays execution | Approval timeout | Configurable approval timeout with auto-proceed or auto-reject policy |
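The approval-timeout recovery in the last row can be sketched with a blocking queue. `await_approval`, `TimeoutPolicy`, and the `decisions` channel (on which an approval UI would put `True` or `False`) are hypothetical names for this example; a hosted workflow engine would typically provide its own callback mechanism instead.

```python
import queue
from enum import Enum


class TimeoutPolicy(Enum):
    AUTO_PROCEED = "auto_proceed"
    AUTO_REJECT = "auto_reject"


def await_approval(decisions: queue.Queue, timeout_s: float, policy: TimeoutPolicy) -> bool:
    """Block until a reviewer decision arrives or the timeout elapses.

    Returns True if execution may start, False if the plan is rejected.
    """
    try:
        return bool(decisions.get(timeout=timeout_s))
    except queue.Empty:
        # No reviewer responded in time: fall back to the configured policy.
        return policy is TimeoutPolicy.AUTO_PROCEED
```

For regulated tasks, `AUTO_REJECT` is the more conservative choice, since `AUTO_PROCEED` lets a plan run without any human having reviewed it.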
14. Regulatory Considerations
NIST AI RMF
- GOVERN 1.1: The explicit plan and plan approval gate implement the governance requirement for human oversight of autonomous agentic workflows.
- MANAGE 1.3: The plan audit record and revision log implement risk management documentation requirements.
ISO 42001
- §8.4: The plan DAG is an operational specification; sub-task whitelist is a control; both must be version-controlled.
Australian Context
- APRA CPS 230: For material business process automation, the execution plan must be a retainable operational record.
- Privacy Act 1988: PII handling in sub-tasks must be scoped to minimum necessary access; the plan defines the data access scope per sub-task.
15. Reference Implementations
AWS
| Component | Service |
| --- | --- |
| Planner | Amazon Bedrock (Claude 3.5 Sonnet) with structured output for DAG JSON |
| DAG Scheduler | AWS Step Functions with dynamic task fan-out |
| Sub-task Executors | AWS Lambda per sub-task type |
| Task State | Amazon DynamoDB |
| Human Approval Gate | AWS Step Functions human-in-the-loop callback token pattern |
Azure
| Component | Service |
| --- | --- |
| Planner | Azure OpenAI (GPT-4o) with structured output |
| DAG Scheduler + Executor | Azure Durable Functions (fan-out orchestration) |
| Human Approval Gate | Azure Logic Apps approval workflow |
| Task State | Azure Cosmos DB |
On-Premises
| Component | Technology |
| --- | --- |
| Planner + Executor | LangGraph (Plan-and-Execute agent type); Temporal workflow |
| DAG Validation | networkx (Python) for DAG structure validation |
| Task State | PostgreSQL with JSONB plan and state columns |
16. Related Patterns
| Pattern | ID | Relationship Type | Notes |
| --- | --- | --- | --- |
| ReAct Agent Loop | EAAPL-WRK001 | Alternative | ReAct interleaves planning and execution; Plan-and-Execute separates them |
| Parallel Fan-Out/Fan-In | EAAPL-WRK003 | Integrates With | Fan-out executes independent sub-tasks in the plan concurrently |
| Dynamic Sub-agent Spawning | EAAPL-WRK009 | Related | Dynamic spawning creates agents at runtime; Plan-and-Execute defines sub-tasks upfront |
| Human Escalation | EAAPL-HITL001 | Integrates With | Human approval gate for plan review before execution |
| Workflow State Machine | EAAPL-WRK012 | Complementary | State machine can govern plan phases (planning, executing, revising, complete) |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Evidence |
| --- | --- | --- |
| Research Foundation | 4 | Plan-and-Solve prompting (Wang et al., 2023); LLMCompiler; widely cited in agent research |
| Production Deployment | 3 | Deployed in research tools, code generation, complex analysis; growing enterprise adoption |
| Framework Support | 4 | LangGraph Plan-and-Execute agent; AutoGen planner; custom implementations common |
| DAG Tooling | 3 | JSON DAG output from LLMs reliable; execution infrastructure maturing |
| Observability | 3 | Plan visualisation and execution tracing available in LangSmith; generally maturing |
18. Revision History
| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category |