EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-WRK005] Plan-and-Execute

Category: Agentic Workflows
Sub-category: Goal Decomposition
Architecture Version: 1.0
Maturity: Proven
Tags: planning, task-decomposition, dependency-graph, plan-revision, goal-decomposition
Regulatory Relevance: ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1, MANAGE 1.3)


1. Executive Summary

The Plan-and-Execute Pattern defines a two-phase agentic architecture: a dedicated Planner decomposes a complex goal into a dependency graph of sub-tasks before any execution begins, and a separate Executor then executes the sub-tasks in dependency order (parallelising independent sub-tasks). When execution reveals new information that invalidates the plan, the Planner is re-invoked to revise the plan. This upfront-planning approach contrasts with the ReAct loop (EAAPL-WRK001), which interleaves reasoning and execution; Plan-and-Execute is preferred when the task structure is complex enough to benefit from explicit dependency management and when parallel sub-task execution is valuable.

For CIO/CTO audiences: this is how a project manager operates. Before any work begins, the project manager creates a work breakdown structure — what needs to be done, in what order, what can be done in parallel. The team (the Executor) then works through this plan. If something unexpected happens during execution, the project manager revises the plan. The benefit over an ad-hoc approach (ReAct) is that the full task structure is visible upfront, enabling parallelism, better resource allocation, and a clear scope-of-work that can be reviewed and approved before execution starts — which is exactly what governance and compliance teams require.


2. Problem Statement

Business Problem

Complex enterprise tasks — due diligence on an acquisition target, regulatory gap analysis across 12 jurisdictions, incident root cause analysis — have inherent structure: some sub-tasks must precede others, some can run in parallel, and the total work is too large for a single LLM context. Without explicit planning, an agent attempts to execute in an ad-hoc, interleaved manner that is opaque, hard to parallelise, and difficult to audit.

Technical Problem

The ReAct loop interleaves reasoning and execution, which is efficient for simple tasks but inefficient for complex tasks with parallelisable sub-tasks and dependency constraints. Without an explicit dependency graph, independent sub-tasks are executed sequentially when they could run in parallel, and the agent has no global view of task completeness.

Symptoms of Absence

  • Complex multi-part tasks are executed sequentially even when sub-tasks are independent
  • No visibility into overall task progress until execution completes
  • Plan cannot be reviewed or approved before execution starts
  • Agent loses sight of the overall goal when focused on individual sub-task execution

Cost of Inaction

  • Efficiency: Unparallelised execution of independent sub-tasks multiplies latency unnecessarily
  • Governance: Cannot review or approve the task plan before execution; no pre-execution scope visibility
  • Quality: Ad-hoc execution is more likely to miss required sub-tasks or execute them in suboptimal order

3. Context

When to Apply

  • Task is complex enough to benefit from explicit sub-task decomposition (typically 4 or more sub-tasks)
  • Sub-tasks have explicit data dependencies that must be respected
  • Independent sub-tasks can be parallelised for latency reduction
  • Task plan should be reviewable (human-in-the-loop approval) before execution starts
  • The task structure may need revision based on execution findings

When NOT to Apply

  • Task is simple (3 steps or fewer): use Sequential Chain (EAAPL-WRK002) or ReAct (EAAPL-WRK001)
  • Task structure cannot be determined upfront because each step depends entirely on prior results (use ReAct)
  • Speed is the primary constraint and upfront planning adds unacceptable latency
  • Tasks are highly dynamic and plans become stale too quickly to be useful

Prerequisites

  • Planner LLM capable of producing structured dependency graphs (JSON DAG output)
  • Sub-task executor infrastructure (parallel execution support if parallelism is required)
  • Plan validation logic (detect circular dependencies, invalid sub-task specifications)
  • Plan revision trigger conditions defined

Industry Applicability

Industry | Plan-and-Execute Use Case | Sub-task Example
--- | --- | ---
Financial Services | M&A due diligence | Financial review, legal review, regulatory review in parallel, followed by synthesis
Legal | Regulatory gap analysis | Per-jurisdiction analysis in parallel; consolidated recommendation sequential
Technology | Large codebase refactoring | Dependency analysis → parallel module refactoring → integration test → deploy
Government | Policy impact assessment | Social impact, economic impact, legal review in parallel; synthesis sequential
Healthcare | Clinical trial design review | Protocol review, statistical plan review, ethics review in parallel

4. Architecture Overview

The Plan-and-Execute architecture separates cognitive planning from mechanical execution, enabling each to be optimised independently.

Planning Phase

The Planner receives the overall goal and produces a structured execution plan: a directed acyclic graph (DAG) of sub-tasks. Each node is a sub-task (with input requirements, prompt reference, and expected output schema), and each directed edge represents a data dependency ("sub-task B requires output of sub-task A"). The Planner uses a capable model: high reasoning quality matters more than speed here, because the plan is produced once per task (plus any bounded revisions). Before execution begins, the plan is validated for structural correctness: no cycles, every dependency resolves to a sub-task in the plan, and every referenced prompt exists in the registry.
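The structural checks above can be sketched as follows. This is a minimal illustration, not the library's reference validator: it assumes a plan is a list of sub-task records with hypothetical fields `id`, `prompt_ref`, `depends_on`, and `output_schema`, and it uses Python's standard-library `graphlib` for cycle detection in place of networkx.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class SubTask:
    id: str
    prompt_ref: str                                   # key into the prompt registry
    depends_on: list[str] = field(default_factory=list)
    output_schema: dict = field(default_factory=dict)


def validate_plan(subtasks: list[SubTask], prompt_registry: set[str]) -> list[str]:
    """Return a list of validation errors; an empty list means PASS."""
    errors: list[str] = []
    ids = [t.id for t in subtasks]
    if len(ids) != len(set(ids)):
        errors.append("duplicate sub-task ids")
    known = set(ids)
    for t in subtasks:
        for dep in t.depends_on:
            if dep not in known:                      # dependency cannot be resolved
                errors.append(f"{t.id}: unknown dependency {dep!r}")
        if t.prompt_ref not in prompt_registry:
            errors.append(f"{t.id}: prompt {t.prompt_ref!r} not in registry")
    graph = {t.id: set(t.depends_on) for t in subtasks}
    try:
        tuple(TopologicalSorter(graph).static_order())  # raises CycleError on a cycle
    except CycleError as exc:
        errors.append(f"cycle detected: {exc.args[1]}")
    return errors
```

A non-empty result would send the plan back to the Planner (for a cycle, with the cycle-detection instruction described in the Error Flow table).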

Plan Approval Gate (Optional)

For high-stakes or long-running tasks, an optional human approval gate presents the plan to a human reviewer before execution starts. The reviewer can approve, modify, or reject the plan. This gate is a direct implementation of human oversight requirements for agentic systems.
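A sketch of a blocking approval gate, assuming the review channel (Slack workflow, web UI, or email) is wrapped in a hypothetical coroutine `request_review(plan)` that returns the reviewer's decision. The timeout fallback illustrates the auto-proceed or auto-reject policy listed under Failure Modes; all names and the policy parameter are illustrative.

```python
import asyncio
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    MODIFIED = "modified"
    REJECTED = "rejected"


async def approval_gate(plan: dict, request_review, timeout_s: float,
                        on_timeout: Decision) -> Decision:
    """Block execution until a reviewer decides or the timeout expires.

    request_review(plan) is a hypothetical coroutine that presents the plan to a
    reviewer and returns their Decision.
    """
    try:
        return await asyncio.wait_for(request_review(plan), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Policy fallback when nobody responds: auto-proceed or auto-reject.
        return on_timeout
```

A `MODIFIED` decision would presumably send the edited plan back through validation before execution starts.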

Execution Phase

The Executor traverses the plan DAG in topological order: sub-tasks with no unmet dependencies are eligible for immediate execution and are dispatched (potentially in parallel via Fan-Out, EAAPL-WRK003). As sub-tasks complete, their outputs are stored in the task state, and newly unblocked sub-tasks are dispatched. The Executor is purely mechanical: it does not reason; it only manages task scheduling and dependency tracking.
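The scheduling loop described above maps closely onto Python's `graphlib.TopologicalSorter`, whose `get_ready()` / `done()` protocol releases nodes as their dependencies complete. The sketch below assumes the plan has been reduced to a mapping of sub-task id to dependency ids, and that `execute(task_id, inputs)` is a hypothetical coroutine wrapping a sub-task executor; the scheduler itself makes no LLM calls.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_plan(graph: dict[str, set[str]], execute) -> dict[str, object]:
    """Execute a plan DAG, dispatching each sub-task as soon as its dependencies finish.

    graph maps sub-task id -> ids it depends on. execute(task_id, inputs) is a
    hypothetical coroutine that runs one sub-task and returns its output.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()                         # raises graphlib.CycleError if not a DAG
    state: dict[str, object] = {}            # task state: outputs of finished sub-tasks
    running: dict[asyncio.Task, str] = {}
    while sorter.is_active():
        for task_id in sorter.get_ready():   # newly unblocked sub-tasks
            inputs = {dep: state[dep] for dep in graph.get(task_id, ())}
            running[asyncio.create_task(execute(task_id, inputs))] = task_id
        finished, _ = await asyncio.wait(running.keys(), return_when=asyncio.FIRST_COMPLETED)
        for fut in finished:
            task_id = running.pop(fut)
            state[task_id] = fut.result()    # re-raises if the sub-task failed
            sorter.done(task_id)             # may unblock dependants
    return state
```

Independent sub-tasks run concurrently because every ready node is dispatched before the loop waits; a dependant starts as soon as its last prerequisite finishes rather than waiting for a whole stage.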

Plan Revision

When a sub-task execution produces unexpected output (e.g., discovers that the target dataset does not exist, or that the regulatory scope is larger than anticipated), a revision trigger fires. The Planner is re-invoked with the current plan, completed sub-task outputs, and the revision trigger reason. The Planner produces a revised plan: it may add new sub-tasks, remove now-unnecessary sub-tasks, or change the dependencies between remaining sub-tasks. Revision is bounded by a maximum revision count to prevent unbounded loops.
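A minimal sketch of the bounded revision loop, assuming a hypothetical `plan_fn` (the Planner call) and `execute_fn` (one Executor pass that returns accumulated outputs plus a revision reason, or `None` when the plan finishes). The default of three revisions follows the Error Flow table, and returning partial results on exhaustion mirrors the recovery it describes.

```python
MAX_REVISIONS = 3  # default from the Error Flow table


def plan_and_execute(goal, plan_fn, execute_fn, max_revisions: int = MAX_REVISIONS) -> dict:
    """Plan, execute, and re-plan on triggers, never exceeding max_revisions.

    plan_fn(goal, plan, completed, reason) -> new plan   (hypothetical Planner call)
    execute_fn(plan, completed) -> (completed, reason)   (reason is None when done)
    """
    plan = plan_fn(goal, None, {}, None)      # initial plan
    completed: dict = {}
    revisions = 0
    while True:
        completed, reason = execute_fn(plan, completed)
        if reason is None:
            return {"status": "complete", "outputs": completed, "revisions": revisions}
        if revisions >= max_revisions:
            # Budget exhausted: stop and hand back whatever has been produced.
            return {"status": "partial", "outputs": completed,
                    "revisions": revisions, "reason": reason}
        revisions += 1
        plan = plan_fn(goal, plan, completed, reason)  # Planner sees prior outputs + reason
```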

Plan Completion

The plan is complete when all sub-tasks in the DAG have been executed (or explicitly skipped per a revision decision). The final output is assembled by the synthesis sub-task(s) defined in the plan.


5. Architecture Diagram

flowchart TD
    subgraph Planning["Planning Phase"]
        A[Goal Input]
        B[Planner LLM]
        C{Plan Valid?}
        D[Plan Validation]
        E{Human Approval Gate}
    end
    subgraph Execution["Execution Phase"]
        F[DAG Scheduler]
        G[Sub-task A]
        H[Sub-task B]
        I[Sub-task C]
        J[Sub-task D]
        K[(Task State)]
    end
    subgraph Revision["Plan Revision"]
        L{Revision Trigger?}
        M[Re-invoke Planner]
    end
    subgraph Output["Output"]
        N[Final Result]
        O[Execution Trace]
    end
    A --> B
    B --> D
    D --> C
    C -->|invalid| B
    C -->|valid| E
    E -->|approved| F
    E -->|skipped| F
    F --> G & H
    G --> I
    G --> K
    H --> K
    I --> K
    K --> J
    J --> L
    L -->|yes| M
    M --> F
    L -->|no| N
    F --> O

6. Components

Component | Type | Responsibility | Technology Options | Criticality
--- | --- | --- | --- | ---
Planner | AI Component | Decomposes goal into structured DAG plan; revises on trigger | GPT-4o, Claude 3.5 Sonnet (high reasoning quality) | Critical
Plan Validator | Logic Component | Validates DAG structure: acyclicity, reachability, schema compliance | Custom Python (networkx); JSON Schema | Critical
Human Approval Gate | Integration | Presents plan for human review; blocks execution on rejection | Slack workflow; web UI; email approval | High (for regulated tasks)
DAG Scheduler (Executor) | Orchestration | Topological traversal; dispatches eligible sub-tasks; tracks dependencies | Custom Python; LangGraph; Temporal; AWS Step Functions | Critical
Sub-task Executor | AI Component | Executes individual sub-tasks (may be ReAct loops, sequential chains, etc.) | Any EAAPL-WRK pattern | Critical
Task State Store | State | Accumulates sub-task outputs; tracks completion status | Redis; PostgreSQL; DynamoDB | Critical
Revision Trigger Detector | Logic | Monitors sub-task outputs for conditions requiring plan revision | Custom; LLM-based anomaly detection on output | High
Execution Audit Logger | Governance | Records full plan + execution trace | S3; PostgreSQL; Splunk | High

7. Data Flow

Step | Actor | Action | Output
--- | --- | --- | ---
1 | Caller | Submits goal: "Conduct regulatory gap analysis for our lending product across NSW, VIC, QLD, SA, WA" | Goal with scope parameters
2 | Planner | Produces DAG: 5 parallel jurisdiction analyses → 1 cross-jurisdiction comparison → 1 remediation recommendation | Plan JSON with 7 sub-tasks and 6 dependency edges (5 analyses → comparison; comparison → remediation)
3 | Plan Validator | Validates DAG: acyclic, all schemas present, all prompts in registry | PASS
4 | Human Approval Gate | Presents plan to compliance lead | Approved
5 | DAG Scheduler | Dispatches 5 jurisdiction sub-tasks concurrently (no dependencies) | 5 parallel invocations
6 | Sub-task Executors (×5) | Each analyses one jurisdiction's lending regulations | [{jurisdiction: "NSW", gaps: [...], risk: "high"}, ...]
7 | DAG Scheduler | Receives all 5 results; unlocks cross-jurisdiction comparison sub-task | Task state updated
8 | Sub-task Executor | Executes cross-jurisdiction comparison | {common_gaps: [...], jurisdiction_specific: [...]}
9 | Revision Trigger Detector | WA analysis found additional ASIC obligations not in original scope; triggers revision | Revision triggered
10 | Planner | Adds 1 new sub-task: "ASIC obligation mapping for WA" | Revised plan
11 | DAG Scheduler | Executes new sub-task, then executes remediation recommendation | Final output assembled
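The initial plan in this walkthrough can be written down and checked directly. The sketch below uses hypothetical sub-task ids, counts one dependency edge per direct prerequisite, and groups sub-tasks into stages that the scheduler could dispatch concurrently.

```python
from graphlib import TopologicalSorter

JURISDICTIONS = ["NSW", "VIC", "QLD", "SA", "WA"]

# Hypothetical sub-task ids: each maps to the set of sub-tasks it depends on.
plan = {f"analyse_{j}": set() for j in JURISDICTIONS}
plan["compare"] = {f"analyse_{j}" for j in JURISDICTIONS}
plan["remediate"] = {"compare"}


def stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-tasks into stages; sub-tasks within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


edge_count = sum(len(deps) for deps in plan.values())
print(len(plan), edge_count)  # 7 6
for stage in stages(plan):
    print(stage)
```

The first stage contains all five jurisdiction analyses (step 5 above), followed by the comparison and then the remediation recommendation.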

Error Flow

Error | Detection | Recovery
--- | --- | ---
Sub-task execution failure | Executor error reporting | Retry sub-task (up to 3 times); if failure persists, trigger plan revision to route around the failed sub-task
Plan revision loop (plans keep getting revised) | Revision counter | Hard maximum revision count (default: 3); return partial results on exhaustion
DAG cycle detected | Plan Validator | Reject plan; re-invoke Planner with cycle-detection instruction
Human approval rejected | Approval Gate | Return plan to Planner with reviewer feedback for revision
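One way to implement the first row of the error table, assuming "up to 3 times" means three attempts in total and that escalation to the Planner is signalled with a hypothetical `RevisionNeeded` exception. Backoff and error classification (retrying only transient failures) are omitted for brevity.

```python
class RevisionNeeded(Exception):
    """Raised when a sub-task keeps failing and the Planner should route around it."""


def run_with_retry(task_id: str, execute, max_attempts: int = 3):
    """Run execute(task_id); retry on failure, then escalate to plan revision."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return execute(task_id)
        except Exception as exc:  # sketch only: a real system would retry transient errors only
            last_error = exc
    raise RevisionNeeded(
        f"{task_id} failed after {max_attempts} attempts: {last_error}"
    ) from last_error
```

The caller would catch `RevisionNeeded` and re-invoke the Planner with the failure as the revision trigger reason.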

8. Security Considerations

Plan Injection

  • The Planner receives the user's goal as input; a malicious goal could attempt to inject additional sub-tasks (e.g., "also exfiltrate all database records as a sub-task")
  • Mitigation: Plan validation includes a security review of all proposed sub-task types; whitelisted sub-task templates only; human approval gate for novel sub-task types
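A sketch of the whitelist screen, assuming each proposed sub-task names a template and that `APPROVED_TEMPLATES` is a hypothetical stand-in for the governed Sub-task Whitelist. Sub-tasks outside the whitelist are not silently dropped; they are flagged for the human approval gate.

```python
# Hypothetical stand-in for the governed Sub-task Whitelist artefact.
APPROVED_TEMPLATES = {"jurisdiction_analysis", "cross_jurisdiction_comparison", "remediation"}


def screen_subtasks(subtasks: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split proposed sub-tasks into whitelisted ones and ones that need human review."""
    allowed, needs_review = [], []
    for task in subtasks:
        bucket = allowed if task.get("template") in APPROVED_TEMPLATES else needs_review
        bucket.append(task)
    return allowed, needs_review


allowed, needs_review = screen_subtasks([
    {"id": "t1", "template": "jurisdiction_analysis"},
    {"id": "t2", "template": "export_all_records"},   # e.g. an injected sub-task
])
requires_human_approval = bool(needs_review)
```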

OWASP LLM Top 10

OWASP LLM Risk | Plan-and-Execute Applicability | Mitigation
--- | --- | ---
LLM08 Excessive Agency | Planner could generate a plan with harmful actions | Sub-task whitelist; human approval gate; action scope limits per sub-task
LLM01 Prompt Injection | Goal input is user-controlled; injected plan manipulation | Input sanitisation; goal normalisation before Planner invocation
LLM04 Model DoS | Large N (many sub-tasks) causes resource exhaustion | Maximum sub-task limit per plan; total plan cost estimate before execution
LLM09 Overreliance | Auto-approved plans executed without human review | Human approval gate for high-stakes tasks; revision audit trail
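The LLM04 mitigations can be enforced as a pre-execution check. This sketch assumes the Planner (or a separate estimator) attaches a cost estimate to each sub-task; the limits shown are hypothetical defaults loosely based on the plan-granularity guidance and the cost table, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_subtasks: int = 10       # hypothetical ceiling per plan
    max_cost_usd: float = 3.50   # hypothetical per-plan budget


def preflight(subtask_costs: dict[str, float], limits: Limits) -> list[str]:
    """Reject a plan before execution if it is too large or too expensive."""
    problems = []
    if len(subtask_costs) > limits.max_subtasks:
        problems.append(
            f"{len(subtask_costs)} sub-tasks exceeds limit of {limits.max_subtasks}")
    total = sum(subtask_costs.values())
    if total > limits.max_cost_usd:
        problems.append(
            f"estimated cost ${total:.2f} exceeds budget ${limits.max_cost_usd:.2f}")
    return problems
```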

9. Governance Considerations

Plan as Governance Artefact

  • For regulated or high-stakes tasks, the plan itself (before execution) is a governance artefact: it documents the intended scope of work and must be retained alongside execution results
  • Plan revisions are change events that must be logged with the triggering reason
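A sketch of a revision audit record, assuming plans are JSON-serialisable dicts. Hashing a canonical serialisation lets the archived plan be matched to its execution trace; the record's field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def plan_digest(plan: dict) -> str:
    """Stable SHA-256 of the plan: key order and whitespace do not affect the hash."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def revision_event(task_id: str, old_plan: dict, new_plan: dict, reason: str) -> dict:
    """Build an append-only audit record for a plan revision."""
    return {
        "task_id": task_id,
        "event": "plan_revised",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "previous_plan_sha256": plan_digest(old_plan),
        "revised_plan_sha256": plan_digest(new_plan),
        "trigger_reason": reason,
    }
```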

Governance Artefacts

Artefact | Owner | Frequency | Purpose
--- | --- | --- | ---
Plan Archive | Compliance | Per task (regulated) | Original plan + all revisions + execution trace
Sub-task Whitelist | AI Governance Board | Quarterly | Documents approved sub-task types; prevents plan injection
Plan Approval Policy | AI Governance Board | Quarterly | Documents which task types require human plan approval
Revision Frequency Report | AI Operations | Monthly | Tracks revision rate; a high revision rate indicates poor planning quality

10. Operational Considerations

SLOs

SLO | Target | Window | Alert
--- | --- | --- | ---
Plan generation latency | p95 ≤ 15s | 1-hour rolling | > 30s triggers P2
Plan revision rate | ≤ 15% of tasks | 24-hour rolling | > 25% triggers P3; review planning prompt quality
Sub-task completion rate (within plan) | ≥ 97% | 24-hour rolling | < 93% triggers P2
Overall task completion rate | ≥ 95% | 24-hour rolling | < 90% triggers P2
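The revision-rate SLO can be evaluated mechanically. This sketch assumes task counts come from the 24-hour rolling window; the `breach` label for rates above target but below the alert threshold is a hypothetical addition, since the table defines only the target and the P3 alert.

```python
def revision_rate_status(tasks_total: int, tasks_revised: int) -> str:
    """Classify the 24-hour plan revision rate against the SLO thresholds."""
    if tasks_total == 0:
        return "no data"
    rate = tasks_revised / tasks_total
    if rate > 0.25:
        return "P3"       # alert: review planning prompt quality
    if rate > 0.15:
        return "breach"   # above the 15% target, below the alert threshold
    return "ok"
```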

Monitoring

  • Average sub-tasks per plan trending: increasing may indicate scope creep in planning
  • Plan revision trigger reason distribution: identifies the most common execution surprises
  • Sub-task execution latency vs. planned estimate: Planner accuracy metric

11. Cost Considerations

Task Complexity | Sub-tasks | Planning Cost | Execution Cost | Total
--- | --- | --- | --- | ---
Simple (over-engineered for this pattern) | 2–3 | $0.05–0.10 | $0.05–0.15 | $0.10–0.25
Moderate | 4–7 | $0.10–0.20 | $0.20–0.60 | $0.30–0.80
Complex | 8–15 | $0.20–0.50 | $0.60–3.00 | $0.80–3.50
With parallel execution | Any | Same | Latency reduced; cost same | Parallelism reduces wall-clock time, not token cost
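The last row of the cost table can be checked with a small worked example based on the regulatory gap-analysis plan from the Data Flow walkthrough. The per-sub-task costs and latencies below are invented for illustration (they are not measured figures); the point is that token cost is a sum over all sub-tasks, while wall-clock time with full parallelism is the longest dependency path.

```python
from functools import lru_cache

SITES = ("NSW", "VIC", "QLD", "SA", "WA")
# Invented estimates for illustration: sub-task -> (cost in USD, latency in seconds)
ESTIMATES = {
    "analyse_NSW": (0.08, 40), "analyse_VIC": (0.08, 35), "analyse_QLD": (0.08, 45),
    "analyse_SA": (0.08, 30), "analyse_WA": (0.08, 50),
    "compare": (0.10, 25), "remediate": (0.06, 20),
}
DEPS = {f"analyse_{s}": () for s in SITES}
DEPS["compare"] = tuple(f"analyse_{s}" for s in SITES)
DEPS["remediate"] = ("compare",)


@lru_cache(maxsize=None)
def earliest_finish(task: str) -> int:
    """Finish time if every sub-task starts as soon as its dependencies finish."""
    start = max((earliest_finish(dep) for dep in DEPS[task]), default=0)
    return start + ESTIMATES[task][1]


token_cost = sum(cost for cost, _ in ESTIMATES.values())    # identical sequential or parallel
sequential_s = sum(latency for _, latency in ESTIMATES.values())
parallel_s = max(earliest_finish(task) for task in DEPS)    # longest dependency path
print(f"cost ${token_cost:.2f}; sequential {sequential_s}s; parallel {parallel_s}s")
# cost $0.56; sequential 245s; parallel 95s
```

With these numbers, parallelism cuts wall-clock time from 245 s to 95 s (slowest analysis 50 s + comparison 25 s + remediation 20 s) while the cost stays at $0.56.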

Optimisations

  • Use a capable model for planning (quality matters); use efficient models for sub-task execution
  • Parallelise all independent sub-tasks (Fan-Out, EAAPL-WRK003) to reduce wall-clock time
  • Cache plan templates for recurring task types to avoid re-planning identical goal structures
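A sketch of plan-template caching, assuming a recurring task can be keyed by its task type and an order-insensitive scope. `get_plan` and `plan_fn` are hypothetical names; a production cache would also need invalidation when the prompt registry or sub-task whitelist changes.

```python
import copy

# Hypothetical in-process cache: (task type, sorted scope) -> plan produced by the Planner.
_TEMPLATE_CACHE: dict[tuple, dict] = {}


def get_plan(task_type: str, scope: tuple[str, ...], plan_fn) -> dict:
    """Return a cached plan for a recurring goal structure; call the Planner only on a miss."""
    key = (task_type, tuple(sorted(scope)))
    if key not in _TEMPLATE_CACHE:
        _TEMPLATE_CACHE[key] = plan_fn(task_type, key[1])
    # Hand out a copy so a later plan revision cannot corrupt the cached template.
    return copy.deepcopy(_TEMPLATE_CACHE[key])
```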

12. Trade-Off Analysis

Option | Structure | Parallelism | Adaptability | Governance | Best For
--- | --- | --- | --- | --- | ---
A: Plan-and-Execute with revision (Recommended) | High | High | Medium | High | Complex, multi-day tasks
B: ReAct loop (EAAPL-WRK001) | Low | None | Very High | Medium | Dynamic, iterative tasks
C: Sequential Chain (EAAPL-WRK002) | High | Low | Low | Very High | Fixed, known pipelines
D: Plan-and-Execute without revision | High | High | Low | Very High | Stable, well-understood task types

Architectural Tensions

Tension | Left Pole | Right Pole | Balance
--- | --- | --- | ---
Plan upfront vs. adapt during execution | Full plan before execution (reviewable, auditable) | No plan; fully adaptive (flexible, no overhead) | Plan with bounded revision (max 3 revisions)
Parallelism vs. coordination overhead | Maximum parallelism (lowest latency) | Sequential execution (simplest coordination) | Parallelise where DAG allows; sequential for tightly coupled sub-tasks
Plan granularity | Many small sub-tasks (precise control) | Few large sub-tasks (lower overhead) | 5–10 sub-tasks as practical ceiling for most tasks

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
--- | --- | --- | --- | ---
Planner produces over-ambitious plan (too many sub-tasks) | Medium | Medium (cost overrun; long execution) | Sub-task count limit; pre-execution cost estimate | Enforce max sub-task limit; reject plan exceeding limit
Plan revision loop (execution keeps triggering revisions) | Low–Medium | High (indefinite execution) | Revision counter | Hard max revision count; return best partial result
Sub-task scope creep (sub-task executor expands beyond its defined scope) | Medium | Medium (inconsistent outputs) | Sub-task output schema validation | Enforce output schemas strictly; reject out-of-scope output fields
Circular dependency in generated plan | Low | Critical (DAG scheduler deadlocks) | Plan Validator cycle detection | Reject plan; re-plan with explicit instruction to avoid cycles
Human approval bottleneck for time-sensitive tasks | Medium | Medium (delays execution) | Approval timeout | Configurable approval timeout with auto-proceed or auto-reject policy
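The scope-creep recovery (strict output schemas that reject out-of-scope fields) can be sketched with a simplified schema that maps field names to Python types. That schema format is an assumption made to keep the example self-contained; with JSON Schema the equivalent is setting `additionalProperties` to `false`.

```python
def check_output(output: dict, schema: dict[str, type]) -> list[str]:
    """Strict check: every declared field present with the right type, and no extra fields."""
    errors = []
    for name, expected in schema.items():
        if name not in output:
            errors.append(f"missing field {name!r}")
        elif not isinstance(output[name], expected):
            errors.append(f"field {name!r} should be {expected.__name__}")
    for name in sorted(output.keys() - schema.keys()):
        errors.append(f"out-of-scope field {name!r}")   # scope creep: reject, do not pass on
    return errors
```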

14. Regulatory Considerations

NIST AI RMF

  • GOVERN 1.1: The explicit plan and plan approval gate implement the governance requirement for human oversight of autonomous agentic workflows.
  • MANAGE 1.3: The plan audit record and revision log implement risk management documentation requirements.

ISO 42001

  • §8.4: The plan DAG is an operational specification; sub-task whitelist is a control; both must be version-controlled.

Australian Context

  • APRA CPS 230: For material business process automation, the execution plan must be a retainable operational record.
  • Privacy Act 1988: PII handling in sub-tasks must be scoped to minimum necessary access; the plan defines the data access scope per sub-task.
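A sketch of enforcing the per-sub-task data access scope at read time, assuming each sub-task in the plan carries a hypothetical `data_scope` list of permitted fields. Requests outside that scope fail rather than being quietly filtered, so violations surface in the audit trail.

```python
class ScopeViolation(PermissionError):
    """A sub-task asked for data its plan entry does not grant."""


def fetch_fields(subtask: dict, record: dict, requested: set[str]) -> dict:
    """Return only fields the plan grants this sub-task; refuse anything else."""
    granted = set(subtask.get("data_scope", ()))
    denied = requested - granted
    if denied:
        raise ScopeViolation(f"{subtask['id']} not permitted to read {sorted(denied)}")
    return {k: record[k] for k in requested if k in record}
```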

15. Reference Implementations

AWS

Component | Service
--- | ---
Planner | Amazon Bedrock (Claude 3.5 Sonnet) with structured output for DAG JSON
DAG Scheduler | AWS Step Functions with dynamic task fan-out
Sub-task Executors | AWS Lambda per sub-task type
Task State | Amazon DynamoDB
Human Approval Gate | AWS Step Functions human-in-the-loop callback token pattern

Azure

Component | Service
--- | ---
Planner | Azure OpenAI (GPT-4o) with structured output
DAG Scheduler + Executor | Azure Durable Functions (fan-out orchestration)
Human Approval Gate | Azure Logic Apps approval workflow
Task State | Azure Cosmos DB

On-Premises

Component | Technology
--- | ---
Planner + Executor | LangGraph (Plan-and-Execute agent type); Temporal workflow
DAG Validation | networkx (Python) for DAG structure validation
Task State | PostgreSQL with JSONB plan and state columns

16. Related Patterns

Pattern | ID | Relationship Type | Notes
--- | --- | --- | ---
ReAct Agent Loop | EAAPL-WRK001 | Alternative | ReAct interleaves planning and execution; Plan-and-Execute separates them
Parallel Fan-Out/Fan-In | EAAPL-WRK003 | Integrates With | Fan-out executes independent sub-tasks in the plan concurrently
Dynamic Sub-agent Spawning | EAAPL-WRK009 | Related | Dynamic spawning creates agents at runtime; Plan-and-Execute defines sub-tasks upfront
Human Escalation | EAAPL-HITL001 | Integrates With | Human approval gate for plan review before execution
Workflow State Machine | EAAPL-WRK012 | Complementary | State machine can govern plan phases (planning, executing, revising, complete)
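A sketch of the plan phases from the Workflow State Machine row, using the four phases named there (planning, executing, revising, complete). The transition table is an assumption: an exhausted revision budget moves execution straight to complete with partial results, and approval and failure states are left out for brevity.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    REVISING = "revising"
    COMPLETE = "complete"


# Assumed transition table; anything not listed is rejected.
ALLOWED = {
    Phase.PLANNING: {Phase.EXECUTING},
    Phase.EXECUTING: {Phase.REVISING, Phase.COMPLETE},
    Phase.REVISING: {Phase.EXECUTING},
    Phase.COMPLETE: set(),
}


class PlanLifecycle:
    def __init__(self) -> None:
        self.phase = Phase.PLANNING
        self.history = [Phase.PLANNING]   # retained for the execution trace

    def advance(self, to: Phase) -> None:
        if to not in ALLOWED[self.phase]:
            raise ValueError(f"illegal transition {self.phase.value} -> {to.value}")
        self.phase = to
        self.history.append(to)
```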

17. Maturity Assessment

Overall Maturity: Proven

Dimension | Score (1–5) | Evidence
--- | --- | ---
Research Foundation | 4 | Plan-and-Solve prompting (Wang et al., 2023); LLMCompiler; widely cited in agent research
Production Deployment | 3 | Deployed in research tools, code generation, complex analysis; growing enterprise adoption
Framework Support | 4 | LangGraph Plan-and-Execute agent; AutoGen planner; custom implementations common
DAG Tooling | 3 | JSON DAG output from LLMs reliable; execution infrastructure maturing
Observability | 3 | Plan visualisation and execution tracing available in LangSmith; generally maturing

18. Revision History

Version | Date | Author | Changes
--- | --- | --- | ---
1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category