EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-AGT007] Long-Running Agent

Category: Agentic AI
Sub-category: Async Execution Architecture
Version: 1.2
Maturity: Proven
Tags: long-running, async, task-queue, heartbeat, cost-budget, partial-results, deadline-management, human-checkin
Regulatory Relevance: APRA CPS 230 (Operational Resilience), ISO 22301, NIST AI RMF (MANAGE 4.1), EU AI Act (Art. 9, 14)


1. Executive Summary

The Long-Running Agent Pattern defines the architecture for AI agents that execute tasks over hours or days — due diligence analysis, large codebase refactoring, enterprise-wide data reconciliation, or extended research synthesis. These tasks cannot fit within the synchronous request-response paradigm: calling systems cannot hold a connection open for hours, LLM context windows cannot hold 48 hours of tool results, and cost controls require active monitoring rather than post-hoc billing surprises.

For CIO/CTO audiences: this pattern transforms AI agents from interactive request-responders into asynchronous workforce members — entities you assign a task to on Monday morning and receive a deliverable from by Friday, with status updates throughout and the ability to pause or redirect them at any point. It defines how to decompose multi-day tasks into manageable segments, how to monitor and control running costs, how to ensure partial results are safely preserved if the task is interrupted, and how to maintain human oversight over extended autonomous operation. The resulting architecture is what separates a toy AI demo from a production AI workforce capability.


2. Problem Statement

Business Problem

High-value knowledge work tasks take hours or days. A due diligence review of 500 contracts, a codebase-wide security audit, or a multi-source research synthesis cannot complete in seconds. If AI agents are restricted to short tasks, the most valuable automation opportunities remain out of reach.

Technical Problem

Synchronous agent execution (the HTTP request/response model) is unsuitable for long tasks. Connections time out, LLM context windows fill up, token costs become unpredictable, and there is no point at which to inject a human checkpoint. Context window exhaustion is particularly severe on multi-hour tasks: a 100K-token context window fills after 60–100 tool calls with moderate result sizes.
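As a rough check on that figure, the short calculation below derives the average result size per tool call that the 60–100 call range implies. It is a back-of-envelope sketch only: it ignores the system prompt and instructions, and the function name is illustrative.

```python
# Implied average tokens per tool call if a 100K-token window
# fills after 60-100 calls (system prompt and instructions ignored).
CONTEXT_WINDOW_TOKENS = 100_000


def tokens_per_call(calls_until_full: int) -> int:
    """Average tokens each call may add before the window is full."""
    return CONTEXT_WINDOW_TOKENS // calls_until_full


low, high = tokens_per_call(100), tokens_per_call(60)
print(f"{low}-{high} tokens per tool call")
```

In other words, "moderate result sizes" here corresponds to roughly 1,000 to 1,700 tokens per tool call.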

Symptoms of Absence

  • Tasks taking longer than 30 minutes are decomposed manually by humans into shorter subtasks, negating automation benefits
  • Cost surprises: a long agent task consumes 10–50× the anticipated token budget with no warning
  • Partial work is lost when infrastructure restarts or LLM provider timeouts occur at hour 3 of a 5-hour task
  • No mechanism for human course-correction once a long task is launched

Cost of Inaction

  • High-value automation opportunities (due diligence, audit, research) remain manual
  • Ad hoc workarounds (manually splitting tasks) create brittle processes that fail when task sizes vary
  • Infrastructure teams field escalations about unexplained high AI inference costs from long tasks without budget controls

3. Context

When to Apply

  • Expected task duration is > 30 minutes
  • Task involves processing a large corpus (hundreds of documents, thousands of records)
  • Human review or approval at intermediate milestones is required
  • Cost predictability and budget control are required
  • Partial results have value (delivering results incrementally is better than delivering nothing if the task is interrupted)

When NOT to Apply

  • Tasks that complete in < 5 minutes (async overhead not justified)
  • Tasks that require a synchronous response in the same user session
  • Tasks with no natural decomposition into independently useful segments

Prerequisites

  • EAAPL-AGT005 (Checkpoint and Recovery) — mandatory for multi-hour tasks
  • Durable task queue with dead-letter handling
  • Async notification infrastructure (webhooks, event bus, push notifications)
  • Cost monitoring and kill switch capability
  • Human management API (pause, redirect, cancel)

Industry Applicability

Industry | Long-Running Task | Duration | Human Check-in Frequency
Legal / M&A | Due diligence (500+ documents) | 4–24 hours | At task creation, 50% progress, completion
Financial Services | Regulatory report generation, reconciliation | 2–12 hours | At key milestones; anomaly-triggered
Technology | Large codebase security audit, refactoring | 4–48 hours | At phase boundaries
Healthcare | Multi-source patient cohort analysis | 2–8 hours | At each data source completion
Research | Literature synthesis, competitive analysis | 8–72 hours | Daily check-in

4. Architecture Overview

The Long-Running Agent Pattern addresses four fundamental challenges of extended autonomous execution: context window management, task decomposition and progress tracking, cost budget enforcement, and human oversight at meaningful checkpoints.

Task Decomposition and Segment Orchestration

A long task is decomposed by the Task Planner into an ordered sequence of segments: bounded sub-tasks, each of which can complete within the single-agent pattern's standard execution model (typically < 30 minutes, < 50K tokens). The segment plan is stored durably at task creation and is the master execution schedule. Each segment produces a partial result that is stored in the Partial Result Store. If the task is interrupted, the segment plan acts as the recovery map: completed segments are skipped and execution resumes at the next incomplete segment.

The segment plan is not a rigid pre-specified plan. The Task Planner can be queried to revise the remaining segment plan based on discoveries made in early segments (adaptive planning). For example, if segment 3 discovers that 200 additional documents need to be reviewed, the plan is revised to add segments 3a–3n before segment 4.
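The recovery-map and adaptive-planning behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not the pattern's prescribed implementation: the class names, the `max_segments` limit, and its value are assumptions, and a real plan would live in the durable Task Plan Store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    segment_id: str
    sub_objective: str
    done: bool = False


@dataclass
class SegmentPlan:
    segments: list[Segment] = field(default_factory=list)
    max_segments: int = 200  # hypothetical hard limit against unbounded growth

    def next_incomplete(self) -> Optional[Segment]:
        """Recovery map: skip completed segments, return the first unfinished one."""
        return next((s for s in self.segments if not s.done), None)

    def insert_after(self, segment_id: str, new_segments: list[Segment]) -> None:
        """Adaptive planning: splice extra segments in after an existing one."""
        if len(self.segments) + len(new_segments) > self.max_segments:
            raise ValueError("segment count limit exceeded; needs human re-approval")
        idx = next(i for i, s in enumerate(self.segments) if s.segment_id == segment_id)
        self.segments[idx + 1:idx + 1] = new_segments


plan = SegmentPlan([Segment(str(i), f"review batch {i}") for i in range(1, 6)])
for s in plan.segments[:3]:
    s.done = True  # segments 1-3 completed before an interruption

# Segment 3 discovered extra documents, so the plan grows before segment 4.
plan.insert_after("3", [Segment("3a", "review extra documents, part 1"),
                        Segment("3b", "review extra documents, part 2")])
resume_at = plan.next_incomplete()
```

After the revision, recovery resumes at segment 3a rather than segment 4, because the inserted segments are the first incomplete entries in the plan.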

Context Window Management Across Segments

Each segment executes in a fresh context window. The context for segment N includes: the original task objective, a summary of results from segments 1 through N-1 (produced by the Context Summariser component), the current segment's specific sub-objective, and the relevant tools. The summary is a lossy compression of prior results, so the Task Planner specifies in the task plan what information must be preserved across segment boundaries.

This approach solves context exhaustion by design: no single segment accumulates more context than the window can hold. The cost is that inter-segment reasoning is mediated through the summary, which may lose nuance. For tasks that require tight consistency across many segments (e.g., a legal review where clause 400 must reference clause 12), the Task Planner must preserve the critical cross-references in the carry-forward summary.
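One way to picture the context assembled for segment N is sketched below. The `SegmentContext` type, the `build_context` function, and the `PRESERVE:` prefix are hypothetical names chosen for illustration; the point is only that planner-designated cross-references travel verbatim alongside the lossy summary.

```python
from dataclasses import dataclass


@dataclass
class SegmentContext:
    task_objective: str
    carry_forward_summary: str
    sub_objective: str
    tools: list[str]


def build_context(task_objective, prior_summaries, must_preserve, sub_objective, tools):
    """Assemble a fresh context for segment N from summaries of segments 1..N-1."""
    summary_lines = list(prior_summaries)
    # Cross-references the planner marked as critical are carried verbatim,
    # because the lossy summary might otherwise drop them.
    summary_lines += [f"PRESERVE: {ref}" for ref in must_preserve]
    return SegmentContext(task_objective, "\n".join(summary_lines),
                          sub_objective, list(tools))


ctx = build_context(
    task_objective="Review the contract set for liability risks",
    prior_summaries=["Segments 1-3: 12 clauses flagged, none critical"],
    must_preserve=["Clause 12 defines 'Affiliate'; later clauses rely on it"],
    sub_objective="Review clauses 380-420",
    tools=["document_search", "clause_extractor"],
)
```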

Heartbeat and Progress Monitoring

The long-running agent emits a heartbeat event to the monitoring system at the completion of each segment and at configurable intervals within a segment. The heartbeat includes: current segment number, total segments estimated, cost consumed so far, cost projected to completion (based on average cost per segment × remaining segments), elapsed time, and an ETA for completion. The Heartbeat Monitor triggers alerts if heartbeat events are not received within the expected interval, which indicates a stuck or crashed agent.
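The heartbeat fields and the missed-heartbeat check can be sketched as below. The projection uses the average-per-segment rule stated above; applying the same averaging to the ETA is an assumption, as are the function names. The 2× tolerance mirrors the heartbeat SLO in the Operational Considerations section.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    segment_number: int
    total_segments: int
    cost_so_far: float
    projected_total_cost: float
    elapsed_s: float
    eta_s: float


def make_heartbeat(completed, total, cost_so_far, elapsed_s):
    """Project cost and remaining time from the average per completed segment."""
    if completed < 1:
        raise ValueError("need at least one completed segment to project")
    remaining = total - completed
    avg_cost = cost_so_far / completed
    avg_time = elapsed_s / completed
    return Heartbeat(completed, total, cost_so_far,
                     cost_so_far + avg_cost * remaining,
                     elapsed_s, avg_time * remaining)


def is_stalled(last_heartbeat_at, now, expected_interval_s, tolerance=2.0):
    """True when no heartbeat arrived within tolerance x the expected interval."""
    return (now - last_heartbeat_at) > tolerance * expected_interval_s


hb = make_heartbeat(completed=4, total=10, cost_so_far=8.0, elapsed_s=3600.0)
```

With 4 of 10 segments done at a cost of 8.0 in one hour, the projected total cost is 20.0 and the remaining time is 5,400 seconds.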

Human Check-in Points

The task plan defines human check-in points, typically at task creation (human reviews and approves the decomposition plan), at significant milestones (e.g., 50% completion), and at completion. At check-in points, the long-running agent pauses execution (using the checkpoint mechanism from EAAPL-AGT005), delivers the partial results and a progress summary to the human via a notification, and waits for human acknowledgment or instruction. The human can: approve and resume, redirect (modify the remaining segment plan), or cancel. This implements EU AI Act Art. 14 human oversight for high-risk long-running tasks.
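The three human responses at a check-in point map naturally onto a small decision handler. The sketch below is illustrative only: the `Decision` enum, the state strings, and the list-of-ids plan representation are assumptions.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REDIRECT = "redirect"
    CANCEL = "cancel"


def apply_decision(remaining_plan, decision, revised_plan=None):
    """Return (next_state, plan) after a human responds at a check-in point."""
    if decision is Decision.APPROVE:
        return "RUNNING", remaining_plan
    if decision is Decision.REDIRECT:
        if not revised_plan:
            raise ValueError("redirect requires a revised segment plan")
        return "RUNNING", revised_plan
    # CANCEL: nothing further is scheduled; stored partial results are untouched.
    return "CANCELLED", []


state, plan = apply_decision(["seg-4", "seg-5"], Decision.REDIRECT, ["seg-4b"])
```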

Cost Budget and Kill Switch

Before execution begins, the calling system specifies a cost budget (maximum token spend for the task). The Cost Controller monitors cumulative spend at each segment boundary. If projected cost-to-completion exceeds the budget, the Cost Controller pauses the task and notifies the human with the current partial results and a cost projection. The human can approve budget extension or accept the partial results. A hard kill switch (emergency stop) is available to humans at any time, delivering immediately available partial results and a clean task termination.
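A segment-boundary budget check along these lines might look like the sketch below. The action names are hypothetical, and the early-warning threshold at 80% of budget is borrowed from the Trade-Off Analysis section rather than stated here; treat the ordering of the checks as one reasonable choice, not a requirement of the pattern.

```python
def budget_action(spent, completed, total, budget, warn_ratio=0.8):
    """Decide what the Cost Controller does at a segment boundary."""
    # Projection: average cost per completed segment x remaining segments.
    projected = spent + (spent / completed) * (total - completed)
    if spent >= budget:
        return "STOP_DELIVER_PARTIAL"   # ceiling reached
    if projected > budget:
        return "PAUSE_NOTIFY_HUMAN"     # human extends budget or accepts partials
    if spent >= warn_ratio * budget:
        return "ESCALATE"               # assumed early warning at 80% of budget
    return "CONTINUE"


action = budget_action(spent=4.0, completed=4, total=10, budget=8.0)
```

Here four segments have cost 4.0, so ten segments project to 10.0 against a budget of 8.0, and the task pauses for a human decision well before the ceiling is hit.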

Partial Result Delivery

Each completed segment's output is written to the Partial Result Store immediately upon completion. A Partial Result Aggregator compiles the running partial results into a human-consumable intermediate deliverable. The calling system can request partial results at any time via the management API, regardless of whether the task is still running. This enables progressive value delivery: a due diligence review that identifies 20 critical issues in the first 30% of documents is actionable immediately, before the full 500-document review completes.
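The write-then-aggregate flow can be illustrated with an in-memory stand-in for the Partial Result Store. The class and method names are hypothetical, and a production store would be one of the durable options listed in the Components section.

```python
class PartialResultStore:
    """In-memory stand-in for a durable partial result store."""

    def __init__(self):
        self._results = {}

    def write(self, segment_number, findings):
        # Written as soon as a segment completes, so nothing waits on later segments.
        self._results[segment_number] = list(findings)

    def aggregate(self, total_segments):
        """Compile whatever exists so far into an intermediate deliverable."""
        findings = [f for n in sorted(self._results) for f in self._results[n]]
        return {
            "completed_segments": len(self._results),
            "total_segments": total_segments,
            "findings": findings,
        }


store = PartialResultStore()
store.write(2, ["indemnity cap missing"])
store.write(1, ["change-of-control clause", "uncapped liability"])
interim = store.aggregate(total_segments=5)
```

The interim deliverable can be requested while segments 3 to 5 are still pending, and findings are ordered by segment number regardless of write order.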


5. Architecture Diagram

flowchart TD
    subgraph Input["Task Initiation"]
        A[Long Task Request]
        B[Task Planner]
    end
    subgraph Execution["Async Execution Engine"]
        C[Task Queue]
        D[Segment Worker]
        E[Cost Controller]
    end
    subgraph Storage["State and Results"]
        F[(Checkpoint Store)]
        G[(Partial Result Store)]
    end
    A --> B
    B -->|segment plan + human approval| C
    C --> D
    D -->|checkpoint each segment| F
    D -->|segment output| G
    D --> E
    E -->|over budget| B
    F -->|recover on failure| D
    G -->|final aggregation| A
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Task Planner | Orchestration / AI | Decomposes long task into segment plan; revises plan adaptively | LLM-based planner; rule-based decomposition for structured tasks | Critical
Task Plan Store | Persistence | Stores segment plan; tracks segment completion status | DynamoDB, PostgreSQL, Azure Cosmos DB | Critical
Task Queue | Message Queue | Durable queue for segment execution; dead-letter for failed segments | SQS, Azure Service Bus, Google Pub/Sub, Kafka | Critical
Segment Worker | Compute | Executes each segment as a standard agent loop (EAAPL-AGT001) | Containerised agent runtime; ECS, AKS, Cloud Run | Critical
Context Summariser | AI Component | Compresses prior segment results into carry-forward context | LLM summarisation; structured summary template per task type | High
Partial Result Store | Persistence | Stores completed segment outputs; supports partial result queries | PostgreSQL, S3 + DynamoDB index, Cosmos DB | High
Partial Result Aggregator | Orchestration | Compiles segment outputs into intermediate deliverable | Custom; LLM-assisted for natural language outputs | Medium
Heartbeat Emitter | Monitoring | Emits heartbeat events at configurable intervals | Custom; part of segment worker | High
Heartbeat Monitor | Monitoring | Detects missed heartbeats; triggers recovery | CloudWatch Alarms, Azure Monitor, custom | High
Cost Controller | Governance | Tracks cumulative cost; projects to completion; enforces budget ceiling | Custom + LLM provider usage APIs | Critical
Management API | Operations | Exposes pause, redirect, cancel, status, partial-result endpoints | REST API; API Gateway + Lambda/Functions | High
Human Check-in Queue | Human Oversight | Delivers milestone notifications to human approvers; collects decisions | Email, Slack, Teams, custom approval portal | High
Checkpoint Store | Recovery | Stores segment-level checkpoints (EAAPL-AGT005) | Redis, DynamoDB, Cosmos DB | Critical
Deadline Manager | SLA | Monitors task ETA vs. deadline; alerts if deadline at risk | Custom scheduler + ETA calculation | Medium

7. Data Flow

Task Initiation

Step | Actor | Action | Output
1 | Calling System | Submits long task: instruction, corpus reference, cost_budget, deadline, checkin_points | Task request
2 | Task Planner | Analyses task; decomposes into N segments; assigns cost estimate per segment; identifies checkin milestones | Segment plan: [{segment_id, sub_objective, input_scope, estimated_cost, checkin: bool}]
3 | Human Check-in | Delivers plan to human for review; awaits approval | Approved / Modified plan
4 | Task Queue | Enqueues segment 1 for execution | Segment 1 in queue

Segment Execution

Step | Actor | Action | Output
1 | Segment Worker | Dequeues segment N; loads carry-forward context from Context Summariser | Assembled context
2 | Agent Loop | Executes standard agent loop for segment N scope | Segment N result
3 | Partial Result Store | Writes segment N result | Partial result record
4 | Cost Controller | Updates cumulative cost; projects remaining cost | Cost status
5 | Heartbeat Emitter | Emits segment completion heartbeat | Heartbeat event
6 | Checkpoint | Writes segment N checkpoint | Recovery state
7 | Context Summariser | Produces carry-forward summary including segment N findings | Updated cross-segment summary
8 | Checkin Gate | If checkin milestone: pause; notify human; await instruction | Human instruction
9 | Task Queue | Enqueues segment N+1 (or revised plan if redirected) | Next segment queued
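The nine segment-execution steps above can be read as a single function with its collaborators injected. The following is a simplified sketch with stub callables and an event list standing in for the real queue, stores, and monitors; every name in it is an assumption made for illustration.

```python
def run_segment(segment, summary, agent_loop, summarise, store, events):
    """One pass through steps 1-9 for a single segment, using injected stubs."""
    context = {"objective": segment["sub_objective"], "summary": summary}  # step 1
    result = agent_loop(context)                                           # step 2
    store[segment["segment_id"]] = result                                  # step 3
    events.append(("cost", result["cost"]))                                # step 4
    events.append(("heartbeat", segment["segment_id"]))                    # step 5
    events.append(("checkpoint", segment["segment_id"]))                   # step 6
    new_summary = summarise(summary, result)                               # step 7
    paused = bool(segment.get("checkin"))                                  # step 8
    if not paused:
        events.append(("enqueue_next", segment["segment_id"]))             # step 9
    return new_summary, paused


store, events = {}, []
seg = {"segment_id": "seg-2", "sub_objective": "review docs 51-100", "checkin": True}
summary, paused = run_segment(
    seg, "seg-1: 3 issues found",
    agent_loop=lambda ctx: {"findings": ["issue A"], "cost": 1.25},
    summarise=lambda prev, res: prev + f"; seg-2: {len(res['findings'])} issue(s) found",
    store=store, events=events,
)
```

Because this segment is flagged as a check-in milestone, the function returns `paused=True` and does not enqueue the next segment; that happens only after the human responds.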

Error Flow

Error | Detection | Recovery
Segment worker crashes mid-execution | Missed heartbeat | Heartbeat monitor triggers recovery; resume from last checkpoint within segment
Task queue message lost | Dead-letter queue | DLQ alarm; reprocess segment from last checkpoint
LLM provider outage | Segment worker invocation failure | Exponential backoff retry; failover to secondary LLM provider if configured; alert
Cost overrun projection | Cost Controller | Pause task; notify human; await budget decision
Deadline at risk | Deadline Manager | Alert human; option to increase parallelism or reduce scope
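For the LLM provider outage row, the retry-then-failover recovery could be sketched as follows. Providers are modelled as plain callables that raise `ConnectionError` on failure; this is an assumption for illustration, not any vendor's client API, and the attempt count and base delay are arbitrary.

```python
import time


def call_with_retry(providers, prompt, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Try each provider in order, backing off exponentially between attempts."""
    last_error = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except ConnectionError as exc:
                last_error = exc
                sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ...
    # Every provider exhausted its attempts: surface the failure for alerting.
    raise RuntimeError("all providers failed") from last_error


def primary(prompt):
    raise ConnectionError("primary provider unavailable")


def secondary(prompt):
    return f"ok:{prompt}"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retry([primary, secondary], "ping", sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule observable; the primary fails three times with growing delays before the secondary answers.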

8. Security Considerations

Long-Running Identity Tokens

  • Agent authentication tokens for accessing external tools must not expire during a multi-hour task
  • Implement token refresh within the segment worker; use long-lived service account credentials, not short-lived user tokens
  • Dynamic secrets (auto-rotating) must have rotation intervals longer than the maximum task duration

Data Retention of In-Progress Tasks

  • Partial results contain sensitive intermediate data; they must be encrypted and access-controlled
  • Partial results for cancelled tasks must be cleaned up according to the data retention policy
  • Cross-segment context summaries may contain PII extracted from processed documents; apply the same classification and retention rules as the source data

OWASP LLM Top 10

OWASP LLM Risk | Long-Running Applicability | Mitigation
LLM08 Excessive Agency | A long-running agent operating autonomously for hours may drift from its initial scope without human awareness | Mandatory human check-in at milestones; segment plan visible to humans from task creation; Management API enables real-time course correction at any point
LLM04 DoS | Runaway long tasks consume excessive compute and API quotas | Hard cost ceiling; segment count limit; deadline enforcement
LLM01 Prompt Injection | Documents processed by the agent may contain injected instructions | Content sanitisation on all ingested documents before task planning and segment execution
LLM09 Overreliance | Business stakeholders may trust long-running agent outputs without appropriate scrutiny | Output metadata includes confidence and completeness indicators; human check-in at completion is mandatory for high-stakes tasks

9. Governance Considerations

Human Oversight for Long-Running Tasks

  • All long-running tasks must have a named human owner who is notified of check-in points and receives partial results
  • Tasks exceeding a configured duration (default: 4 hours) automatically escalate to the human owner's manager
  • No task may run longer than 72 hours without a human re-approval of the segment plan
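The duration-based rules above reduce to two threshold checks. The sketch below uses the stated defaults (4 hours and 72 hours); the function and action names are hypothetical.

```python
def oversight_actions(elapsed_hours, hours_since_plan_approval,
                      escalate_after=4.0, reapprove_after=72.0):
    """Return the governance actions due for a running task."""
    actions = []
    if elapsed_hours > escalate_after:
        actions.append("escalate_to_owner_manager")
    if hours_since_plan_approval >= reapprove_after:
        actions.append("pause_for_plan_reapproval")
    return actions


due = oversight_actions(elapsed_hours=80.0, hours_since_plan_approval=72.0)
```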

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Task Execution Log | Platform Engineering | Per task | Complete segment-by-segment execution record with costs, durations, and human decisions
Cost Budget Report | FinOps | Monthly | Aggregate long-task spend vs. budget; overrun analysis
Missed Deadline Report | Operations | Monthly | Tasks that exceeded deadline; root cause analysis
Human Check-in Audit | AI Governance | Quarterly | Review of human check-in compliance; decision quality audit

10. Operational Considerations

SLOs

SLO | Target | Window | Alert
Heartbeat interval compliance | 100% heartbeats within 2× expected interval | Per task | Any missed heartbeat triggers P2
Task completion rate | ≥ 95% of started tasks complete | Monthly | < 90% triggers investigation
Segment retry rate | ≤ 5% of segments require retry | 24-hour rolling | > 10% indicates infrastructure instability
Human check-in response time | ≤ 4 hours for milestone approvals | Per check-in | > 8 hours triggers escalation to task owner's manager

Capacity

  • Segment workers are stateless containers; horizontal scaling is bounded by LLM provider quota and tool API rate limits
  • Estimate: 1 worker per 5 concurrent segments for 30-minute segments; scale up to 1 worker per concurrent segment for 5-minute segments
  • Partial result storage grows with task count × average output size; provision for 30-day retention of all partial results

11. Cost Considerations

Cost Drivers

Cost Driver | Example | Control
Total token consumption | 500-doc due diligence: ~5M tokens | Budget ceiling; scope reduction option
Context summarisation overhead | 5–10% of total tokens for summaries | Efficient summarisation prompt; smaller model for summaries
Segment retry cost | Redundant work on retry | Checkpoint granularity; reliable infrastructure
Long-running compute | Worker idle time between segments | Event-driven scaling; scale-to-zero between segments

Indicative Cost Range (USD)

Task Type | Scale | Estimated Token Count | Estimated LLM Cost
Contract review (50 documents) | Medium | ~1.5M tokens | $15–60
Contract review (500 documents) | Large | ~12M tokens | $120–480
Codebase security audit (100K LOC) | Large | ~8M tokens | $80–320
Research synthesis (200 papers) | Large | ~6M tokens | $60–240
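The cost figures in this table are all consistent with a blended rate of roughly $10–40 per million tokens. That rate is inferred from the table's own numbers rather than stated by the pattern, so treat it as an assumption; the sketch below simply reproduces the arithmetic.

```python
# Assumed blended price range, inferred from the table (USD per million tokens).
LOW_USD_PER_M, HIGH_USD_PER_M = 10.0, 40.0


def cost_range(tokens_millions):
    """Return (low, high) estimated LLM cost in USD for a token volume."""
    return tokens_millions * LOW_USD_PER_M, tokens_millions * HIGH_USD_PER_M


estimates = {
    "Contract review (50 documents)": cost_range(1.5),
    "Contract review (500 documents)": cost_range(12),
    "Codebase security audit (100K LOC)": cost_range(8),
    "Research synthesis (200 papers)": cost_range(6),
}
```

Actual costs depend on the model chosen and the split between input and output tokens, so a task's own cost history is a better basis for budgets once it exists.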

12. Trade-Off Analysis

Task Decomposition Options

Option | Description | Pros | Cons | Best For
A: LLM-Planned Segmentation (Recommended) | Task Planner uses LLM to decompose task into segments | Adaptive; handles irregular corpora | Planner itself consumes tokens; plan quality depends on model | Complex, variable tasks
B: Rule-Based Segmentation | Fixed rules decompose by document count, page count, or time estimate | Predictable; no LLM planning overhead | Inflexible; poor fit for varied task types | Well-structured, homogeneous tasks
C: User-Defined Milestones | Human specifies segment boundaries upfront | Maximum human control | Requires human upfront effort; may mis-estimate | Regulated tasks where human defines scope
D: Workflow Engine Native | Temporal or Durable Functions handle segmentation | Built-in persistence and retry; mature tooling | Less LLM-native; segment boundaries are code-defined | Engineering-intensive regulated workloads

Architectural Tensions

Tension | Left Pole | Right Pole | Balance
Segment granularity vs. Context continuity | Many small segments: low risk per segment | Few large segments: better cross-segment reasoning | 20–30 minute segments balancing context continuity and recovery granularity
Cost certainty vs. Completeness | Hard budget ceiling: task may not complete | Best-effort: may overrun budget | Budget ceiling with human escalation at 80% spend; partial results delivered at ceiling
Human oversight frequency vs. Task latency | Check-in after every segment | Single check-in at completion | Risk-tiered: check-in at task creation, major milestones, and completion

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
Agent drifts from task scope over many segments | Medium | High: wasted work; wrong outputs | Human check-in reveals drift; output quality monitoring | Re-anchor with task objective in carry-forward context; human redirect
Cross-segment context summary loses critical information | Medium | High: logical inconsistencies in final output | Human review of final output; quality scoring | Preserve critical references explicitly in summary template; test on sample tasks
Task never terminates (segment count grows adaptively) | Low | High: cost overrun | Segment count limit alert; cost ceiling | Hard limit on total segment count; cost ceiling enforcement
Partial results delivered to wrong principal | Very Low | Critical: data breach | Access control on partial result store | IAM on partial result endpoints; audit of all retrievals
Infrastructure change invalidates checkpoint schema | Low | Medium: recovery fails | Checkpoint deserialisation failure | Schema versioning; migration function
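The "schema versioning; migration function" recovery in the last row can be sketched as a chain of per-version upgrades applied when a checkpoint is loaded. The field names and the v1-to-v2 rename are invented for illustration; only the technique is the point.

```python
CURRENT_VERSION = 2


def _v1_to_v2(cp):
    # Hypothetical schema change: v2 renamed "seg" to "segment_id".
    out = {k: v for k, v in cp.items() if k != "seg"}
    out["segment_id"] = cp["seg"]
    out["schema_version"] = 2
    return out


MIGRATIONS = {1: _v1_to_v2}  # source version -> upgrade function


def load_checkpoint(raw):
    """Upgrade a stored checkpoint to the current schema before use."""
    cp = dict(raw)
    version = cp.get("schema_version", 1)  # unversioned checkpoints treated as v1
    while version < CURRENT_VERSION:
        cp = MIGRATIONS[version](cp)
        version = cp["schema_version"]
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported checkpoint schema version {version}")
    return cp


restored = load_checkpoint({"seg": "7", "schema_version": 1})
```

A checkpoint written by a newer schema than the worker understands is rejected rather than guessed at, which turns a silent recovery failure into an explicit one.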

14. Regulatory Considerations

APRA CPS 230

  • Long-running agents supporting material business services require RTO/RPO; the checkpoint + segmentation architecture enables sub-segment RTO
  • Multi-hour tasks interacting with critical systems require operational risk assessment and business impact analysis

EU AI Act

  • Art. 14 (Human Oversight): mandatory human check-ins at task creation and significant milestones support the requirement that high-risk AI systems can be effectively overseen by natural persons
  • For high-risk AI systems: the complete task execution log (all segments, costs, human decisions, partial results) is a required audit artefact

15. Reference Implementations

AWS

Component | Service
Task Queue | Amazon SQS (FIFO with DLQ)
Segment Worker | AWS ECS Fargate (event-triggered)
Task Plan + Partial Results | Amazon DynamoDB
Workflow | AWS Step Functions (for structured decomposition)
Heartbeat Monitor | CloudWatch Alarms
Human Check-in | Amazon SNS + custom approval portal or AWS Step Functions human task

Azure

Component | Service
Task Queue | Azure Service Bus
Segment Worker | Azure Container Apps
Task Plan + Partial Results | Azure Cosmos DB
Workflow | Azure Durable Functions
Human Check-in | Azure Logic Apps + Adaptive Cards (Teams)

On-Premises

Component | Technology
Task Queue | Apache Kafka or RabbitMQ
Segment Worker | Kubernetes Jobs
Task Plan + Partial Results | PostgreSQL
Workflow | Temporal OSS

16. Related Patterns

Pattern | ID | Relationship Type | Notes
Single Agent Pattern | EAAPL-AGT001 | Extended By | Each segment is a single agent loop execution
Agent Checkpoint and Recovery | EAAPL-AGT005 | Depends On | Checkpointing is mandatory for multi-hour tasks
Agent Cost Governance | EAAPL-AGT010 | Integrates With | Budget ceiling and kill switch are cost governance capabilities
Human-in-the-Loop Agent | EAAPL-MAG003 | Extends | Human check-in at milestones is a specialised application of HITL
Supervisor Agent | EAAPL-MAG002 | Related | Supervisor can orchestrate long-running segments; alternative decomposition model

17. Maturity Assessment

Overall Maturity: Proven

Dimension | Score (1–5) | Evidence
Core Technology (Queuing + Checkpointing) | 5 | Durable queues and checkpointing are mature distributed systems patterns
Context Summarisation Quality | 3 | Cross-segment context compression is a known challenge; LLM summarisation quality varies
Human Check-in UX | 3 | Tooling for human review of multi-hour tasks is improving; no standard UX pattern yet
Cost Estimation Accuracy | 3 | Per-segment cost estimates improve with task history; initial estimates are rough
Adaptive Re-planning | 2 | Adaptive segment plan revision is emerging; limited production evidence

18. Revision History

Version | Date | Author | Changes
1.0 | 2024-06-01 | Architecture Board | Initial publication
1.1 | 2024-10-15 | Platform Engineering | Added adaptive re-planning; deadline manager; partial result aggregator
1.2 | 2025-03-01 | Architecture Board | Added EU AI Act Art. 14 mapping; human check-in escalation policy; cost estimation table