Enterprise AI Architecture Pattern Library (EAAPL)

[EAAPL-WRK010] Streaming Progressive Output

Category: Agentic Workflows
Sub-category: Real-Time Delivery Architecture
Version: 1.0
Maturity: Proven
Tags: streaming, progressive-output, server-sent-events, partial-results, cancellation, real-time
Regulatory Relevance: ISO 42001 §8.4, EU AI Act (Art. 13)


1. Executive Summary

The Streaming Progressive Output Pattern defines how agentic workflows deliver partial, incremental results to end-users or downstream systems as processing continues, rather than waiting for complete execution before returning any output. This pattern addresses the fundamental tension between agent workflow latency (which may be 30–120 seconds for complex tasks) and user experience expectations (which are calibrated to sub-second interactive response). By streaming intermediate events — reasoning steps, partial completions, tool call notifications, intermediate results — users see visible progress, can make informed decisions about whether to wait, and can cancel tasks that are heading in an unwanted direction.

For CIO/CTO audiences: streaming is the difference between a user watching a progress bar that actually moves (building trust and enabling early feedback) and staring at a spinner for 90 seconds with no indication of what is happening (building frustration and prompting the user to abandon the session). For long-running agentic workflows, streaming is not an optional UX enhancement; it is a core capability that determines whether complex AI workflows are usable in production. It also enables a feedback loop that is otherwise impossible: the user can see reasoning steps mid-execution, recognise when the agent is heading in the wrong direction, and cancel before the full task cost is incurred.


2. Problem Statement

Business Problem

Complex agentic workflows take 30–180 seconds to complete. Without streaming, users receive no feedback during this time: they cannot see whether the agent is making progress, cannot identify if it has misunderstood the task, and cannot cancel if it is heading in an unwanted direction. Abandonment rates for non-streaming long-running tasks are significantly higher than for streaming equivalents.

Technical Problem

Standard request-response architecture requires the server to hold a connection open for the full task duration before returning any data. This creates: (a) connection timeout risks for long tasks, (b) no partial result availability for downstream pipelines that could begin processing early results, (c) no cancellation mechanism — the caller cannot abort an in-flight task.

Symptoms of Absence

  • Users abandon long-running agent tasks due to no visible progress feedback
  • No ability to cancel a task mid-execution when the agent misunderstands the request
  • Downstream pipelines cannot begin processing results until the entire task is complete
  • Long agent workflows frequently hit HTTP timeout limits in standard request-response proxies

Cost of Inaction

  • UX: High abandonment rates for long-running tasks directly reduce AI workflow adoption
  • Cost: Users who cannot cancel misunderstood tasks incur full task cost even when output is useless
  • Architecture: Timeout issues require workarounds (polling, webhooks) that add complexity without the UX benefit of true streaming

3. Context

When to Apply

  • Agent workflow latency consistently exceeds 5 seconds
  • End-users interact with the workflow in real time (not batch processing)
  • Intermediate results have value before completion (partial document drafts, preliminary findings)
  • Cancellation based on early output is a required capability
  • Downstream systems can begin processing partial results (pipeline streaming)

When NOT to Apply

  • Batch processing where latency is not a user-facing concern
  • Tasks where partial outputs are misleading and must not be shown before completion
  • Downstream systems cannot handle partial/incremental data
  • Security constraints prohibit streaming (e.g., outputs must be validated before transmission)

Prerequisites

  • Streaming-capable LLM API (server-sent events or streaming SDK)
  • Streaming-capable transport layer (SSE, WebSocket, or gRPC streaming)
  • Client-side streaming consumer implementation
  • Cancellation token or signal mechanism
  • Backpressure handling for slow consumers

Industry Applicability

Industry | Streaming Use Case | Key Streaming Event
Legal | Real-time contract analysis | Per-clause findings streamed as identified
Financial Services | Live market analysis | Data retrieval events + interim findings
Healthcare | Clinical note generation | Sentence-by-sentence generation streaming
Technology | Code generation and review | File-by-file review results; code tokens
Government | Policy document generation | Section-by-section draft streaming

4. Architecture Overview

Streaming progressive output is implemented at three layers: the LLM inference layer (token streaming), the workflow event layer (intermediate result events), and the transport layer (SSE/WebSocket).

Token Streaming

Modern LLM APIs support token-level streaming: each generated token is delivered to the caller as it is produced, rather than waiting for the full completion. This is the foundation of real-time text generation feedback. Token streaming is straightforward to implement with modern SDK support but only provides the raw token stream; it does not provide structured intermediate results or workflow-level events.
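A minimal sketch of the Token Streamer's relay step, assuming the SDK stream has already been reduced to an async iterator of plain text deltas (the real chunk shape differs per SDK). relay_tokens and fake_llm_stream are hypothetical names; the function wraps each delta as a token event and keeps a transcript so the full completion can be reassembled after the stream ends.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def relay_tokens(chunks: AsyncIterator[str], transcript: List[str]) -> AsyncIterator[Dict[str, str]]:
    """Wrap raw text deltas from a streaming LLM SDK as 'token' events.

    Assumption: `chunks` yields plain text deltas. Each delta is also appended
    to `transcript` so the full completion can be reassembled at the end.
    """
    async for delta in chunks:
        if not delta:  # skip empty deltas rather than emitting empty events
            continue
        transcript.append(delta)
        yield {"event": "token", "text": delta}


async def fake_llm_stream() -> AsyncIterator[str]:
    # Hypothetical stand-in for an SDK response opened in streaming mode.
    for piece in ["Analysing ", "", "contract ", "clause 4.1..."]:
        await asyncio.sleep(0)
        yield piece


async def main() -> None:
    transcript: List[str] = []
    async for event in relay_tokens(fake_llm_stream(), transcript):
        print(event)
    print("full text:", "".join(transcript))


if __name__ == "__main__":
    asyncio.run(main())
```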

Workflow Event Streaming

For agentic workflows with tool calls, reasoning steps, and intermediate results, token-level streaming is insufficient. The user needs structured events: "Starting tool call: search_regulatory_db", "Tool result received: 3 relevant clauses found", "Intermediate finding: Clause 4.1 appears to create an obligation under CPS 234". The Workflow Event Emitter generates these structured events at each significant workflow step and writes them to an event stream.
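The emitter can be sketched as below. The WorkflowEvent schema and the WorkflowEventEmitter class are hypothetical, not a standard; the per-task seq counter is an assumption added so that clients and audit tooling can order and replay events. An asyncio.Queue stands in for the event stream that the transport reads from.

```python
import asyncio
import itertools
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class WorkflowEvent:
    """One structured workflow event (hypothetical schema)."""
    type: str                  # e.g. "progress", "tool_call", "tool_result", "result"
    task_id: str
    seq: int                   # increases by 1 per event within a task
    data: Dict[str, Any] = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


class WorkflowEventEmitter:
    """Writes events for one task into a queue consumed by the transport layer."""

    def __init__(self, task_id: str, queue: "asyncio.Queue[WorkflowEvent]") -> None:
        self.task_id = task_id
        self.queue = queue
        self._seq = itertools.count(1)

    async def emit(self, type_: str, **data: Any) -> WorkflowEvent:
        event = WorkflowEvent(type=type_, task_id=self.task_id, seq=next(self._seq), data=data)
        await self.queue.put(event)
        return event


async def main() -> None:
    queue: "asyncio.Queue[WorkflowEvent]" = asyncio.Queue()
    emitter = WorkflowEventEmitter("T-8821", queue)
    await emitter.emit("tool_call", tool="search_regulatory_db", status="started")
    await emitter.emit("tool_result", tool="search_regulatory_db", result_count=3, status="complete")
    while not queue.empty():
        print(queue.get_nowait().to_json())


if __name__ == "__main__":
    asyncio.run(main())
```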

Progressive Disclosure Strategy

Not all intermediate content is appropriate to stream. The progressive disclosure policy defines: (a) what event types are streamed (reasoning steps, tool calls, partial results, progress indicators), (b) what event types are withheld until completion (sensitive intermediate data, unverified claims), (c) how events are formatted for the client (structured JSON events vs. human-readable text). The policy is configurable per task type and user role.
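A progressive disclosure filter might look like the sketch below, assuming the policy is expressed as a lookup from (task type, role) to the event types that role may see. POLICY, DEFAULT_ALLOWED, filter_events and the role names are illustrative. Unknown combinations fall back to a minimal allow-list, so the filter fails closed; withheld events can still be written to the audit record.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Iterable, Iterator, Tuple

# Hypothetical policy: (task_type, role) -> event types that may be streamed live.
DisclosurePolicy = Dict[Tuple[str, str], FrozenSet[str]]

POLICY: DisclosurePolicy = {
    ("contract_review", "analyst"): frozenset(
        {"progress", "token", "tool_call", "tool_result", "result", "error", "cancelled"}
    ),
    ("contract_review", "client"): frozenset({"progress", "result", "error", "cancelled"}),
}

# Fail closed: combinations missing from the policy see only lifecycle events.
DEFAULT_ALLOWED: FrozenSet[str] = frozenset({"progress", "result", "error", "cancelled"})


@dataclass
class Event:
    type: str
    payload: Dict[str, Any] = field(default_factory=dict)


def filter_events(events: Iterable[Event], task_type: str, role: str,
                  policy: DisclosurePolicy = POLICY) -> Iterator[Event]:
    """Yield only the events this role may see for this task type."""
    allowed = policy.get((task_type, role), DEFAULT_ALLOWED)
    for event in events:
        if event.type in allowed:
            yield event
```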

Transport Layer

Streaming events are delivered to clients via: (a) Server-Sent Events (SSE) for unidirectional server-to-client streaming over HTTP (simplest, widely supported), (b) WebSocket for bidirectional streaming (enables mid-stream cancellation and user feedback), or (c) gRPC streaming for high-throughput, microservice-internal streaming.
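For the SSE option, each event is written as a text/event-stream frame: named fields such as event:, data: and id:, one per line, terminated by a blank line. Lines beginning with a colon are comments that clients ignore, which makes them usable as keepalives. format_sse and keepalive are hypothetical helper names.

```python
import json
from typing import Any, List, Optional


def format_sse(event: str, data: Any, event_id: Optional[str] = None) -> str:
    """Serialise one event as a Server-Sent Events frame.

    Non-string payloads are JSON-encoded. A multi-line payload is split into
    several "data:" lines; the client rejoins them with newlines.
    """
    body = data if isinstance(data, str) else json.dumps(data)
    lines: List[str] = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in body.split("\n"))
    return "\n".join(lines) + "\n\n"   # the blank line ends the frame


def keepalive() -> str:
    """An SSE comment frame: ignored by clients, but it keeps idle proxies from timing out."""
    return ": keepalive\n\n"


if __name__ == "__main__":
    print(format_sse("progress", {"status": "started", "task_id": "T-8821"}, event_id="1"), end="")
    print(keepalive(), end="")
```

In FastAPI, a generator yielding these strings can be returned as StreamingResponse(gen(), media_type="text/event-stream"). The id field lets a reconnecting browser EventSource send a Last-Event-ID header so the server can resume; that resume logic is not shown.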

Cancellation

The client can send a cancellation signal at any time: in-band over a WebSocket connection or, with SSE (which only carries server-to-client traffic), out-of-band via an HTTP DELETE on the task resource. The Cancellation Handler in the agent workflow checks for cancellation signals between iterations and stops execution gracefully, returning the best partial result accumulated to that point with a status: "cancelled" metadata flag.
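A sketch of cooperative cancellation between iterations, assuming each workflow iteration can be expressed as an awaitable step. CancellationToken, run_agent and the step names are hypothetical. A step already in flight is allowed to finish, and its output is kept in the partial result.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

StepFn = Callable[[str], Awaitable[str]]


class CancellationToken:
    """Minimal cancellation token (hypothetical helper, not a library class)."""

    def __init__(self) -> None:
        self._cancelled = False

    def cancel(self) -> None:
        self._cancelled = True     # idempotent: repeated calls change nothing

    @property
    def cancelled(self) -> bool:
        return self._cancelled


async def run_agent(steps: List[str], token: CancellationToken, do_step: StepFn) -> Dict[str, Any]:
    """Run workflow steps, checking the token before each iteration."""
    partial: List[str] = []
    for step in steps:
        if token.cancelled:
            return {"status": "cancelled", "result": partial}
        partial.append(await do_step(step))
    return {"status": "complete", "result": partial}


async def main() -> None:
    token = CancellationToken()

    async def slow_step(name: str) -> str:
        await asyncio.sleep(0.01)  # stands in for one reasoning or tool-call iteration
        return f"finding from {name}"

    task = asyncio.create_task(run_agent(["s1", "s2", "s3", "s4"], token, slow_step))
    await asyncio.sleep(0.015)     # the client decides mid-stream that the task is off track
    token.cancel()
    print(await task)


if __name__ == "__main__":
    asyncio.run(main())
```

Calling asyncio.Task.cancel() instead would interrupt the current step by raising asyncio.CancelledError inside it. The cooperative check trades a stop delay of up to one iteration (relevant to the cancellation-latency SLO) for a cleaner partial result.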

Backpressure

If the client consumes events more slowly than the workflow produces them, the event buffer must be managed. Backpressure strategy: buffer up to N events; when the buffer is full, pause event emission (LLM inference may continue, with its output held upstream of the buffer) and resume once the buffer drains. For fast workflows with slow clients, drop non-critical progress events but always deliver result events.
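The drop half of this strategy can be sketched as a bounded per-client buffer. EventBuffer and DROPPABLE are hypothetical names, and treating only progress events as droppable is an assumption taken from the text above (dropping token events would corrupt the streamed text).

```python
import asyncio
from collections import deque
from typing import Any, Deque, Dict

Event = Dict[str, Any]

# Assumption: only "progress" events may be shed; everything else is always delivered.
DROPPABLE = frozenset({"progress"})


class EventBuffer:
    """Bounded per-client buffer that sheds progress events under pressure."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.dropped = 0
        self._items: Deque[Event] = deque()
        self._ready = asyncio.Event()

    def __len__(self) -> int:
        return len(self._items)

    def put(self, event: Event) -> bool:
        """Enqueue an event; return False if it was dropped."""
        if len(self._items) >= self.capacity:
            if event["type"] in DROPPABLE:
                self.dropped += 1
                return False
            # Make room by evicting the oldest droppable event, if there is one.
            for i, queued in enumerate(self._items):
                if queued["type"] in DROPPABLE:
                    del self._items[i]
                    self.dropped += 1
                    break
            # If nothing is droppable the buffer overflows temporarily;
            # this is the point at which the producer should be paused.
        self._items.append(event)
        self._ready.set()
        return True

    async def get(self) -> Event:
        """Wait for and return the oldest buffered event."""
        while not self._items:
            self._ready.clear()
            await self._ready.wait()
        return self._items.popleft()
```

The pause half is not shown; one option is an asyncio.Queue(maxsize=N) in front of the emitter, whose put() waits while the queue is full.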


5. Architecture Diagram

flowchart TD
    subgraph Workflow["Agent Workflow Execution"]
        A[Agent Reasoning]
        B[Tool Call Execution]
        C[Intermediate Result]
        D[Final Result]
    end
    subgraph Events["Workflow Event Emitter"]
        E[Token Stream Events]
        F[Tool Call Events]
        G[Intermediate Result Events]
        H[Completion Event]
        I[Error / Cancel Events]
    end
    subgraph Transport["Streaming Transport Layer"]
        J{Transport Protocol}
        K[SSE Stream]
        L[WebSocket]
    end
    subgraph Client["Client"]
        M[Stream Consumer]
        N[Cancel Signal]
    end
    subgraph Control["Cancellation Handler"]
        O{Cancel Signal Received?}
        P[Graceful Stop]
    end
    A --> E
    B --> F
    C --> G
    D --> H
    E & F & G & H & I --> J
    J --> K & L
    K --> M
    L --> M
    M --> N
    N --> L
    L --> O
    O -->|yes| P
    O -->|no| A
    P --> I

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Workflow Event Emitter | Logic Component | Generates structured events at each workflow step | Custom event emitter; LangChain callbacks; LangGraph streaming | Critical
Token Streamer | AI Integration | Relays LLM token stream from inference API | OpenAI streaming SDK; Anthropic streaming SDK; Bedrock streaming | Critical
Event Buffer | State | Buffers events between emitter and transport; handles backpressure | Redis stream; in-memory async queue; RxJS observable | High
SSE Transport | Transport | Server-Sent Events endpoint for unidirectional streaming | FastAPI SSE; Express SSE; AWS API Gateway SSE | High
WebSocket Transport | Transport | Bidirectional streaming for cancellation-enabled workflows | FastAPI WebSocket; Socket.io; AWS API Gateway WebSocket | High
Cancellation Handler | Control | Monitors for cancellation signals; triggers graceful stop | Cancellation token pattern; asyncio.CancelledError | Critical
Progressive Disclosure Filter | Security + Logic | Filters which events are streamed based on policy | Configurable event type filter; role-based event visibility | High
Partial Result Assembler | Logic | Assembles best partial result on cancellation or error | Custom accumulator per workflow type | High

7. Data Flow

Step | Actor | Action | Output
1 | Client | Opens SSE connection; submits task | SSE connection established; task ID returned
2 | Workflow Event Emitter | Task starts; emits progress event | event: progress\ndata: {"status": "started", "task_id": "T-8821"}
3 | Token Streamer | LLM generates first reasoning tokens | event: token\ndata: {"text": "Analysing contract clause 4.1..."} (one event per token or token chunk)
4 | Workflow Event Emitter | Tool call initiated | event: tool_call\ndata: {"tool": "regulatory_search", "query": "CPS 234 §3.4", "status": "started"}
5 | Workflow Event Emitter | Tool result received | event: tool_result\ndata: {"tool": "regulatory_search", "result_count": 3, "status": "complete"}
6 | Token Streamer | LLM generates intermediate finding | event: token\ndata: {"text": "Clause 4.1 creates a mandatory notification obligation..."}
7 | Client | Reads intermediate finding; decides task is on track | No cancellation signal sent
8 | Workflow Event Emitter | Final result assembled | event: result\ndata: {"status": "complete", "result": {...}, "iterations": 4, "tool_calls": 2}
9 | Transport | Server closes the SSE stream after the result event | Connection closed normally (the HTTP 200 status was sent when the stream opened)
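On the client side, the flow above implies two jobs: parsing the event stream, and keeping provisional output separate from the final result so that only the result event is treated as authoritative. The parser below is a minimal sketch of the text/event-stream format that ignores id: and retry: for brevity; it assumes the stream has already been split into lines. parse_sse, StreamView and the "cancelled" event name are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) for each complete frame in a text/event-stream."""
    event_type = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line dispatches the frame
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
        elif line.startswith(":"):           # comment, e.g. a keepalive
            continue
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "event":
                event_type = value
            elif name == "data":
                data.append(value)


class StreamView:
    """Client-side state: provisional text is never promoted to the final result."""

    def __init__(self) -> None:
        self.provisional: List[str] = []
        self.final: Optional[Dict[str, Any]] = None
        self.status = "streaming"

    def apply(self, event_type: str, data: str) -> None:
        payload = json.loads(data)
        if event_type == "token":
            self.provisional.append(payload["text"])    # render as "in progress"
        elif event_type == "result":
            self.final = payload                         # the only authoritative output
            self.status = payload.get("status", "complete")
        elif event_type in ("error", "cancelled"):
            self.status = event_type
```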

Error Flow

Error | Detection | Recovery
Client disconnects mid-stream | Transport write error | Detect disconnect; gracefully stop workflow; persist partial result
Cancellation signal received | Cancellation token check | Graceful stop; emit cancel event; return partial result
LLM stream interruption | Stream error from LLM API | Retry token streaming; emit retry event; if persistent, emit error event
Event buffer overflow (slow client) | Buffer capacity check | Drop non-critical progress events; preserve result events; emit backpressure warning

8. Security Considerations

Information Disclosure via Intermediate Events

  • Intermediate events may expose internal reasoning, tool parameters, or intermediate data that should not be visible to the end user
  • Mitigation: Progressive disclosure filter enforces role-based event visibility; never stream raw tool parameters containing secrets or PII
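A last-line-of-defence redaction pass over tool-call payloads could be sketched as follows. SENSITIVE_KEYS, the e-mail pattern and redact are illustrative assumptions; pattern matching misses many kinds of PII, so an allow-list of fields that may be streamed remains the safer primary control.

```python
import re
from typing import Any

# Illustrative only: extend per deployment; an allow-list of streamable fields is stronger.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "secret", "token"}
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value: Any) -> Any:
    """Return a copy with secret-looking keys masked and e-mail addresses replaced."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


if __name__ == "__main__":
    event = {"tool": "crm_lookup", "params": {"api_key": "sk-123", "query": "contact jane@example.com"}}
    print(redact(event))
```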

OWASP Top 10 for LLM Applications (v1.1 numbering)

OWASP LLM Risk | Streaming Applicability | Mitigation
LLM06 Sensitive Information Disclosure | Intermediate reasoning steps may contain sensitive data | Progressive disclosure filter; separate internal-only events from user-visible events
LLM09 Overreliance | Users may act on incomplete intermediate results before the task completes | Clearly label streaming events as incomplete; label only the final result as authoritative
LLM04 Model Denial of Service | Many concurrent streaming connections exhaust server resources | Per-user connection limit; task concurrency limit; connection timeout
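The LLM04 mitigations (per-user connection limit and connection timeout) can be sketched in asyncio as below. StreamLimiter, TooManyStreams and stream_with_limits are hypothetical; the counter lives in a single process, so a multi-instance deployment would need a shared store instead.

```python
import asyncio
from collections import defaultdict
from contextlib import asynccontextmanager
from typing import AsyncIterator, DefaultDict


class TooManyStreams(Exception):
    """Raised when a user exceeds their concurrent-stream allowance (e.g. map to HTTP 429)."""


class StreamLimiter:
    """Per-user cap on concurrently open streams (single-process sketch)."""

    def __init__(self, max_per_user: int) -> None:
        self.max_per_user = max_per_user
        self._open: DefaultDict[str, int] = defaultdict(int)

    @asynccontextmanager
    async def slot(self, user_id: str) -> AsyncIterator[None]:
        # No await between the check and the increment, so this is race-free
        # within one asyncio event loop.
        if self._open[user_id] >= self.max_per_user:
            raise TooManyStreams(user_id)
        self._open[user_id] += 1
        try:
            yield
        finally:
            self._open[user_id] -= 1


async def stream_with_limits(limiter: StreamLimiter, user_id: str, max_seconds: float) -> str:
    async with limiter.slot(user_id):
        try:
            # asyncio.sleep stands in for serving the stream; wait_for enforces the timeout.
            await asyncio.wait_for(asyncio.sleep(0.01), timeout=max_seconds)
            return "completed"
        except asyncio.TimeoutError:
            return "timed out"


if __name__ == "__main__":
    print(asyncio.run(stream_with_limits(StreamLimiter(2), "u1", max_seconds=1.0)))
```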

9. Governance Considerations

Streaming Event Audit

  • For regulated workflows, streaming events constitute a real-time activity log and must be captured in the audit record even if not persisted in the client view
  • The full event sequence must be reconstructable from the audit log
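One way to keep the full sequence reconstructable is to write every event to an append-only log before the progressive disclosure filter runs, then check sequence continuity when reading it back. The sketch assumes each event carries task_id and a per-task seq number starting at 1 (an assumption; this pattern does not define an event schema). A Python list stands in for durable, write-once storage; append_audit and reconstruct are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def append_audit(log_lines: List[str], event: Event) -> None:
    """Append one event to the audit log as a JSON line, before any filtering."""
    log_lines.append(json.dumps(event, sort_keys=True))


def reconstruct(log_lines: Iterable[str], task_id: str) -> List[Event]:
    """Rebuild one task's event sequence in order, refusing gaps or duplicates."""
    events = sorted(
        (e for e in map(json.loads, log_lines) if e["task_id"] == task_id),
        key=lambda e: e["seq"],
    )
    if [e["seq"] for e in events] != list(range(1, len(events) + 1)):
        raise ValueError(f"audit trail for {task_id} is incomplete or duplicated")
    return events
```

The continuity check detects gaps but not a truncated tail; requiring the last event to be a terminal type (result, error or cancel) closes that hole.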

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Progressive Disclosure Policy | AI Governance Board | Per task type | Documents which event types are visible to which user roles
Streaming Event Archive | Compliance | Per task (regulated) | Full event stream for audit reconstruction
Cancellation Usage Report | AI Operations | Weekly | Tracks cancellation rate and reason; a high cancellation rate may indicate misrouted requests

10. Operational Considerations

SLOs

SLO | Target | Window | Alert
Time-to-first-event (TTFE) | ≤ 500 ms from request | 1-hour rolling | > 2 s triggers P2
Streaming connection drop rate | ≤ 0.5% | 1-hour rolling | > 2% triggers P2
Cancellation handling latency | ≤ 2 s from signal to stop | 1-hour rolling | > 5 s triggers P3
Event buffer overflow rate | ≤ 0.1% | 1-hour rolling | > 1% triggers P3; review backpressure strategy

11. Cost Considerations

Streaming Config | Infrastructure Cost | UX Benefit | Notes
Token-only streaming | Minimal overhead | High | Good for text generation tasks
Full event streaming (SSE) | Low (standard HTTP) | Very High | Recommended default
WebSocket streaming | Low–Medium | Very High + in-band cancellation | Required for in-band cancellation and user feedback; SSE can cancel out-of-band via HTTP DELETE
Long-hold SSE (> 5 min) | Medium (connection resources) | High | Use a task-polling fallback for very long tasks

12. Trade-Off Analysis

Option | UX | Complexity | Cancellation | Infrastructure | Best For
A: SSE with workflow events (Recommended) | Very High | Medium | Out-of-band only (HTTP DELETE on the task resource) | Low | Most production use cases
B: WebSocket with bidirectional events | Very High | High | Yes (native, in-band) | Medium | Interactive workflows requiring cancellation
C: Polling with partial result endpoint | Medium | Low | Yes (explicit cancel request; simply stopping polling does not stop the task) | Low | Environments where SSE/WebSocket is unavailable
D: No streaming (wait for completion) | Low | Very Low | No | Very Low | Batch processing; not for interactive use

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
SSE proxy timeout (load balancer closes long-lived or idle connections) | High | High (all long workflows fail) | Connection duration monitoring | Configure proxy timeouts ≥ maximum expected task duration; send periodic SSE keepalive comments on the open (chunked) response so idle timers do not expire
Client shows incomplete result as complete | Medium | High (user acts on partial data) | Event type clearly distinguishes partial vs. final | Never show streaming tokens as the final result; treat only the completion event as authoritative
Large event backlog on slow client | Medium | Medium (memory pressure) | Buffer size monitoring | Implement backpressure; drop progress events; never drop result events
Race condition: cancel arrives after completion | Low | Low | Cancel request received for a task already in a terminal state | Idempotent cancel handling; cancel after completion is a no-op
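Idempotent cancel handling reduces to a small state check. TaskRegistry, TaskState and the CANCELLING state are hypothetical; in a multi-worker deployment the check-and-set would need to be atomic in a shared store rather than an in-memory dict.

```python
from enum import Enum
from typing import Dict


class TaskState(Enum):
    RUNNING = "running"
    CANCELLING = "cancelling"
    COMPLETE = "complete"
    CANCELLED = "cancelled"
    FAILED = "failed"


TERMINAL = {TaskState.COMPLETE, TaskState.CANCELLED, TaskState.FAILED}


class TaskRegistry:
    """In-memory task state table (hypothetical sketch)."""

    def __init__(self) -> None:
        self._state: Dict[str, TaskState] = {}

    def start(self, task_id: str) -> None:
        self._state[task_id] = TaskState.RUNNING

    def state(self, task_id: str) -> TaskState:
        return self._state[task_id]

    def cancel(self, task_id: str) -> bool:
        """Return True only if a stop signal was newly raised.

        Late (after completion) and repeated cancels are no-ops, so a client
        can safely retry. An unknown task_id raises KeyError (e.g. map to 404).
        """
        if self._state[task_id] is TaskState.RUNNING:
            self._state[task_id] = TaskState.CANCELLING
            return True
        return False

    def finish(self, task_id: str, outcome: TaskState) -> None:
        """Record the workflow's actual outcome; a terminal state is never overwritten."""
        if outcome not in TERMINAL:
            raise ValueError(f"{outcome} is not a terminal state")
        if self._state[task_id] not in TERMINAL:
            self._state[task_id] = outcome
```

Because finish() records whatever outcome the workflow actually reached, a cancel that arrives after the last cancellation check leaves the task COMPLETE instead of misreporting it as cancelled.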

14. Regulatory Considerations

EU AI Act

  • Art. 13 (Transparency): For high-risk AI systems, the streaming event sequence provides real-time transparency of the system's reasoning and tool use. The event archive must be retained as part of the system's transparency documentation.

ISO 42001

  • §8.4: Progressive disclosure policy determines what information is made available to users at each stage of a decision-making workflow; this policy must be documented and aligned with information governance requirements.

Australian Context

  • For financial services AI providing advice or decision support via streaming, intermediate findings streamed before completion must not be mistaken for completed advice; clear labelling of event finality supports compliance with ASIC RG 255 (Providing digital financial product advice to retail clients).

15. Reference Implementations

AWS

Component | Service
SSE Transport | Amazon API Gateway HTTP API + Lambda streaming response
WebSocket Transport | Amazon API Gateway WebSocket API
Event Buffer | Amazon SQS FIFO or Amazon Kinesis Data Streams
LLM Streaming | Amazon Bedrock InvokeModelWithResponseStream

Azure

Component | Service
SSE Transport | Azure API Management + Azure Functions with streaming response
WebSocket Transport | Azure Web PubSub
LLM Streaming | Azure OpenAI streaming (stream: true)

On-Premises

Component | Technology
SSE Transport | FastAPI StreamingResponse; Flask SSE
WebSocket Transport | FastAPI WebSocket; Starlette WebSocket
LLM Streaming | OpenAI SDK stream=True; Anthropic SDK streaming
Event Emitter | LangChain callback handlers (on_llm_new_token); LangGraph streaming


16. Related Patterns

Pattern | ID | Relationship Type | Notes
ReAct Agent Loop | EAAPL-WRK001 | Integrates With | Each ReAct iteration emits workflow events for streaming
Sequential Chain | EAAPL-WRK002 | Integrates With | Each step completion emits a progress event for streaming
Workflow Tracing and Replay | EAAPL-WRK013 | Integrates With | Streaming event archive is a primary trace source
Human Escalation | EAAPL-HITL001 | Integrates With | Mid-stream escalation events prompt human review while the workflow continues

17. Maturity Assessment

Overall Maturity: Proven

Dimension | Score (1–5) | Evidence
Research Foundation | 3 | Streaming UX research solid; agentic workflow event streaming newer
Production Deployment | 4 | ChatGPT, Claude, GitHub Copilot all use token streaming at scale
Framework Support | 4 | LangChain streaming callbacks; LangGraph streaming; all major LLM SDKs support streaming
Workflow Event Standards | 3 | Token streaming standardised; structured workflow event schemas evolving
Cancellation Implementation | 3 | HTTP cancellation patterns established; clean agentic cancellation maturing

18. Revision History

Version | Date | Author | Changes
1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category