[EAAPL-WRK010] Streaming Progressive Output
Category: Agentic Workflows
Sub-category: Real-Time Delivery Architecture
Version: 1.0
Maturity: Proven
Tags: streaming, progressive-output, server-sent-events, partial-results, cancellation, real-time
Regulatory Relevance: ISO 42001 §8.4, EU AI Act (Art. 13)
1. Executive Summary
The Streaming Progressive Output Pattern defines how agentic workflows deliver partial, incremental results to end-users or downstream systems as processing continues, rather than waiting for complete execution before returning any output. This pattern addresses the fundamental tension between agent workflow latency (which may be 30–120 seconds for complex tasks) and user experience expectations (which are calibrated to sub-second interactive response). By streaming intermediate events — reasoning steps, partial completions, tool call notifications, intermediate results — users see visible progress, can make informed decisions about whether to wait, and can cancel tasks that are heading in an unwanted direction.
For CIO/CTO audiences: streaming is the difference between a user watching a progress bar that actually moves (building trust, enabling early feedback) and staring at a spinner for 90 seconds with no indication of what is happening (building frustration and prompting session abandonment). For long-running agentic workflows, streaming is not an optional UX enhancement; it is a core capability that determines whether complex AI workflows are usable in production. It also enables a feedback loop that is otherwise impossible: the user can see reasoning steps mid-execution, identify when the agent is heading in the wrong direction, and cancel before incurring the full task cost.
2. Problem Statement
Business Problem
Complex agentic workflows take 30–120 seconds to complete. Without streaming, users receive no feedback during this time: they cannot see whether the agent is making progress, cannot identify if it has misunderstood the task, and cannot cancel if it is heading in an unwanted direction. Long-running tasks that show no progress are abandoned more often than streaming equivalents.
Technical Problem
Standard request-response architecture requires the server to hold a connection open for the full task duration before returning any data. This creates: (a) connection timeout risks for long tasks, (b) no partial result availability for downstream pipelines that could begin processing early results, (c) no cancellation mechanism — the caller cannot abort an in-flight task.
Symptoms of Absence
- Users abandon long-running agent tasks due to no visible progress feedback
- No ability to cancel a task mid-execution when the agent misunderstands the request
- Downstream pipelines cannot begin processing results until the entire task is complete
- Long agent workflows frequently hit HTTP timeout limits in standard request-response proxies
Cost of Inaction
- UX: High abandonment rates for long-running tasks directly reduce AI workflow adoption
- Cost: Users who cannot cancel misunderstood tasks incur full task cost even when output is useless
- Architecture: Timeout issues require workarounds (polling, webhooks) that add complexity without the UX benefit of true streaming
3. Context
When to Apply
- Agent workflow latency consistently exceeds 5 seconds
- End-users interact with the workflow in real time (not batch processing)
- Intermediate results have value before completion (partial document drafts, preliminary findings)
- Cancellation based on early output is a required capability
- Downstream systems can begin processing partial results (pipeline streaming)
When NOT to Apply
- Batch processing where latency is not a user-facing concern
- Tasks where partial outputs are misleading and must not be shown before completion
- Downstream systems cannot handle partial/incremental data
- Security constraints prohibit streaming (e.g., outputs must be validated before transmission)
Prerequisites
- Streaming-capable LLM API (server-sent events or streaming SDK)
- Streaming-capable transport layer (SSE, WebSocket, or gRPC streaming)
- Client-side streaming consumer implementation
- Cancellation token or signal mechanism
- Backpressure handling for slow consumers
Industry Applicability
| Industry | Streaming Use Case | Key Streaming Event |
|---|---|---|
| Legal | Real-time contract analysis | Per-clause findings streamed as identified |
| Financial Services | Live market analysis | Data retrieval events + interim findings |
| Healthcare | Clinical note generation | Sentence-by-sentence generation streaming |
| Technology | Code generation and review | File-by-file review results; code tokens |
| Government | Policy document generation | Section-by-section draft streaming |
4. Architecture Overview
Streaming progressive output is implemented at three layers: the LLM inference layer (token streaming), the workflow event layer (intermediate result events), and the transport layer (SSE/WebSocket).
Token Streaming
Modern LLM APIs support token-level streaming: each generated token is delivered to the caller as it is produced, rather than waiting for the full completion. This is the foundation of real-time text generation feedback. Token streaming is straightforward to implement with modern SDK support but only provides the raw token stream — it does not provide structured intermediate results or workflow-level events.
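A minimal sketch of token relaying follows. It assumes a hypothetical `fake_llm_stream` generator standing in for a provider SDK's streaming call; the event shape (`event`, `data.text`) mirrors the token events used elsewhere in this pattern, and the function names are illustrative only.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a provider SDK's streaming call."""
    for token in text.split(" "):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token + " "


async def relay_tokens(source: AsyncIterator[str]) -> AsyncIterator[dict]:
    """Wrap each raw token in a `token` event as soon as it arrives."""
    async for token in source:
        yield {"event": "token", "data": {"text": token}}


async def collect(text: str) -> list[dict]:
    """Drain the relay into a list (for demonstration only)."""
    return [event async for event in relay_tokens(fake_llm_stream(text))]


if __name__ == "__main__":
    for event in asyncio.run(collect("Analysing contract clause 4.1")):
        print(event)
```

The relay adds no buffering of its own: each token is forwarded as soon as the source yields it, which is what keeps time-to-first-event low.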
Workflow Event Streaming
For agentic workflows with tool calls, reasoning steps, and intermediate results, token-level streaming is insufficient. The user needs structured events: "Starting tool call: search_regulatory_db", "Tool result received: 3 relevant clauses found", "Intermediate finding: Clause 4.1 appears to create an obligation under CPS 234". The Workflow Event Emitter generates these structured events at each significant workflow step and writes them to an event stream.
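One possible shape for such an emitter is sketched below. The `WorkflowEvent` fields (`seq`, `ts`, `task_id`) and the in-memory list are assumptions made for illustration; a production emitter would write to the event stream rather than a Python list.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WorkflowEvent:
    """One structured event; field names here are illustrative assumptions."""
    event: str      # e.g. "tool_call", "tool_result", "progress"
    data: dict
    task_id: str
    seq: int        # monotonically increasing per task
    ts: float = field(default_factory=time.time)


class WorkflowEventEmitter:
    """Collects events in order; a real emitter would write to a stream."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.events: list[WorkflowEvent] = []

    def emit(self, event: str, **data) -> WorkflowEvent:
        ev = WorkflowEvent(event, data, self.task_id, seq=len(self.events) + 1)
        self.events.append(ev)
        return ev


if __name__ == "__main__":
    emitter = WorkflowEventEmitter("T-8821")
    emitter.emit("tool_call", tool="regulatory_search", status="started")
    emitter.emit("tool_result", tool="regulatory_search",
                 result_count=3, status="complete")
    for ev in emitter.events:
        print(ev.seq, ev.event, ev.data)
```

The per-task sequence number is a design choice that lets a consumer or audit log detect gaps and restore ordering.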
Progressive Disclosure Strategy
Not all intermediate content is appropriate to stream. The progressive disclosure policy defines: (a) what event types are streamed (reasoning steps, tool calls, partial results, progress indicators), (b) what event types are withheld until completion (sensitive intermediate data, unverified claims), (c) how events are formatted for the client (structured JSON events vs. human-readable text). The policy is configurable per task type and user role.
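A role-based filter of this kind could look roughly like the following. The role names and the contents of `DISCLOSURE_POLICY` are hypothetical; the only behaviour taken from the text is that event types not permitted for a role are withheld.

```python
from typing import Iterable, Iterator

# Hypothetical policy: event types each role may see while the task runs.
# Anything not listed for a role is withheld (deny by default).
DISCLOSURE_POLICY: dict[str, set[str]] = {
    "end_user": {"progress", "token", "result"},
    "analyst": {"progress", "token", "tool_call", "tool_result", "result"},
}


def filter_events(events: Iterable[dict], role: str) -> Iterator[dict]:
    """Yield only the events the given role is allowed to see."""
    allowed = DISCLOSURE_POLICY.get(role, set())
    for event in events:
        if event["event"] in allowed:
            yield event


if __name__ == "__main__":
    stream = [{"event": "progress"}, {"event": "tool_call"}, {"event": "result"}]
    print([e["event"] for e in filter_events(stream, "end_user")])
```

Defaulting unknown roles to an empty set means a misconfigured role sees nothing rather than everything.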
Transport Layer
Streaming events are delivered to clients via: (a) Server-Sent Events (SSE) for unidirectional server-to-client streaming over HTTP (simplest, widely supported), (b) WebSocket for bidirectional streaming (enables mid-stream cancellation and user feedback), or (c) gRPC streaming for high-throughput, microservice-internal streaming.
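For the SSE option, each event is sent as a small text frame of `field: value` lines terminated by a blank line. The helper below is a sketch of that framing; the function name and the optional `event_id` parameter are assumptions, while the `id`, `event`, and `data` field names come from the SSE wire format.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialise one event as an SSE frame: field lines plus a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on a single data: line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse("progress", {"status": "started", "task_id": "T-8821"}), end="")
```

An async generator yielding these strings can be handed to a streaming HTTP response with the `text/event-stream` content type.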
Cancellation
The client can send a cancellation signal at any time, either in-band as a WebSocket message or, when SSE is the transport, out-of-band as an HTTP DELETE on the task resource. The Cancellation Handler in the agent workflow checks for cancellation signals between iterations and stops execution gracefully, returning the best partial result accumulated to that point with a status: cancelled metadata flag.
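The check-between-iterations behaviour can be sketched with an `asyncio.Event` acting as the cancellation token. The step names, `run_workflow`, and `demo` are hypothetical; the `asyncio.sleep(0)` call is a placeholder for a real LLM or tool call.

```python
import asyncio


async def run_workflow(steps: list[str], cancel: asyncio.Event) -> dict:
    """Run steps in order, checking the cancel signal between iterations."""
    partial: list[str] = []
    for step in steps:
        if cancel.is_set():
            # Graceful stop: return whatever has been accumulated so far.
            return {"status": "cancelled", "result": partial}
        await asyncio.sleep(0)  # placeholder for an LLM or tool call
        partial.append(f"finding from {step}")
    return {"status": "complete", "result": partial}


async def demo() -> dict:
    cancel = asyncio.Event()
    steps = ["clause 4.1", "clause 4.2", "clause 4.3"]
    task = asyncio.create_task(run_workflow(steps, cancel))
    await asyncio.sleep(0)  # let the workflow begin its first step
    cancel.set()            # e.g. a cancel message or HTTP DELETE arrived
    return await task


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cooperative checking like this means a cancel never interrupts a step halfway, at the cost of waiting for the current step to finish; that wait is what the cancellation-latency SLO bounds.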
Backpressure
If the client consumes events more slowly than the workflow produces them, the event buffer must be bounded and managed. The default strategy is to buffer up to N events; when the buffer is full, pause workflow event emission (LLM inference continues, but its events are held back) and resume once the buffer drains. For fast workflows with slow clients, where pausing is impractical, drop non-critical progress events instead, but always deliver result events.
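The drop variant of this strategy might be sketched as follows. The `CRITICAL` set and the eviction rule (remove the oldest non-critical event to make room for a critical one) are assumptions; the text only requires that result events are never dropped.

```python
from collections import deque

CRITICAL = {"result", "error", "cancelled"}  # assumed never-drop event types


class EventBuffer:
    """Bounded buffer that sheds non-critical events when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()
        self.dropped = 0

    def put(self, event: dict) -> bool:
        """Return True if the event was buffered, False if it was dropped."""
        if len(self.items) < self.capacity:
            self.items.append(event)
            return True
        if event["event"] not in CRITICAL:
            self.dropped += 1  # shed the incoming non-critical event
            return False
        # Critical event: evict the oldest non-critical event to make room.
        idx = next((i for i, queued in enumerate(self.items)
                    if queued["event"] not in CRITICAL), None)
        if idx is not None:
            del self.items[idx]
            self.dropped += 1
        self.items.append(event)  # may exceed capacity if all are critical
        return True

    def get(self) -> dict:
        return self.items.popleft()
```

The `dropped` counter is the natural source for an event buffer overflow metric.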
5. Architecture Diagram
flowchart TD
    subgraph Workflow["Agent Workflow Execution"]
        A[Agent Reasoning]
        B[Tool Call Execution]
        C[Intermediate Result]
        D[Final Result]
    end
    subgraph Events["Workflow Event Emitter"]
        E[Token Stream Events]
        F[Tool Call Events]
        G[Intermediate Result Events]
        H[Completion Event]
        I[Error / Cancel Events]
    end
    subgraph Transport["Streaming Transport Layer"]
        J{Transport Protocol}
        K[SSE Stream]
        L[WebSocket]
    end
    subgraph Client["Client"]
        M[Stream Consumer]
        N[Cancel Signal]
    end
    subgraph Control["Cancellation Handler"]
        O{Cancel Signal Received?}
        P[Graceful Stop]
    end
    A --> E
    B --> F
    C --> G
    D --> H
    E & F & G & H & I --> J
    J --> K & L
    K --> M
    L --> M
    M --> N
    N --> L
    L --> O
    O -->|yes| P
    O -->|no| A
    P --> I
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Workflow Event Emitter | Logic Component | Generates structured events at each workflow step | Custom event emitter; LangChain callbacks; LangGraph streaming | Critical |
| Token Streamer | AI Integration | Relays LLM token stream from inference API | OpenAI streaming SDK; Anthropic streaming SDK; Bedrock streaming | Critical |
| Event Buffer | State | Buffers events between emitter and transport; handles backpressure | Redis stream; in-memory async queue; RxJS observable | High |
| SSE Transport | Transport | Server-Sent Events endpoint for unidirectional streaming | FastAPI SSE; Express SSE; AWS API Gateway SSE | High |
| WebSocket Transport | Transport | Bidirectional streaming for cancellation-enabled workflows | FastAPI WebSocket; Socket.io; AWS API Gateway WebSocket | High |
| Cancellation Handler | Control | Monitors for cancellation signals; triggers graceful stop | Cancellation token pattern; asyncio.CancelledError | Critical |
| Progressive Disclosure Filter | Security + Logic | Filters which events are streamed based on policy | Configurable event type filter; role-based event visibility | High |
| Partial Result Assembler | Logic | Assembles best partial result on cancellation or error | Custom accumulator per workflow type | High |
7. Data Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Client | Opens SSE connection; submits task | SSE connection established; task ID returned |
| 2 | Workflow Event Emitter | Task starts; emits progress event | event: progress\ndata: {"status": "started", "task_id": "T-8821"} |
| 3 | Token Streamer | LLM generates first reasoning tokens | event: token\ndata: {"text": "Analysing contract clause 4.1..."} (repeated per token) |
| 4 | Workflow Event Emitter | Tool call initiated | event: tool_call\ndata: {"tool": "regulatory_search", "query": "CPS 234 §3.4", "status": "started"} |
| 5 | Workflow Event Emitter | Tool result received | event: tool_result\ndata: {"tool": "regulatory_search", "result_count": 3, "status": "complete"} |
| 6 | Token Streamer | LLM generates intermediate finding | event: token\ndata: {"text": "Clause 4.1 creates a mandatory notification obligation..."} |
| 7 | Client | Reads intermediate finding; decides task is on track | No cancellation signal sent |
| 8 | Workflow Event Emitter | Final result assembled | event: result\ndata: {"status": "complete", "result": {...}, "iterations": 4, "tool_calls": 2} |
| 9 | Transport | SSE stream closed | Server ends the stream after the result event (HTTP 200 was sent when the stream opened) |
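On the consuming side, the frames in the Output column can be parsed back into typed events. The parser below is a simplified sketch (it handles `event:`, `data:`, comment lines, and blank-line terminators, and assumes JSON payloads); the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Turn SSE text lines into (event_type, payload) pairs."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":  # a blank line ends one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):  # comment or heartbeat: ignore
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())


if __name__ == "__main__":
    raw = [
        "event: progress",
        'data: {"status": "started", "task_id": "T-8821"}',
        "",
    ]
    for event_type, payload in parse_sse(raw):
        print(event_type, payload)
```

An event that is not followed by a blank line is never yielded, so a stream cut off mid-frame does not surface a truncated payload to the client.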
Error Flow
| Error | Detection | Recovery |
|---|---|---|
| Client disconnects mid-stream | Transport write error | Detect disconnect; gracefully stop workflow; persist partial result |
| Cancellation signal received | Cancellation token check | Graceful stop; emit cancel event; return partial result |
| LLM stream interruption | Stream error from LLM API | Retry token streaming; emit retry event; if persistent, emit error event |
| Event buffer overflow (slow client) | Buffer capacity check | Drop non-critical progress events; preserve result events; emit backpressure warning |
8. Security Considerations
Information Disclosure via Intermediate Events
- Intermediate events may expose internal reasoning, tool parameters, or intermediate data that should not be visible to the end user
- Mitigation: Progressive disclosure filter enforces role-based event visibility; never stream raw tool parameters containing secrets or PII
OWASP LLM Top 10
| OWASP LLM Risk | Streaming Applicability | Mitigation |
|---|---|---|
| LLM06 Sensitive Information Disclosure | Intermediate reasoning steps may contain sensitive data | Progressive disclosure filter; separate internal-only events from user-visible events |
| LLM09 Overreliance | Users may act on incomplete intermediate results before task completes | Clearly label streaming events as incomplete; only label final result as authoritative |
| LLM04 Model Denial of Service | Many concurrent streaming connections exhaust server resources | Connection limit per user; task concurrency limit; connection timeout |
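The per-user connection limit mitigation can be illustrated with a small in-process counter. This is a sketch only: the class name and API are hypothetical, and a multi-instance deployment would need a shared store rather than process memory.

```python
from collections import Counter


class ConnectionLimiter:
    """Caps the number of concurrent streaming connections per user."""

    def __init__(self, max_per_user: int) -> None:
        self.max_per_user = max_per_user
        self.open: Counter = Counter()

    def acquire(self, user_id: str) -> bool:
        """Reserve a slot; return False if the user is already at the cap."""
        if self.open[user_id] >= self.max_per_user:
            return False
        self.open[user_id] += 1
        return True

    def release(self, user_id: str) -> None:
        """Free a slot when the stream closes (safe to call twice)."""
        if self.open[user_id] > 0:
            self.open[user_id] -= 1
```

A transport handler would call `acquire` before opening the stream, reject the request when it returns False, and call `release` in a `finally` block so that dropped connections free their slot.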
9. Governance Considerations
Streaming Event Audit
- For regulated workflows, streaming events constitute a real-time activity log and must be captured in the audit record even if not persisted in the client view
- The full event sequence must be reconstructable from the audit log
Governance Artefacts
| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| Progressive Disclosure Policy | AI Governance Board | Per task type | Documents which event types are visible to which user roles |
| Streaming Event Archive | Compliance | Per task (regulated) | Full event stream for audit reconstruction |
| Cancellation Usage Report | AI Operations | Weekly | Tracks cancellation rate and reason; high cancellation rate may indicate misrouted requests |
10. Operational Considerations
SLOs
| SLO | Target | Window | Alert |
|---|---|---|---|
| Time-to-first-event (TTFE) | ≤ 500ms from request | 1-hour rolling | > 2s triggers P2 |
| Streaming connection drop rate | ≤ 0.5% | 1-hour rolling | > 2% triggers P2 |
| Cancellation handling latency | ≤ 2s from signal to stop | 1-hour rolling | > 5s triggers P3 |
| Event buffer overflow rate | ≤ 0.1% | 1-hour rolling | > 1% triggers P3; review backpressure strategy |
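The thresholds above can be expressed as data and checked mechanically. In the sketch below the numbers are copied from the SLO table, while the metric keys, the `evaluate` function, and the intermediate "warn" state (above target but below the alert threshold) are assumptions added for illustration.

```python
# Targets and alert thresholds copied from the SLO table.
# Durations are in seconds; rates are fractions (0.005 == 0.5%).
SLOS = {
    "ttfe_seconds":         {"target": 0.5,   "alert": 2.0,  "priority": "P2"},
    "connection_drop_rate": {"target": 0.005, "alert": 0.02, "priority": "P2"},
    "cancel_latency_s":     {"target": 2.0,   "alert": 5.0,  "priority": "P3"},
    "buffer_overflow_rate": {"target": 0.001, "alert": 0.01, "priority": "P3"},
}


def evaluate(metric: str, value: float) -> str:
    """Classify a rolling-window measurement against its SLO."""
    slo = SLOS[metric]
    if value > slo["alert"]:
        return slo["priority"]  # page at the listed priority
    if value > slo["target"]:
        return "warn"           # assumed state: SLO missed, alert not yet hit
    return "ok"


if __name__ == "__main__":
    print(evaluate("ttfe_seconds", 0.3))
    print(evaluate("ttfe_seconds", 2.5))
```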
11. Cost Considerations
| Streaming Config | Infrastructure Cost | UX Benefit | Notes |
|---|---|---|---|
| Token-only streaming | Minimal overhead | High | Good for text generation tasks |
| Full event streaming (SSE) | Low (standard HTTP) | Very High | Recommended default |
| WebSocket streaming | Low–Medium | Very High + cancellation | Required for in-band cancellation; SSE can still cancel via HTTP DELETE |
| Long-hold SSE (> 5 min) | Medium (connection resources) | High | Use task polling fallback for very long tasks |
12. Trade-Off Analysis
| Option | UX | Complexity | Cancellation | Infrastructure | Best For |
|---|---|---|---|---|---|
| A: SSE with workflow events (Recommended) | Very High | Medium | Out-of-band only (HTTP DELETE on the task resource) | Low | Most production use cases |
| B: WebSocket with bidirectional events | Very High | High | Yes (native) | Medium | Interactive workflows requiring cancellation |
| C: Polling with partial result endpoint | Medium | Low | Partial (stopping polling does not by itself stop the task) | Low | Environments where SSE/WebSocket unavailable |
| D: No streaming (wait for completion) | Low | Very Low | No | Very Low | Batch processing; not for interactive use |
13. Failure Modes
| Failure Mode | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| SSE proxy timeout (load balancer closes long connections) | High | High: all long workflows fail | Connection duration monitoring | Configure proxy idle timeout ≥ max expected task duration; send periodic SSE comment lines as keepalives |
| Client shows incomplete result as complete | Medium | High: user acts on partial data | Event type clearly distinguishes partial vs. final | Never show streaming tokens as the final result; only show completion event as authoritative |
| Large event backlog on slow client | Medium | Medium: memory pressure | Buffer size monitoring | Implement backpressure; drop progress events; never drop result events |
| Race condition: cancel arrives after completion | Low | Low | Check cancel after completion event sent | Idempotent cancel handling; cancel after completion is a no-op |
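The idempotent cancel handling in the last row can be sketched as a small state registry in which terminal states are never overwritten. The class and method names are hypothetical, and a real system would back this with durable storage.

```python
class TaskRegistry:
    """Tracks task state; terminal states are never overwritten."""

    TERMINAL = {"complete", "cancelled"}

    def __init__(self) -> None:
        self.status: dict[str, str] = {}

    def start(self, task_id: str) -> None:
        self.status[task_id] = "running"

    def complete(self, task_id: str) -> str:
        return self._finish(task_id, "complete")

    def cancel(self, task_id: str) -> str:
        """Idempotent: cancelling a finished task is a no-op."""
        return self._finish(task_id, "cancelled")

    def _finish(self, task_id: str, new_state: str) -> str:
        if self.status[task_id] not in self.TERMINAL:
            self.status[task_id] = new_state
        return self.status[task_id]
```

Returning the resulting state lets the cancel endpoint tell the caller whether the task was actually cancelled or had already completed.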
14. Regulatory Considerations
EU AI Act
- Art. 13 (Transparency): For high-risk AI systems, the streaming event sequence provides real-time transparency of the system's reasoning and tool use. The event archive must be retained as part of the system's transparency documentation.
ISO 42001
- §8.4: Progressive disclosure policy determines what information is made available to users at each stage of a decision-making workflow; this policy must be documented and aligned with information governance requirements.
Australian Context
- For financial services AI providing advice or decision support via streaming, intermediate findings streamed before completion must not be mistaken for completed advice; clear labelling of event finality is required for compliance with ASIC RG 255 guidance on digital advice.
15. Reference Implementations
AWS
| Component | Service |
|---|---|
| SSE Transport | Amazon API Gateway HTTP API + Lambda streaming response |
| WebSocket Transport | Amazon API Gateway WebSocket API |
| Event Buffer | Amazon SQS FIFO or Amazon Kinesis Data Streams |
| LLM Streaming | Amazon Bedrock InvokeModelWithResponseStream |
Azure
| Component | Service |
|---|---|
| SSE Transport | Azure API Management + Azure Functions with streaming response |
| WebSocket Transport | Azure Web PubSub |
| LLM Streaming | Azure OpenAI streaming (stream: true) |
On-Premises
| Component | Technology |
|---|---|
| SSE Transport | FastAPI StreamingResponse; Flask SSE |
| WebSocket Transport | FastAPI WebSocket; Starlette WebSocket |
| LLM Streaming | OpenAI SDK stream=True; Anthropic SDK streaming |
| Event Emitter | LangChain callback handlers (e.g. StreamingStdOutCallbackHandler); LangGraph streaming |
16. Related Patterns
| Pattern | ID | Relationship Type | Notes |
|---|---|---|---|
| ReAct Agent Loop | EAAPL-WRK001 | Integrates With | Each ReAct iteration emits workflow events for streaming |
| Sequential Chain | EAAPL-WRK002 | Integrates With | Each step completion emits a progress event for streaming |
| Workflow Tracing and Replay | EAAPL-WRK013 | Integrates With | Streaming event archive is a primary trace source |
| Human Escalation | EAAPL-HITL001 | Integrates With | Mid-stream escalation events prompt human review while workflow continues |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Evidence |
|---|---|---|
| Research Foundation | 3 | Streaming UX research solid; agentic workflow event streaming newer |
| Production Deployment | 4 | ChatGPT, Claude, GitHub Copilot all use token streaming at scale |
| Framework Support | 4 | LangChain streaming callbacks; LangGraph streaming; all major LLM SDKs support streaming |
| Workflow Event Standards | 3 | Token streaming standardised; structured workflow event schemas evolving |
| Cancellation Implementation | 3 | HTTP cancellation patterns established; clean agentic cancellation maturing |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category |