[EAAPL-WRK010] Streaming Progressive Output

Category: Agentic Workflows
Sub-category: Real-Time Delivery Architecture
Version: 1.0
Maturity: Proven
Tags: streaming, progressive-output, server-sent-events, partial-results, cancellation, real-time
Regulatory Relevance: ISO 42001 §8.4, EU AI Act (Art. 13)

1. Executive Summary

The Streaming Progressive Output Pattern defines how agentic workflows deliver partial, incremental results to end-users or downstream systems as processing continues, rather than waiting for complete execution before returning any output. This pattern addresses the fundamental tension between agent workflow latency (which may be 30–120 seconds for complex tasks) and user experience expectations (which are calibrated to sub-second interactive response). By streaming intermediate events — reasoning steps, partial completions, tool call notifications, intermediate results — users see visible progress, can make informed decisions about whether to wait, and can cancel tasks that are heading in an unwanted direction.

For CIO/CTO audiences: streaming is the difference between a user watching a progress bar that actually moves (which builds trust and enables early feedback) and a user staring at a spinner for 90 seconds with no indication of what is happening (which builds frustration and leads to abandoned sessions). For long-running agentic workflows, streaming is not an optional UX enhancement; it is a core capability that determines whether complex AI workflows are usable in production. It also enables a feedback loop that is otherwise impossible: the user can see reasoning steps mid-execution, identify when the agent is heading in the wrong direction, and cancel before incurring the full task cost.

2. Problem Statement

Business Problem

Complex agentic workflows take 30–180 seconds to complete. Without streaming, users receive no feedback during this time: they cannot see whether the agent is making progress, cannot identify if it has misunderstood the task, and cannot cancel if it is heading in an unwanted direction. Abandonment rates for non-streaming long-running tasks are significantly higher than for streaming equivalents.

Technical Problem

Standard request-response architecture requires the server to hold a connection open for the full task duration before returning any data. This creates: (a) connection timeout risks for long tasks, (b) no partial result availability for downstream pipelines that could begin processing early results, (c) no cancellation mechanism — the caller cannot abort an in-flight task.

Symptoms of Absence

Users abandon long-running agent tasks due to no visible progress feedback
No ability to cancel a task mid-execution when the agent misunderstands the request
Downstream pipelines cannot begin processing results until the entire task is complete
Long agent workflows frequently hit HTTP timeout limits in standard request-response proxies

Cost of Inaction

UX: High abandonment rates for long-running tasks directly reduce AI workflow adoption
Cost: Users who cannot cancel misunderstood tasks incur full task cost even when output is useless
Architecture: Timeout issues require workarounds (polling, webhooks) that add complexity without the UX benefit of true streaming

3. Context

When to Apply

Agent workflow latency consistently exceeds 5 seconds
End-users interact with the workflow in real time (not batch processing)
Intermediate results have value before completion (partial document drafts, preliminary findings)
Cancellation based on early output is a required capability
Downstream systems can begin processing partial results (pipeline streaming)

When NOT to Apply

Batch processing where latency is not a user-facing concern
Tasks where partial outputs are misleading and must not be shown before completion
Downstream systems cannot handle partial/incremental data
Security constraints prohibit streaming (e.g., outputs must be validated before transmission)

Prerequisites

Streaming-capable LLM API (server-sent events or streaming SDK)
Streaming-capable transport layer (SSE, WebSocket, or gRPC streaming)
Client-side streaming consumer implementation
Cancellation token or signal mechanism
Backpressure handling for slow consumers

Industry Applicability

Industry	Streaming Use Case	Key Streaming Event
Legal	Real-time contract analysis	Per-clause findings streamed as identified
Financial Services	Live market analysis	Data retrieval events + interim findings
Healthcare	Clinical note generation	Sentence-by-sentence generation streaming
Technology	Code generation and review	File-by-file review results; code tokens
Government	Policy document generation	Section-by-section draft streaming

4. Architecture Overview

Streaming progressive output is implemented at three layers: the LLM inference layer (token streaming), the workflow event layer (intermediate result events), and the transport layer (SSE/WebSocket).

Token Streaming

Modern LLM APIs support token-level streaming: each generated token is delivered to the caller as it is produced, rather than waiting for the full completion. This is the foundation of real-time text generation feedback. Token streaming is straightforward to implement with modern SDK support, but it provides only the raw token stream; it does not provide structured intermediate results or workflow-level events.

Workflow Event Streaming

For agentic workflows with tool calls, reasoning steps, and intermediate results, token-level streaming is insufficient. The user needs structured events: "Starting tool call: search_regulatory_db", "Tool result received: 3 relevant clauses found", "Intermediate finding: Clause 4.1 appears to create an obligation under CPS 234". The Workflow Event Emitter generates these structured events at each significant workflow step and writes them to an event stream.
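The emitter described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `WorkflowEvent` and `WorkflowEventEmitter` names, the field layout, and the use of an in-memory `asyncio.Queue` as the event stream are all assumptions made for the example.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class WorkflowEvent:
    """One structured event (hypothetical schema for illustration)."""
    event: str   # e.g. "tool_call", "tool_result", "token", "result"
    data: dict
    task_id: str
    ts: float = field(default_factory=time.time)


class WorkflowEventEmitter:
    """Writes structured events to an event stream (an asyncio.Queue here)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.stream: asyncio.Queue = asyncio.Queue()

    async def emit(self, event: str, **data) -> WorkflowEvent:
        evt = WorkflowEvent(event=event, data=data, task_id=self.task_id)
        await self.stream.put(evt)
        return evt


async def demo() -> list:
    emitter = WorkflowEventEmitter(task_id="T-8821")
    await emitter.emit("tool_call", tool="regulatory_search", status="started")
    await emitter.emit("tool_result", tool="regulatory_search", result_count=3)
    # Drain the stream and report the event types in emission order.
    names = []
    while not emitter.stream.empty():
        names.append(emitter.stream.get_nowait().event)
    return names
```

In a real deployment the queue would likely be replaced by one of the Event Buffer options listed in the Components table.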

Progressive Disclosure Strategy

Not all intermediate content is appropriate to stream. The progressive disclosure policy defines: (a) what event types are streamed (reasoning steps, tool calls, partial results, progress indicators), (b) what event types are withheld until completion (sensitive intermediate data, unverified claims), (c) how events are formatted for the client (structured JSON events vs. human-readable text). The policy is configurable per task type and user role.
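One way to express such a policy is a role-to-event-type allow list applied before events reach the transport. The sketch below is illustrative only: the role names, the `DISCLOSURE_POLICY` table, and the redacted field names are hypothetical and would come from the documented policy in practice.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical policy: event types each role may see while the task runs.
DISCLOSURE_POLICY: Dict[str, Set[str]] = {
    "end_user": {"progress", "token", "result"},
    "analyst": {"progress", "token", "tool_call", "tool_result", "result"},
}

# Payload fields stripped for every role (assumed names for illustration).
REDACTED_FIELDS = {"api_key", "raw_parameters"}


def filter_events(events: Iterable[dict], role: str) -> Iterator[dict]:
    """Yield only the events the role may see, with sensitive fields removed."""
    allowed = DISCLOSURE_POLICY.get(role, set())  # unknown roles see nothing
    for evt in events:
        if evt["event"] not in allowed:
            continue
        data = {k: v for k, v in evt["data"].items() if k not in REDACTED_FIELDS}
        yield {"event": evt["event"], "data": data}
```

Defaulting unknown roles to an empty allow list keeps the filter fail-closed, which suits a control that exists to prevent disclosure.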

Transport Layer

Streaming events are delivered to clients via: (a) Server-Sent Events (SSE) for unidirectional server-to-client streaming over HTTP (simplest, widely supported), (b) WebSocket for bidirectional streaming (enables mid-stream cancellation and user feedback), or (c) gRPC streaming for high-throughput, microservice-internal streaming.
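For the SSE option, each event is serialised as a small text frame: optional `id:` line, an `event:` line, a `data:` line, and a terminating blank line. The helper names below (`sse_frame`, `sse_keepalive`) are hypothetical; the frame layout follows the Server-Sent Events wire format.

```python
import json
from typing import Optional


def sse_frame(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialise one event as an SSE frame; the blank line terminates it."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent never emits a newline, so one data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def sse_keepalive() -> str:
    """A comment line (leading colon) that clients ignore; keeps idle links open."""
    return ": keepalive\n\n"
```

A web framework's streaming response can then write these strings to the client as they are produced.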

Cancellation

The client can send a cancellation signal at any time, either in-band over a bidirectional transport (WebSocket) or out-of-band with an HTTP DELETE on the task resource when the stream itself is unidirectional (SSE). The Cancellation Handler in the agent workflow checks for cancellation signals between iterations and stops execution gracefully, returning the best partial result accumulated to that point with a `status: cancelled` metadata flag.

Backpressure

If the client consumes events more slowly than the workflow produces them, the event buffer must be managed. The default strategy is to buffer up to N events; when the buffer is full, pause workflow event production (LLM inference may continue, but its events are held) and resume once the buffer drains. For fast workflows with slow clients, drop non-critical progress events but always deliver result events.
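The drop-non-critical variant of this strategy can be sketched as a bounded buffer. The `EventBuffer` class and the `CRITICAL` set of never-dropped event types are assumptions for illustration; the pause-and-resume variant is not shown.

```python
from collections import deque
from typing import Optional

CRITICAL = {"result", "error", "cancelled"}  # assumed never-drop event types


class EventBuffer:
    """Bounded buffer that sheds progress events but keeps critical ones."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()
        self.dropped = 0

    def push(self, evt: dict) -> bool:
        """Return True if the event was buffered, False if it was dropped."""
        if len(self.items) < self.capacity:
            self.items.append(evt)
            return True
        if evt["event"] not in CRITICAL:
            self.dropped += 1  # buffer full: drop the non-critical newcomer
            return False
        # Critical event: evict the oldest non-critical event to make room.
        idx = next(
            (i for i, old in enumerate(self.items) if old["event"] not in CRITICAL),
            None,
        )
        if idx is not None:
            del self.items[idx]
            self.dropped += 1
        # If everything buffered is critical, capacity is exceeded on purpose.
        self.items.append(evt)
        return True

    def pop(self) -> Optional[dict]:
        return self.items.popleft() if self.items else None
```

The `dropped` counter gives a direct input for the event buffer overflow metric.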

5. Architecture Diagram

flowchart TD
    subgraph Workflow["Agent Workflow Execution"]
        A[Agent Reasoning]
        B[Tool Call Execution]
        C[Intermediate Result]
        D[Final Result]
    end
    subgraph Events["Workflow Event Emitter"]
        E[Token Stream Events]
        F[Tool Call Events]
        G[Intermediate Result Events]
        H[Completion Event]
        I[Error / Cancel Events]
    end
    subgraph Transport["Streaming Transport Layer"]
        J{Transport Protocol}
        K[SSE Stream]
        L[WebSocket]
    end
    subgraph Client["Client"]
        M[Stream Consumer]
        N[Cancel Signal]
    end
    subgraph Control["Cancellation Handler"]
        O{Cancel Signal Received?}
        P[Graceful Stop]
    end
    A --> E
    B --> F
    C --> G
    D --> H
    E & F & G & H & I --> J
    J --> K & L
    K --> M
    L --> M
    M --> N
    N --> L
    L --> O
    O -->|yes| P
    O -->|no| A

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Workflow Event Emitter	Logic Component	Generates structured events at each workflow step	Custom event emitter; LangChain callbacks; LangGraph streaming	Critical
Token Streamer	AI Integration	Relays LLM token stream from inference API	OpenAI streaming SDK; Anthropic streaming SDK; Bedrock streaming	Critical
Event Buffer	State	Buffers events between emitter and transport; handles backpressure	Redis stream; in-memory async queue; RxJS observable	High
SSE Transport	Transport	Server-Sent Events endpoint for unidirectional streaming	FastAPI SSE; Express SSE; AWS API Gateway SSE	High
WebSocket Transport	Transport	Bidirectional streaming for cancellation-enabled workflows	FastAPI WebSocket; Socket.io; AWS API Gateway WebSocket	High
Cancellation Handler	Control	Monitors for cancellation signals; triggers graceful stop	Cancellation token pattern; asyncio.CancelledError	Critical
Progressive Disclosure Filter	Security + Logic	Filters which events are streamed based on policy	Configurable event type filter; role-based event visibility	High
Partial Result Assembler	Logic	Assembles best partial result on cancellation or error	Custom accumulator per workflow type	High

7. Data Flow

Step	Actor	Action	Output
1	Client	Opens SSE connection; submits task	SSE connection established; task ID returned
2	Workflow Event Emitter	Task starts; emits progress event	`event: progress\ndata: {"status": "started", "task_id": "T-8821"}`
3	Token Streamer	LLM generates first reasoning tokens	`event: token\ndata: {"text": "Analysing contract clause 4.1..."}` (repeated per token)
4	Workflow Event Emitter	Tool call initiated	`event: tool_call\ndata: {"tool": "regulatory_search", "query": "CPS 234 §3.4", "status": "started"}`
5	Workflow Event Emitter	Tool result received	`event: tool_result\ndata: {"tool": "regulatory_search", "result_count": 3, "status": "complete"}`
6	Token Streamer	LLM generates intermediate finding	`event: token\ndata: {"text": "Clause 4.1 creates a mandatory notification obligation..."}`
7	Client	Reads intermediate finding; decides task is on track	No cancellation signal sent
8	Workflow Event Emitter	Final result assembled	`event: result\ndata: {"status": "complete", "result": {...}, "iterations": 4, "tool_calls": 2}`
9	Transport	SSE stream closed	Server ends the stream and closes the connection (the 200 status was sent when the stream opened)
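On the client side, the frames shown in the table can be consumed with a small line-based parser. This is a simplified sketch: `parse_sse` and `final_result` are hypothetical helpers, the parser assumes every `data:` payload is JSON, and it ignores the `id:` and `retry:` fields.

```python
import json
from typing import Iterable, Iterator, List, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn raw SSE lines into {"event", "data"} dicts (simplified parser)."""
    event = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":  # a blank line ends the current event
            if data_lines:
                yield {"event": event, "data": json.loads("\n".join(data_lines))}
            event, data_lines = "message", []
        elif line.startswith(":"):  # comment line, e.g. a keepalive
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


def final_result(events: Iterable[dict]) -> Optional[dict]:
    """Only the 'result' event is treated as the authoritative output."""
    for evt in events:
        if evt["event"] == "result":
            return evt["data"]
    return None
```

Keeping the authoritative result separate from token and progress events gives the client a clear rule for what may be presented as final.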

Error Flow

Error	Detection	Recovery
Client disconnects mid-stream	Transport write error	Detect disconnect; gracefully stop workflow; persist partial result
Cancellation signal received	Cancellation token check	Graceful stop; emit cancel event; return partial result
LLM stream interruption	Stream error from LLM API	Retry token streaming; emit retry event; if persistent, emit error event
Event buffer overflow (slow client)	Buffer capacity check	Drop non-critical progress events; preserve result events; emit backpressure warning
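The LLM stream interruption row can be sketched as a retrying wrapper around the token stream. The `StreamInterrupted` exception is a hypothetical stand-in for whatever error a given SDK raises, and `open_stream` is an assumed callable that starts a fresh stream. Note that this simple version restarts generation from the beginning, so a client would need to discard earlier tokens when it sees a `retry` event.

```python
from typing import AsyncIterator, Callable


class StreamInterrupted(Exception):
    """Hypothetical error type standing in for an SDK stream failure."""


async def stream_with_retry(
    open_stream: Callable[[], AsyncIterator[str]],
    max_retries: int = 2,
) -> AsyncIterator[dict]:
    """Relay tokens; emit a retry event per failure, then an error event."""
    attempt = 0
    while True:
        try:
            async for token in open_stream():
                yield {"event": "token", "data": {"text": token}}
            return  # stream finished normally
        except StreamInterrupted:
            attempt += 1
            if attempt > max_retries:
                # Persistent failure: surface it to the client and stop.
                yield {"event": "error", "data": {"reason": "llm_stream_failed"}}
                return
            yield {"event": "retry", "data": {"attempt": attempt}}
```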

8. Security Considerations

Information Disclosure via Intermediate Events

Intermediate events may expose internal reasoning, tool parameters, or intermediate data that should not be visible to the end user
Mitigation: Progressive disclosure filter enforces role-based event visibility; never stream raw tool parameters containing secrets or PII

OWASP LLM Top 10

OWASP LLM Risk	Streaming Applicability	Mitigation
LLM06 Sensitive Information	Intermediate reasoning steps may contain sensitive data	Progressive disclosure filter; separate internal-only events from user-visible events
LLM09 Overreliance	Users may act on incomplete intermediate results before task completes	Clearly label streaming events as incomplete; only label final result as authoritative
LLM04 Model DoS	Many concurrent streaming connections exhaust server resources	Connection limit per user; task concurrency limit; connection timeout
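The per-user connection limit in the last row can be sketched as a simple counter checked when a stream is opened. The `ConnectionLimiter` class is hypothetical and single-process; a multi-instance deployment would need a shared store for the counts.

```python
from collections import Counter


class ConnectionLimiter:
    """Caps concurrent streaming connections per user (in-process sketch)."""

    def __init__(self, max_per_user: int) -> None:
        self.max_per_user = max_per_user
        self.active: Counter = Counter()

    def try_acquire(self, user_id: str) -> bool:
        """Reserve a slot; False means the caller should reject the request."""
        if self.active[user_id] >= self.max_per_user:
            return False
        self.active[user_id] += 1
        return True

    def release(self, user_id: str) -> None:
        """Free a slot when the stream closes; never goes below zero."""
        if self.active[user_id] > 0:
            self.active[user_id] -= 1
```

Calling `release` from the handler's cleanup path (for example a `finally` block) ensures slots are returned when clients disconnect mid-stream.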

9. Governance Considerations

Streaming Event Audit

For regulated workflows, streaming events constitute a real-time activity log and must be captured in the audit record even if not persisted in the client view
The full event sequence must be reconstructable from the audit log
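A minimal sketch of an archive that satisfies both points: every event is recorded with a sequence number and a flag noting whether it was actually streamed to the client, so the full sequence can be replayed per task. The `EventArchive` class and its field names are assumptions, and the in-memory list stands in for durable, tamper-evident storage.

```python
import json
from typing import List


class EventArchive:
    """Append-only event log; a real archive would use durable storage."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, task_id: str, event: str, data: dict, streamed: bool) -> int:
        """Record an event, including ones withheld from the client view."""
        seq = len(self._records) + 1
        record = {
            "seq": seq,
            "task_id": task_id,
            "event": event,
            "data": data,
            "streamed_to_client": streamed,
        }
        self._records.append(json.dumps(record))
        return seq

    def replay(self, task_id: str) -> List[dict]:
        """Reconstruct one task's event sequence in original order."""
        rows = [json.loads(r) for r in self._records]
        return sorted(
            (r for r in rows if r["task_id"] == task_id), key=lambda r: r["seq"]
        )
```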

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Progressive Disclosure Policy	AI Governance Board	Per task type	Documents which event types are visible to which user roles
Streaming Event Archive	Compliance	Per task (regulated)	Full event stream for audit reconstruction
Cancellation Usage Report	AI Operations	Weekly	Tracks cancellation rate and reason; high cancellation rate may indicate misrouted requests

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Time-to-first-event (TTFE)	≤ 500ms from request	1-hour rolling	> 2s triggers P2
Streaming connection drop rate	≤ 0.5%	1-hour rolling	> 2% triggers P2
Cancellation handling latency	≤ 2s from signal to stop	1-hour rolling	> 5s triggers P3
Event buffer overflow rate	≤ 0.1%	1-hour rolling	> 1% triggers P3; review backpressure strategy

11. Cost Considerations

Streaming Config	Infrastructure Cost	UX Benefit	Notes
Token-only streaming	Minimal overhead	High	Good for text generation tasks
Full event streaming (SSE)	Low (standard HTTP)	Very High	Recommended default
WebSocket streaming	Low–Medium	Very High + cancellation	Required for in-band cancellation; SSE deployments can cancel out-of-band via HTTP DELETE
Long-hold SSE (> 5 min)	Medium (connection resources)	High	Use task polling fallback for very long tasks

12. Trade-Off Analysis

Option	UX	Complexity	Cancellation	Infrastructure	Best For
A: SSE with workflow events (Recommended)	Very High	Medium	Out-of-band only (HTTP DELETE on the task resource)	Low	Most production use cases
B: WebSocket with bidirectional events	Very High	High	Yes (native)	Medium	Interactive workflows requiring cancellation
C: Polling with partial result endpoint	Medium	Low	Yes (stop polling)	Low	Environments where SSE/WebSocket unavailable
D: No streaming (wait for completion)	Low	Very Low	No	Very Low	Batch processing; not for interactive use

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
SSE proxy timeout (load balancer closes long connections)	High	High — all long workflows fail	Connection duration monitoring	Configure proxy idle timeout ≥ max expected task duration; send periodic keepalive data on the chunked stream (for SSE, a comment line beginning with a colon) so the connection never appears idle
Client shows incomplete result as complete	Medium	High — user acts on partial data	Event type clearly distinguishes partial vs. final	Never show streaming tokens as the final result; only show completion event as authoritative
Large event backlog on slow client	Medium	Medium — memory pressure	Buffer size monitoring	Implement backpressure; drop progress events; never drop result events
Race condition: cancel arrives after completion	Low	Low	Check cancel after completion event sent	Idempotent cancel handling; cancel after completion is a no-op
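The idempotent cancel handling in the last row can be sketched as a small state machine in which terminal states are never overwritten. The `Task` and `TaskState` names are hypothetical.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    COMPLETE = "complete"
    CANCELLED = "cancelled"


class Task:
    """Tracks task state; complete and cancelled are terminal."""

    def __init__(self) -> None:
        self.state = TaskState.RUNNING

    def complete(self) -> None:
        if self.state is TaskState.RUNNING:
            self.state = TaskState.COMPLETE

    def cancel(self) -> TaskState:
        """Safe to call any number of times, in any state."""
        if self.state is TaskState.RUNNING:
            self.state = TaskState.CANCELLED
        return self.state  # a cancel after completion is a no-op
```

Returning the resulting state lets a cancel endpoint tell the client whether the task was actually cancelled or had already finished.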

14. Regulatory Considerations

EU AI Act

Art. 13 (Transparency): For high-risk AI systems, the streaming event sequence provides real-time transparency of the system's reasoning and tool use. The event archive must be retained as part of the system's transparency documentation.

ISO 42001

§8.4: Progressive disclosure policy determines what information is made available to users at each stage of a decision-making workflow; this policy must be documented and aligned with information governance requirements.

Australian Context

For financial services AI providing advice or decision support via streaming, intermediate findings streamed before completion must not be mistakenly treated as completed advice; clear labelling of event finality is required for compliance with ASIC RG 255 guidance on digital advice.

15. Reference Implementations

AWS

Component	Service
SSE Transport	Amazon API Gateway HTTP API + Lambda streaming response
WebSocket Transport	Amazon API Gateway WebSocket API
Event Buffer	Amazon SQS FIFO or Amazon Kinesis Data Streams
LLM Streaming	Amazon Bedrock InvokeModelWithResponseStream

Azure

Component	Service
SSE Transport	Azure API Management + Azure Functions with streaming response
WebSocket Transport	Azure Web PubSub
LLM Streaming	Azure OpenAI streaming (stream: true)

On-Premises

Component	Technology
SSE Transport	FastAPI StreamingResponse; Flask SSE
WebSocket Transport	FastAPI WebSocket; Starlette WebSocket
LLM Streaming	OpenAI SDK stream=True; Anthropic SDK streaming
Event Emitter	LangChain StreamingCallbackHandler; LangGraph streaming

16. Related Patterns

Pattern	ID	Relationship Type	Notes
ReAct Agent Loop	EAAPL-WRK001	Integrates With	Each ReAct iteration emits workflow events for streaming
Sequential Chain	EAAPL-WRK002	Integrates With	Each step completion emits a progress event for streaming
Workflow Tracing and Replay	EAAPL-WRK013	Integrates With	Streaming event archive is a primary trace source
Human Escalation	EAAPL-HITL001	Integrates With	Mid-stream escalation events prompt human review while workflow continues

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Evidence
Research Foundation	3	Streaming UX research solid; agentic workflow event streaming newer
Production Deployment	4	ChatGPT, Claude, GitHub Copilot all use token streaming at scale
Framework Support	4	LangChain streaming callbacks; LangGraph streaming; all major LLM SDKs support streaming
Workflow Event Standards	3	Token streaming standardised; structured workflow event schemas evolving
Cancellation Implementation	3	HTTP cancellation patterns established; clean agentic cancellation maturing

18. Revision History

Version	Date	Author	Changes
1.0	2025-06-13	Architecture Board	Initial publication in Agentic Workflows category
