[EAAPL-WRK003] Parallel Fan-Out / Fan-In

Category: Agentic Workflows
Sub-category: Parallel Execution Architecture
Version: 1.0
Maturity: Proven
Tags: fan-out, fan-in, parallel-execution, map-reduce, aggregation, fork-join
Regulatory Relevance: ISO 42001 §8.4, NIST AI RMF (MANAGE 2.2)

1. Executive Summary

The Parallel Fan-Out / Fan-In Pattern defines an architecture in which a single task is decomposed into N independent sub-tasks (fan-out). Each sub-task is executed concurrently by a separate agent or LLM worker, and a fan-in aggregator then collects and combines the results (the fork-join model). This is the AI equivalent of map-reduce: distribute independent work in parallel, then collect and synthesise the results. Compared to sequential execution, fan-out/fan-in reduces end-to-end latency by up to the degree of parallelism (in practice bounded by the slowest worker and by provider rate limits). It also provides the raw material for ensemble quality improvement through aggregation.

For CIO/CTO audiences: if a task can be broken into independent chunks (analyse 10 contracts, generate 5 candidate responses, search 8 data sources), this pattern executes all chunks at the same time and synthesises the results, rather than processing them one after another. Token cost is about the same as sequential execution (higher if different models are used or an LLM synthesis step is added), but wall-clock time drops by up to the parallelism factor. For time-sensitive workflows such as due diligence, incident response, and regulatory scanning, this latency reduction is the primary value. The secondary value is resilience: a single worker failure need not fail the entire task.

2. Problem Statement

Business Problem

Enterprise tasks frequently involve analysing multiple independent sources simultaneously: reviewing all contracts in a portfolio, scanning multiple regulatory databases, generating multiple solution candidates for comparison. Sequential processing makes total latency proportional to the number of sources, which is unacceptable for time-sensitive business processes.

Technical Problem

Sequential LLM execution does not utilise available parallelism. When sub-tasks are mutually independent — each sub-task's execution does not depend on another's result — sequential execution wastes wall-clock time and increases total task latency linearly with the number of sub-tasks.

Symptoms of Absence

Portfolio reviews, multi-source scans, or candidate generation tasks take N× longer than necessary
Partial failures in multi-source tasks fail the entire operation rather than returning partial results
No mechanism to compare multiple independent outputs for quality improvement
Throughput limited by sequential LLM token generation rate

Cost of Inaction

Latency: a 10-source sequential scan at 5s/source takes 50s; in parallel it takes about 5s, bounded by the slowest source. For interactive or SLA-bound processes, this is the difference between usable and unusable.
Resilience: Sequential pipelines have single-thread failure modes; parallel workers isolate failures
Quality: Without parallel candidate generation, there is no basis for output quality improvement through selection or synthesis
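
The latency arithmetic above can be made explicit. The following is a minimal sketch that assumes every source takes the same time and that a provider rate limit caps concurrency at C workers. Both are simplifications: real latencies vary, and the slowest worker sets the pace. The function names are illustrative only.

```python
import math


def sequential_latency(n_sources: int, per_source_s: float) -> float:
    # Sequential: each sub-task waits for the previous one, so latencies add up.
    return n_sources * per_source_s


def fan_out_latency(n_sources: int, per_source_s: float, max_concurrency: int) -> float:
    # Fan-out: at most max_concurrency sub-tasks run at once, so the work finishes
    # in ceil(N / C) "waves", each lasting one per-source latency.
    waves = math.ceil(n_sources / max_concurrency)
    return waves * per_source_s


print(sequential_latency(10, 5.0))    # 50.0
print(fan_out_latency(10, 5.0, 10))   # 5.0 (all 10 sources in one wave)
print(fan_out_latency(10, 5.0, 4))    # 15.0 (rate limit caps concurrency at 4)
```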

3. Context

When to Apply

Task decomposes into N mutually independent sub-tasks (no data dependency between sub-tasks)
Total latency is a primary constraint and parallel execution infrastructure is available
Sub-tasks are homogeneous (same prompt template, different inputs) or well-defined heterogeneous workers
Aggregation strategy is clearly defined and deterministic

When NOT to Apply

Sub-tasks are data-dependent (output of task A is input to task B) — use Sequential Chain (EAAPL-WRK002)
Sub-tasks require coordinated state (use Multi-Agent Orchestration, EAAPL-MAG001)
Cost is a hard constraint: fan-out reduces latency, not token cost, so it offers no saving over sequential execution (and LLM synthesis adds cost)
Aggregation result is non-deterministic and variance is unacceptable

Prerequisites

Task decomposition function that produces independent sub-tasks
Defined aggregation strategy (union, intersection, voting, synthesis, best-of-N selection)
Concurrency infrastructure (async executor, worker pool, parallel workflow engine)
Partial result handling policy (fail-all vs. return-available-on-partial-failure)

Industry Applicability

Industry	Fan-Out Use Case	Aggregation Strategy
Financial Services	Parallel credit bureau checks (Equifax, Experian, illion)	Synthesis: merge scores + discrepancies
Legal	Simultaneous review of 20 contracts in a portfolio	Union: collect all findings; de-duplicate
Cybersecurity	Parallel threat intelligence source query	Union with deduplication; priority weighting
Healthcare	Parallel guideline database search across multiple bodies	Synthesis: reconcile potentially conflicting guidance
Government	Parallel policy impact assessment across agencies	Voting + synthesis for consensus recommendation

4. Architecture Overview

The Fan-Out/Fan-In architecture has three stages: decomposition, parallel execution, and aggregation.

Decomposition Phase

The task decomposer receives the original task and produces a set of independent sub-task specifications. Each sub-task contains the sub-task input, the prompt template to use, the worker configuration, and a correlation ID linking it back to the parent task. The decomposer is deterministic: for the same input, it produces the same sub-task set. This enables replay and deterministic debugging.
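
The following is a minimal sketch of a deterministic decomposer for the contract-review example used in the Data Flow section. It assumes one sub-task per document. `SubTask`, `decompose`, and the correlation-ID format are hypothetical names chosen for illustration; they are not part of any framework.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    parent_task_id: str
    correlation_id: str   # links this sub-task and its result back to the parent task
    prompt_template: str  # e.g. "liability_review_v2"
    model: str            # worker configuration, reduced to a model name here
    payload: str          # the sub-task input, e.g. one contract's text


def decompose(parent_task_id: str, documents: dict[str, str],
              prompt_template: str, model: str) -> list[SubTask]:
    """One sub-task per document; the same input always yields the same sub-task list."""
    sub_tasks = []
    for doc_id in sorted(documents):  # sorted iteration: order ignores dict insertion order
        content = documents[doc_id]
        # Correlation ID derived from the input (no random UUIDs), so a replay reproduces it.
        digest = hashlib.sha256(f"{parent_task_id}|{doc_id}|{content}".encode()).hexdigest()[:12]
        sub_tasks.append(SubTask(parent_task_id, f"{parent_task_id}:{doc_id}:{digest}",
                                 prompt_template, model, content))
    return sub_tasks
```

Because the correlation ID is derived from the input rather than generated randomly, replaying the same task produces identical sub-task IDs.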

Fan-Out Phase

The fan-out dispatcher submits all sub-tasks to the worker pool concurrently. Workers are stateless and, in the common case, homogeneous: each executes the same pipeline (render prompt → invoke LLM → validate output → return result). The dispatcher tracks outstanding sub-tasks by correlation ID. The maximum degree of parallelism is configurable per task type, balancing API rate limits, cost controls, and latency objectives.
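
One way to sketch the dispatcher in Python assumes an asyncio-based worker pool. `asyncio.Semaphore` enforces the maximum degree of parallelism, and `asyncio.gather(..., return_exceptions=True)` keeps one failed worker from cancelling the rest. `fan_out` and its parameters are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")  # sub-task specification type
R = TypeVar("R")  # worker result type


async def fan_out(sub_tasks: list[T], worker: Callable[[T], Awaitable[R]],
                  max_concurrency: int) -> list[R | BaseException]:
    """Run every sub-task concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(sub_task: T) -> R:
        async with semaphore:  # waits here while max_concurrency workers are running
            return await worker(sub_task)

    # return_exceptions=True: a failing worker does not cancel the others; its
    # exception is returned in its slot. Results keep the order of sub_tasks.
    return await asyncio.gather(*(run_one(st) for st in sub_tasks), return_exceptions=True)
```

Results come back in sub-task order, so each slot can be matched to its correlation ID. Failed slots hold the exception for the Partial Failure Handler to inspect.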

Worker Execution

Each worker is a complete mini-pipeline: it renders its prompt from the sub-task specification, invokes the LLM, validates the output against the sub-task schema, and returns either the validated result or a structured error. Workers are independent, so a failure in one worker does not affect the others. Workers emit per-execution metrics (latency, token usage, validation result) for observability.
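
The worker mini-pipeline can be sketched as follows. This is an illustration under stated assumptions:

- The LLM client is injected as a `call_llm` coroutine, which stands in for whatever provider SDK is in use.
- The prompt template uses a `{payload}` placeholder.
- The schema check is reduced to a set of required JSON keys.

`WorkerResult`, `REQUIRED_KEYS`, `validate`, and `run_worker` are hypothetical.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

REQUIRED_KEYS = {"contract_id", "risk_flags", "severity_max"}  # hypothetical sub-task schema


@dataclass
class WorkerResult:
    correlation_id: str
    ok: bool
    output: dict | None  # validated output when ok is True
    error: str | None    # structured error description when ok is False
    latency_s: float     # per-execution metric for observability


def validate(raw: str) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


async def run_worker(correlation_id: str, template: str, payload: str,
                     call_llm: Callable[[str], Awaitable[str]]) -> WorkerResult:
    start = time.perf_counter()
    try:
        prompt = template.format(payload=payload)  # 1. render prompt
        raw = await call_llm(prompt)                # 2. invoke LLM (client injected)
        output = validate(raw)                      # 3. validate against schema
        return WorkerResult(correlation_id, True, output, None, time.perf_counter() - start)
    except Exception as exc:  # 4. return a structured error instead of raising
        return WorkerResult(correlation_id, False, None, f"{type(exc).__name__}: {exc}",
                            time.perf_counter() - start)
```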

Fan-In Phase

The fan-in aggregator receives all worker results, including partial results if some workers failed. It applies one of the configured aggregation strategies:

- union (combine all results)
- intersection (keep only results reported by at least K of the N workers)
- voting (majority rule for categorical decisions)
- synthesis (an LLM call that merges all results into a unified output)
- best-of-N selection

The aggregation strategy is the primary design decision and must be chosen based on the task's quality requirements.

Aggregation Strategies

Union: Concatenate all results; suitable for comprehensive information gathering (all risk flags from all contracts)
Intersection: Keep only results confirmed by at least K of the N workers; suitable for high-confidence claims
Voting: For categorical decisions, take the majority label; suitable for classification tasks
Synthesis: LLM call that synthesises all worker outputs into a unified narrative; highest quality, adds latency and cost
Best-of-N: Score all outputs and return the highest-scoring; suitable for candidate generation (EAAPL-WRK008)
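
The deterministic strategies are small enough to sketch directly. This minimal version assumes each worker returns a list of hashable findings (for union and intersection) or a single label (for voting). Synthesis is omitted because it is an LLM call. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def union(results: Sequence[Sequence[Hashable]]) -> list[Hashable]:
    """Every item from every worker, de-duplicated, in first-seen order."""
    return list(dict.fromkeys(item for result in results for item in result))


def intersection(results: Sequence[Sequence[Hashable]], k: int) -> list[Hashable]:
    """Items reported by at least k workers (each worker counted once per item)."""
    counts = Counter(item for result in results for item in set(result))
    return [item for item in union(results) if counts[item] >= k]


def vote(labels: Sequence[Hashable]) -> tuple[Hashable, float]:
    """Most common label plus the share of workers that chose it (ties: first seen wins)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Highest-scoring candidate; the scoring function is supplied by the use case."""
    return max(candidates, key=score)
```

Note that `vote` returns the plurality label. Where a strict majority is required, check that the returned agreement share exceeds 0.5 before accepting the label.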

5. Architecture Diagram

flowchart TD
    subgraph Input["Task Input"]
        A[Original Task]
    end
    subgraph Decompose["Decomposition"]
        B[Task Decomposer]
    end
    subgraph Workers["Parallel Workers Fan-Out"]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker 3]
        WN[Worker N]
    end
    subgraph FanIn["Fan-In Aggregation"]
        C[Result Collector]
        D{Aggregation Strategy}
        E[Union / Voting]
        F[LLM Synthesis]
    end
    subgraph Output["Output"]
        G[Aggregated Result]
        H[Partial Result]
    end
    A --> B
    B --> W1 & W2 & W3 & WN
    W1 & W2 & W3 & WN --> C
    C --> D
    D -->|deterministic| E
    D -->|synthesis needed| F
    E --> G
    F --> G
    C -->|partial failure| H

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Task Decomposer	Logic Component	Splits original task into N independent sub-tasks	Deterministic rules; LLM-based decomposition; hybrid	Critical
Fan-Out Dispatcher	Orchestration	Submits N sub-tasks to worker pool concurrently	asyncio.gather (Python); AWS Step Functions Map state; Azure Durable Functions fan-out/fan-in	Critical
Worker	AI Component	Executes single sub-task: prompt → LLM → validate → return	Stateless function; Lambda; container; same or different models	Critical
Rate Limiter	Resilience	Enforces concurrency limit to respect API rate limits	Token bucket; semaphore; API gateway throttle	High
Result Collector	State	Tracks outstanding sub-tasks; collects results on completion	asyncio futures; Step Functions; Durable entity	Critical
Aggregation Engine	Logic Component	Applies configured aggregation strategy to collected results	Custom Python; LangChain; dedicated LLM synthesis call	Critical
Partial Failure Handler	Resilience	Decides whether to fail-all or return partial results on worker failure	Configurable threshold, e.g. ≥ 80% of workers must succeed	High
Fan-Out Metrics Emitter	Observability	Per-worker and per-aggregation latency, token usage, success rate	Prometheus; CloudWatch; Datadog	Medium

7. Data Flow

Step	Actor	Action	Output
1	Caller	Submits task: "Review all 8 vendor contracts for liability exposure"	Task with 8 contract documents
2	Task Decomposer	Creates 8 sub-tasks, one per contract	`[{sub_task_id: "ST-1", contract: doc1, prompt: "liability_review_v2"}, ...]`
3	Fan-Out Dispatcher	Submits all 8 sub-tasks concurrently	8 concurrent worker invocations
4	Workers 1–8	Each executes liability review on its assigned contract	`[{contract_id, risk_flags: [...], severity_max: "high"}, ...]`
5	Result Collector	Receives results as workers complete (non-blocking)	8/8 results received; 0 failures
6	Aggregation Engine	Applies union strategy: merges all risk flags	`{total_risk_flags: 23, high_severity: 5, contracts_reviewed: 8}`
7	Caller	Receives aggregated result with per-contract breakdown	Final report

Error Flow

Error	Detection	Recovery
Worker timeout	Per-worker timeout in dispatcher	Mark worker as failed; continue collecting other results
Worker validation failure	Schema validation error in worker	Retry worker once; if fails again, mark as failed result
Partial failure (one or more workers failed)	Partial Failure Handler	If the success rate is ≥ the minimum success threshold: return partial result with failure list; else: fail entire task
Rate limit exceeded (too many concurrent API calls)	HTTP 429 from LLM provider	Rate limiter queues excess workers; no data loss
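
The error-flow table can be sketched as a wrapper around each worker plus a threshold check at fan-in. The sketch rests on three illustrative assumptions:

- Each worker is a zero-argument callable that returns a fresh coroutine, so it can be retried.
- Validation failures raise a hypothetical `ValidationError`.
- `min_success_ratio` plays the role of the partial-failure threshold (for example 0.8).

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

Worker = Callable[[], Awaitable[Any]]  # zero-argument: each call starts a fresh attempt


class ValidationError(Exception):
    """Raised by a worker whose output fails schema validation (hypothetical)."""


async def run_with_policy(worker: Worker, timeout_s: float) -> Any:
    """Per-worker timeout; a validation failure is retried exactly once."""
    try:
        return await asyncio.wait_for(worker(), timeout=timeout_s)
    except ValidationError:
        # Second attempt: a repeated validation failure or a timeout propagates as a failure.
        return await asyncio.wait_for(worker(), timeout=timeout_s)


async def fan_in_with_threshold(workers: list[Worker], timeout_s: float,
                                min_success_ratio: float) -> dict:
    if not workers:
        raise ValueError("no sub-tasks to run")
    outcomes = await asyncio.gather(*(run_with_policy(w, timeout_s) for w in workers),
                                    return_exceptions=True)  # failures become values
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    failed = [i for i, o in enumerate(outcomes) if isinstance(o, BaseException)]
    if len(results) / len(outcomes) < min_success_ratio:
        raise RuntimeError(f"{len(failed)} of {len(outcomes)} workers failed; below threshold")
    # At or above threshold: return what succeeded plus the failure list.
    return {"results": results, "failed": failed, "partial": bool(failed)}
```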

8. Security Considerations

Parallel Execution Amplifies Injection Risk

Fan-out submits N simultaneous LLM calls with potentially attacker-controlled inputs
A single poisoned input document affects only one worker; aggregation stage must not blindly trust any single worker's output

OWASP LLM Top 10

OWASP LLM Risk	Fan-Out/Fan-In Applicability	Mitigation
LLM01 Prompt Injection	Each worker processes potentially untrusted content	Per-worker input sanitisation; content delimiters
LLM04 Model DoS	N parallel calls can exhaust API rate limits	Rate limiter with configurable max concurrency; cost ceiling
LLM08 Excessive Agency	N parallel workers × write-capable tools = N× side-effect amplification	Read-only tools in workers by default; write actions require explicit fan-out permission
LLM09 Overreliance	Aggregated result presented with false consensus confidence	Aggregation metadata includes worker agreement rate; low agreement flags for human review
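
For the LLM09 mitigation, one way to quantify worker agreement is the mean pairwise Jaccard overlap of the workers' findings. This applies when several workers answer the same question, for example parallel threat-intelligence sources or candidate generation. The metric, the function names, and the default 0.5 review threshold are illustrative choices, not prescribed values. The metric is not meaningful when each worker covers a different input (such as one contract each), because disjoint outputs are then expected.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    combined = a | b
    return len(a & b) / len(combined) if combined else 1.0  # two empty outputs count as agreement


def mean_pairwise_agreement(worker_outputs: list[set]) -> float:
    """1.0 when all workers report identical findings, 0.0 when every pair is disjoint."""
    pairs = list(combinations(worker_outputs, 2))
    if not pairs:
        return 1.0  # a single worker trivially agrees with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def with_agreement_metadata(aggregated, worker_outputs: list[set],
                            review_below: float = 0.5) -> dict:
    """Attach the agreement rate to the aggregated result; low agreement flags human review."""
    rate = mean_pairwise_agreement(worker_outputs)
    return {"result": aggregated, "agreement_rate": round(rate, 3),
            "workers": len(worker_outputs), "needs_human_review": rate < review_below}
```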

9. Governance Considerations

Aggregation Strategy Governance

The aggregation strategy (especially voting thresholds and synthesis prompts) has material impact on output quality and must be owned by domain SMEs
Aggregation strategies for regulated decisions (credit, underwriting) require model risk review

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Task Decomposition Specification	AI Platform	On change	Documents how tasks are decomposed; decomposition logic is version-controlled
Aggregation Strategy Register	Domain SME + AI Platform	Per use case	Documents chosen aggregation strategy, threshold values, and justification
Worker Result Archive	Compliance	Per execution (regulated)	Individual worker outputs preserved for audit alongside aggregated result
Partial Failure Threshold Policy	AI Governance Board	Quarterly	Documents acceptable failure thresholds per task class

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Fan-out completion rate (all workers succeed)	≥ 97%	24-hour rolling	< 93% triggers P2; check worker reliability
p95 fan-out wall-clock latency	≤ max(single worker p95) × 1.5	1-hour rolling	Sustained breach of target triggers P2; investigate stragglers
Worker success rate per task type	≥ 98%	24-hour rolling	< 95% triggers P3
Aggregation synthesis latency	≤ 10s (for LLM synthesis)	1-hour rolling	> 20s triggers P3

Monitoring

Straggler worker detection: workers taking >3× median latency slow down the entire fan-in
Worker result variance: high variance in outputs may indicate ambiguous sub-task specification
Aggregation confidence distribution: track worker agreement rates across task types
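
Straggler detection using the 3× median rule can be sketched over completed per-worker latencies keyed by correlation ID. `find_stragglers` is a hypothetical helper. A live dispatcher would apply the same comparison to the elapsed time of in-flight workers.

```python
import statistics


def find_stragglers(latencies_s: dict[str, float], factor: float = 3.0) -> list[str]:
    """Correlation IDs of workers slower than `factor` times the median worker latency."""
    median = statistics.median(latencies_s.values())
    return sorted(cid for cid, t in latencies_s.items() if t > factor * median)


print(find_stragglers({"ST-1": 1.0, "ST-2": 1.2, "ST-3": 0.9, "ST-4": 4.0}))  # ['ST-4']
```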

11. Cost Considerations

Configuration	Workers	Approx. Cost per Fan-Out (GPT-4o)	Latency Benefit
Small fan-out	3–5	$0.05–0.20	3–5× faster than sequential
Medium fan-out	6–10	$0.20–0.60	6–10× faster than sequential
Large fan-out	11–20	$0.60–2.00	Up to 15× faster (API rate limits constrain max concurrency)
With LLM synthesis	Any + 1	+$0.05–0.20	Additional synthesis call overhead

Optimisations

Use smaller, faster models for individual workers; reserve larger model for synthesis aggregation only
Cache worker results (by content hash) to avoid reprocessing identical sub-task inputs
Tune max concurrency to stay within LLM provider rate limits without queuing overhead
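
Content-hash caching can be sketched as an in-memory map from hash to task. The sketch assumes the key covers the prompt template version, the model, and the sub-task input. `ResultCache` is a hypothetical class; a production deployment would more likely use a shared store than process memory.

```python
import asyncio
import hashlib
import json
from typing import Any, Callable, Coroutine


class ResultCache:
    """In-memory cache of worker results keyed by a content hash (hypothetical sketch)."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    @staticmethod
    def key(prompt_template: str, model: str, payload: str) -> str:
        # Template version and model are part of the key, so changing either
        # never serves a result produced under the old configuration.
        material = json.dumps([prompt_template, model, payload])
        return hashlib.sha256(material.encode()).hexdigest()

    async def get_or_run(self, key: str, run: Callable[[], Coroutine[Any, Any, Any]]) -> Any:
        task = self._tasks.get(key)
        if task is None:
            # Store the in-flight task, so identical sub-tasks in one fan-out share one LLM call.
            task = asyncio.create_task(run())
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            self._tasks.pop(key, None)  # never cache failures; the next request retries
            raise
```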

12. Trade-Off Analysis

Option	Latency	Cost	Quality	Complexity	Best For
A: Fan-out with deterministic aggregation (Recommended for structured tasks)	Low	Equal to sequential	High	Medium	Portfolio review, multi-source scan
B: Fan-out with LLM synthesis	Low + synthesis	Higher	Very High	Medium–High	Complex synthesis needed
C: Sequential processing	High (N×)	Equal	High	Low	Small N; dependency between steps
D: Mixture-of-Agents (EAAPL-WRK008)	Low	Higher (different models)	Very High	High	Quality improvement through diversity

Architectural Tensions

Tension	Left Pole	Right Pole	Balance
Parallelism vs. Rate limits	Maximum parallelism for minimum latency	Low concurrency to respect API limits	Configure max concurrency per provider; use token bucket
Fail-all vs. Return-partial	Return nothing unless all workers succeed	Return whatever is available	Configurable threshold (e.g., 80%); task-class specific
Deterministic vs. Synthesis aggregation	Pure union/voting (fast, deterministic)	LLM synthesis (higher quality, non-deterministic)	Use deterministic for regulated decisions; synthesis for executive reports
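
A semaphore caps how many calls are in flight. A token bucket caps how many calls start per second, which better matches the requests-per-minute style of many provider limits. The following is a minimal asyncio sketch. `TokenBucket` and its parameters are hypothetical, and the rate and burst values should come from the provider's published limits.

```python
import asyncio
import time


class TokenBucket:
    """Permits `rate` acquisitions per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()     # serialises waiters so a token is never spent twice

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def acquire(self) -> None:
        """Wait until a token is available, then consume it (call before each LLM request)."""
        async with self._lock:
            self._refill()
            while self.tokens < 1:
                await asyncio.sleep((1 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1
```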

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Straggler workers (one slow worker blocks fan-in)	Medium	Medium — overall latency spike	Per-worker timeout monitoring	Per-worker timeout; return partial without straggler
Correlated worker failures (all workers fail the same way)	Low	High — aggregation receives no valid results	All-workers-failed detection	Fall back to sequential retry of the failed sub-tasks
Aggregation bias (synthesis LLM over-weights first worker result)	Medium	Medium — result quality skewed	Worker agreement rate monitoring	Randomise worker result ordering before synthesis; use structured aggregation
Decomposition producing dependent sub-tasks	Low	High — workers produce incorrect results due to missing context	Integration testing of decomposition logic	Explicit data-independence check in decomposer; test with N=2 case
API cost explosion (N workers × unexpected long context)	Low–Medium	High — cost overrun	Per-task cost ceiling; fan-out cost estimate before dispatch	Pre-estimate total cost before dispatch; abort if > ceiling
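
The pre-dispatch cost check can be sketched as a worst-case estimate. The sketch makes three assumptions:

- Text averages roughly four characters per token. This is a crude heuristic; the provider's tokenizer is more accurate.
- Every worker produces its maximum output.
- Per-1,000-token prices are supplied from the provider's current price list.

All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_1k: float   # USD per 1,000 input tokens (from the provider's price list)
    output_per_1k: float  # USD per 1,000 output tokens


class CostCeilingExceeded(Exception):
    pass


def estimate_tokens(text: str) -> int:
    # Crude heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def estimate_fan_out_cost(prompts: list[str], max_output_tokens: int, pricing: Pricing) -> float:
    total = 0.0
    for prompt in prompts:
        total += estimate_tokens(prompt) / 1000 * pricing.input_per_1k
        total += max_output_tokens / 1000 * pricing.output_per_1k  # worst case: output hits the cap
    return total


def check_budget(prompts: list[str], max_output_tokens: int, pricing: Pricing,
                 ceiling_usd: float) -> float:
    """Return the estimate, or raise before any worker is dispatched if it exceeds the ceiling."""
    estimate = estimate_fan_out_cost(prompts, max_output_tokens, pricing)
    if estimate > ceiling_usd:
        raise CostCeilingExceeded(f"estimated ${estimate:.2f} exceeds ceiling ${ceiling_usd:.2f}")
    return estimate
```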

14. Regulatory Considerations

ISO 42001

§8.4: Parallel execution introduces non-determinism in timing; the pattern must ensure that the final aggregated output is deterministically reproducible from the worker inputs (deterministic aggregation strategies) or explicitly flagged as synthesis-based.
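
Completion timing is non-deterministic, but deterministic aggregation can still be made reproducible. One way is to order worker results by correlation ID before aggregating, and to record a hash of those ordered inputs with the aggregated output. The following sketch rests on those assumptions; both function names are hypothetical.

```python
import hashlib
import json


def canonical_order(results: list[dict]) -> list[dict]:
    """Sort worker results by correlation ID so aggregation ignores completion order."""
    return sorted(results, key=lambda r: r["correlation_id"])


def replay_fingerprint(results: list[dict]) -> str:
    """Stable SHA-256 over the ordered worker results, stored alongside the aggregated output."""
    payload = json.dumps(canonical_order(results), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()
```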

NIST AI RMF

MANAGE 2.2: Risk of correlated worker failures is a documented failure mode that must be managed; the partial failure handling policy is the control.

Australian Context

For APRA-regulated use cases, individual worker outputs must be retained alongside the aggregated result so that the aggregation can be audited and replayed.
For consumer-facing decisions (credit, insurance), the aggregation must not produce outcomes that cannot be explained to the affected individual; voting-based aggregation provides the most explainable audit trail.

15. Reference Implementations

AWS

Component	Service
Fan-Out Dispatcher	AWS Step Functions Map state (distributed mode for > 40 concurrent)
Workers	AWS Lambda functions (one per sub-task invocation)
Result Collection	Step Functions state machine synchronises Map outputs
LLM Synthesis	Amazon Bedrock InvokeModel (Claude 3.5 Sonnet)
Rate Limiting	Concurrency limit on Lambda + Step Functions MaxConcurrency

Azure

Component	Service
Fan-Out Dispatcher	Azure Durable Functions Fan-out/Fan-in pattern
Workers	Durable Activity Functions
Result Collection	Durable Orchestration Function Task.WhenAll
LLM Synthesis	Azure OpenAI Service

On-Premises

Component	Technology
Fan-Out Dispatcher	Python asyncio.gather with semaphore for concurrency control
Workers	Async coroutines; Ray for large-scale parallelism
Aggregation	Custom Python; LangChain parallel chain
LLM	vLLM with async OpenAI-compatible API

16. Related Patterns

Pattern	ID	Relationship Type	Notes
Mixture of Agents	EAAPL-WRK008	Specialisation	MoA uses fan-out with diverse models for quality improvement; this pattern uses fan-out for throughput/coverage
Multi-Agent Orchestration	EAAPL-MAG001	Peer	Orchestration manages agent coordination; fan-out is a specific execution topology within an orchestrated system
Plan-and-Execute	EAAPL-WRK005	Complementary	Plan-and-Execute uses fan-out to execute parallelisable planned sub-tasks
Sequential Chain	EAAPL-WRK002	Alternative	Sequential for dependent steps; fan-out for independent steps
Workflow State Machine	EAAPL-WRK012	Integrates With	State machine governs fan-out state transitions and failure handling

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Evidence
Research Foundation	4	Map-reduce heritage; ensemble learning literature; LLM parallelism well-documented
Production Deployment	4	Deployed in document processing, multi-source search, candidate generation
Framework Support	4	LangChain parallel chains; Step Functions Map; Durable Functions fan-out
Aggregation Tooling	3	Deterministic aggregation mature; LLM synthesis aggregation still evolving best practices
Observability	3	Per-worker observability available; straggler detection tooling maturing

18. Revision History

Version	Date	Author	Changes
1.0	2025-06-13	Architecture Board	Initial publication in Agentic Workflows category
