

[EAAPL-WRK003] Parallel Fan-Out / Fan-In

Category: Agentic Workflows
Sub-category: Parallel Execution Architecture
Version: 1.0
Maturity: Proven
Tags: fan-out, fan-in, parallel-execution, map-reduce, aggregation, fork-join
Regulatory Relevance: ISO 42001 §8.4, NIST AI RMF (MANAGE 2.2)


1. Executive Summary

The Parallel Fan-Out / Fan-In Pattern defines an architecture in which a single task is decomposed into N independent sub-tasks (fan-out), each executed concurrently by a separate agent or LLM worker, after which a fan-in aggregator collects and combines the results (the fork-join model). This is the AI equivalent of map-reduce: distribute independent work in parallel, then collect and synthesise the results. Compared with sequential execution, fan-out/fan-in reduces end-to-end latency roughly in proportion to the degree of parallelism (bounded in practice by the slowest worker and by provider rate limits), and the multiple worker outputs provide the raw material for ensemble-style quality improvement through aggregation.

For CIO/CTO audiences: if a task can be broken into independent chunks (analyse 10 contracts simultaneously, generate 5 candidate responses in parallel, search 8 data sources concurrently), this pattern executes all chunks at the same time and synthesises the results, rather than doing them one after another. Token cost is roughly the same as sequential execution (or higher, if an LLM synthesis step is added or different models are used), but wall-clock time drops by up to the parallelism factor. For time-sensitive workflows such as due diligence, incident response and regulatory scanning, this latency reduction is the primary value. The secondary value is resilience: a single worker failure does not have to fail the entire task.


2. Problem Statement

Business Problem

Enterprise tasks frequently involve analysing multiple independent sources simultaneously: reviewing all contracts in a portfolio, scanning multiple regulatory databases, generating multiple solution candidates for comparison. Sequential processing makes total latency proportional to the number of sources, which is unacceptable for time-sensitive business processes.

Technical Problem

Sequential LLM execution does not utilise available parallelism. When sub-tasks are mutually independent — each sub-task's execution does not depend on another's result — sequential execution wastes wall-clock time and increases total task latency linearly with the number of sub-tasks.

Symptoms of Absence

  • Portfolio reviews, multi-source scans, or candidate generation tasks take N× longer than necessary
  • Partial failures in multi-source tasks fail the entire operation rather than returning partial results
  • No mechanism to compare multiple independent outputs for quality improvement
  • Throughput limited by sequential LLM token generation rate

Cost of Inaction

  • Latency: a 10-source sequential scan at 5 s/source takes 50 s; the parallel equivalent takes about 5 s plus aggregation time. For interactive or SLA-bound processes, this is the difference between usable and unusable.
  • Resilience: a sequential pipeline fails as a whole when any single step fails; parallel workers isolate failures
  • Quality: Without parallel candidate generation, there is no basis for output quality improvement through selection or synthesis
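The latency arithmetic above can be generalised with a simple model. This is a sketch under loudly stated assumptions: every worker takes the same time, and when a concurrency cap is in force the sub-tasks run in "waves" of at most `max_concurrency`. The function names are illustrative, not part of any library.

```python
import math

def sequential_latency(n_subtasks: int, worker_latency_s: float) -> float:
    # Sequential: each sub-task waits for the previous one to finish.
    return n_subtasks * worker_latency_s

def fan_out_latency(n_subtasks: int, worker_latency_s: float,
                    max_concurrency: int, aggregation_s: float = 0.0) -> float:
    # Assumes uniform worker latency: sub-tasks run in waves of at most
    # max_concurrency, each wave lasting one worker latency, then one aggregation step.
    waves = math.ceil(n_subtasks / max_concurrency)
    return waves * worker_latency_s + aggregation_s

print(sequential_latency(10, 5.0))   # 50.0
print(fan_out_latency(10, 5.0, 10))  # 5.0
print(fan_out_latency(10, 5.0, 4))   # 15.0 (a concurrency cap of 4 needs 3 waves)
```

The third call shows why rate limits, not the number of sub-tasks, often set the achievable speed-up.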

3. Context

When to Apply

  • Task decomposes into N mutually independent sub-tasks (no data dependency between sub-tasks)
  • Total latency is a primary constraint and parallel execution infrastructure is available
  • Sub-tasks are homogeneous (same prompt template, different inputs) or well-defined heterogeneous workers
  • Aggregation strategy is clearly defined and deterministic

When NOT to Apply

  • Sub-tasks are data-dependent (output of task A is input to task B) — use Sequential Chain (EAAPL-WRK002)
  • Sub-tasks require coordinated state (use Multi-Agent Orchestration, EAAPL-MAG001)
  • Cost, rather than latency, is the binding constraint: parallelism does not reduce token cost relative to sequential execution
  • Aggregation result is non-deterministic and variance is unacceptable

Prerequisites

  • Task decomposition function that produces independent sub-tasks
  • Defined aggregation strategy (union, intersection, voting, synthesis, best-of-N selection)
  • Concurrency infrastructure (async executor, worker pool, parallel workflow engine)
  • Partial result handling policy (fail-all vs. return-available-on-partial-failure)

Industry Applicability

Industry | Fan-Out Use Case | Aggregation Strategy
Financial Services | Parallel credit bureau checks (Equifax, Experian, illion) | Synthesis: merge scores + discrepancies
Legal | Simultaneous review of 20 contracts in a portfolio | Union: collect all findings; de-duplicate
Cybersecurity | Parallel threat intelligence source query | Union with deduplication; priority weighting
Healthcare | Parallel guideline database search across multiple bodies | Synthesis: reconcile potentially conflicting guidance
Government | Parallel policy impact assessment across agencies | Voting + synthesis for consensus recommendation

4. Architecture Overview

The Fan-Out/Fan-In architecture has three stages: decomposition, parallel execution, and aggregation.

Decomposition Phase

The task decomposer receives the original task and produces a set of independent sub-task specifications. Each sub-task contains the sub-task input, the prompt template to use, the worker configuration, and a correlation ID linking it back to the parent task. The decomposer should be deterministic: for the same input, it produces the same sub-task set (where decomposition is LLM-based, the generated sub-task set should be persisted so it can be replayed). This enables replay and deterministic debugging.
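A minimal sketch of a deterministic rule-based decomposer, assuming each input document is a dict carrying a hypothetical `doc_id` key. `SubTask` and `decompose` are illustrative names. Sub-task IDs are derived from a content hash rather than a random UUID, so the same input always yields the same IDs, which is what makes replay possible.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SubTask:
    parent_task_id: str   # correlation ID back to the original task
    sub_task_id: str      # stable ID used for tracking, retries and replay
    prompt_template: str  # e.g. "liability_review_v2"
    payload: dict         # the sub-task input
    worker_config: dict = field(default_factory=dict)  # e.g. model name, timeout

def decompose(parent_task_id: str, documents: list, prompt_template: str,
              worker_config: Optional[dict] = None) -> list:
    """Split one task into one independent sub-task per document."""
    sub_tasks = []
    # Sorting by document ID makes the output independent of input ordering.
    for doc in sorted(documents, key=lambda d: d["doc_id"]):
        # Content-derived IDs (not random UUIDs) are identical on every replay.
        digest = hashlib.sha256(
            json.dumps({"parent": parent_task_id, "doc": doc}, sort_keys=True).encode("utf-8")
        ).hexdigest()[:12]
        sub_tasks.append(SubTask(parent_task_id, f"ST-{digest}", prompt_template,
                                 doc, dict(worker_config or {})))
    return sub_tasks
```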

Fan-Out Phase

The fan-out dispatcher submits all sub-tasks to the worker pool concurrently. Workers are stateless and typically homogeneous: each executes the same pattern (render prompt → invoke LLM → validate output → return result). The dispatcher tracks outstanding sub-tasks by correlation ID. The maximum degree of parallelism is configurable per task type, balancing API rate limits, cost controls, and latency objectives.

Worker Execution

Each worker is a complete mini-pipeline: it renders its prompt from the sub-task specification, invokes the LLM, validates the output against the sub-task schema, and returns the validated result (or a structured error). Workers are independent: a failure in one worker does not affect others. Workers emit per-execution metrics (latency, token usage, validation result) for observability.
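One way to sketch the worker mini-pipeline, including the "retry once on validation failure" rule from the error-flow table. The LLM call is injected as `call_llm` so any client can be plugged in; `RESULT_SCHEMA`, `PROMPT_TEMPLATE` and `run_worker` are hypothetical names, and the hand-rolled type check stands in for whatever schema validator a real deployment uses.

```python
import json
from typing import Callable

# Hypothetical schema for a liability-review sub-task: field name -> expected type.
RESULT_SCHEMA = {"contract_id": str, "risk_flags": list}
# Content delimiters separate untrusted document text from instructions.
PROMPT_TEMPLATE = "Review the following contract for liability exposure.\n<contract>\n{text}\n</contract>"

def validate(raw: str) -> dict:
    """Parse the LLM output and check it against RESULT_SCHEMA; raise ValueError if invalid."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    for name, expected in RESULT_SCHEMA.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not of type {expected.__name__}")
    return data

def run_worker(sub_task: dict, call_llm: Callable[[str], str], max_attempts: int = 2) -> dict:
    """Render prompt -> invoke LLM -> validate -> return a result or a structured error."""
    prompt = PROMPT_TEMPLATE.format(text=sub_task["text"])
    error = ""
    for attempt in range(1, max_attempts + 1):  # max_attempts=2 means "retry once"
        raw = call_llm(prompt)
        try:
            result = validate(raw)
        except ValueError as exc:
            error = str(exc)
            continue
        return {"sub_task_id": sub_task["sub_task_id"], "ok": True,
                "result": result, "attempts": attempt}
    return {"sub_task_id": sub_task["sub_task_id"], "ok": False,
            "error": error, "attempts": max_attempts}
```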

Fan-In Phase

The fan-in aggregator receives all worker results (including partial results if some workers failed). It applies the configured aggregation strategy: union (combine all results), intersection (keep only results present in ≥ K workers), voting (majority rule for categorical decisions), or synthesis (LLM-based synthesis of all results into a unified output). The aggregation strategy is the primary design decision and must be chosen based on the task's quality requirements.

Aggregation Strategies

  • Union: Concatenate all results; suitable for comprehensive information gathering (all risk flags from all contracts)
  • Intersection: Keep only results confirmed by at least K of the N workers; suitable for high-confidence claims
  • Voting: For categorical decisions, take the majority label; suitable for classification tasks
  • Synthesis: LLM call that synthesises all worker outputs into a unified narrative; typically the highest quality, but adds latency and cost and is non-deterministic
  • Best-of-N: Score all outputs and return the highest-scoring; suitable for candidate generation (EAAPL-WRK008)
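The non-trivial selection rules above (intersection, voting, best-of-N) can be sketched in a few lines of standard-library Python. Two design choices here are assumptions rather than part of the pattern: `majority_vote` requires a strict majority and returns `None` otherwise (so the case can be routed to review), and `best_of_n` takes the scoring function as a parameter.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

def intersection(worker_results: Sequence[Sequence[str]], k: int) -> list:
    """Keep items reported by at least k of the N workers (a worker counts once per item)."""
    counts = Counter(item for items in worker_results for item in set(items))
    return sorted(item for item, n in counts.items() if n >= k)

def majority_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label chosen by more than half of the workers, else None (no majority)."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate; the scorer might be a rubric, tests or a judge model."""
    return max(candidates, key=score)
```

A plurality rule (simply taking the most common label) is the obvious alternative to the strict majority used here; the choice belongs in the Aggregation Strategy Register.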

5. Architecture Diagram

flowchart TD
  subgraph Input["Task Input"]
    A[Original Task]
  end
  subgraph Decompose["Decomposition"]
    B[Task Decomposer]
  end
  subgraph Workers["Parallel Workers Fan-Out"]
    W1[Worker 1]
    W2[Worker 2]
    W3[Worker 3]
    WN[Worker N]
  end
  subgraph FanIn["Fan-In Aggregation"]
    C[Result Collector]
    D{Aggregation Strategy}
    E[Union / Voting]
    F[LLM Synthesis]
  end
  subgraph Output["Output"]
    G[Aggregated Result]
    H[Partial Result]
  end
  A --> B
  B --> W1 & W2 & W3 & WN
  W1 & W2 & W3 & WN --> C
  C --> D
  D -->|deterministic| E
  D -->|synthesis needed| F
  E --> G
  F --> G
  C -->|partial failure| H

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Task Decomposer | Logic Component | Splits the original task into N independent sub-tasks | Deterministic rules; LLM-based decomposition; hybrid | Critical
Fan-Out Dispatcher | Orchestration | Submits N sub-tasks to the worker pool concurrently | asyncio.gather (Python); AWS Step Functions Map state; Azure Durable Functions fan-out/fan-in | Critical
Worker | AI Component | Executes a single sub-task: prompt → LLM → validate → return | Stateless function; Lambda; container; same or different models | Critical
Rate Limiter | Resilience | Enforces a concurrency limit to respect API rate limits | Token bucket; semaphore; API gateway throttle | High
Result Collector | State | Tracks outstanding sub-tasks; collects results on completion | asyncio futures; Step Functions; Durable entity | Critical
Aggregation Engine | Logic Component | Applies the configured aggregation strategy to collected results | Custom Python; LangChain; dedicated LLM synthesis call | Critical
Partial Failure Handler | Resilience | Decides whether to fail the whole task or return partial results on worker failure | Configurable threshold, e.g., ≥ 80% of workers must succeed | High
Fan-Out Metrics Emitter | Observability | Emits per-worker and per-aggregation latency, token usage, and success rate | Prometheus; CloudWatch; Datadog | Medium

7. Data Flow

Step | Actor | Action | Output
1 | Caller | Submits task: "Review all 8 vendor contracts for liability exposure" | Task with 8 contract documents
2 | Task Decomposer | Creates 8 sub-tasks, one per contract | [{sub_task_id: "ST-1", contract: doc1, prompt: "liability_review_v2"}, ...]
3 | Fan-Out Dispatcher | Submits all 8 sub-tasks concurrently | 8 concurrent worker invocations
4 | Workers 1–8 | Each executes the liability review on its assigned contract | [{contract_id, risk_flags: [...], severity_max: "high"}, ...]
5 | Result Collector | Receives results as workers complete (non-blocking) | 8/8 results received; 0 failures
6 | Aggregation Engine | Applies union strategy: merges all risk flags | {total_risk_flags: 23, high_severity: 5, contracts_reviewed: 8}
7 | Caller | Receives aggregated result with per-contract breakdown | Final report
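Step 6 of the walkthrough could look like the following union aggregation. The per-flag shape (`clause`, `severity`) is a hypothetical assumption, since the walkthrough only shows `risk_flags: [...]`; the function name is also illustrative.

```python
def aggregate_contract_reviews(worker_results: list) -> dict:
    """Union strategy: keep every risk flag from every contract, tagged with its source."""
    all_flags = [
        {"contract_id": r["contract_id"], **flag}  # keep provenance for the per-contract breakdown
        for r in worker_results
        for flag in r["risk_flags"]
    ]
    return {
        "total_risk_flags": len(all_flags),
        "high_severity": sum(1 for f in all_flags if f["severity"] == "high"),
        "contracts_reviewed": len(worker_results),
        "flags": all_flags,
    }
```

Because union only concatenates and counts, the summary is fully determined by the worker outputs, which keeps it auditable.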

Error Flow

Error | Detection | Recovery
Worker timeout | Per-worker timeout in dispatcher | Mark worker as failed; continue collecting other results
Worker validation failure | Schema validation error in worker | Retry worker once; if it fails again, record a failed result
Partial failure (one or more workers failed) | Partial Failure Handler | If the success ratio meets the minimum threshold: return partial result with failure list; otherwise fail the entire task
Rate limit exceeded (too many concurrent API calls) | HTTP 429 from LLM provider | Retry the rejected call with backoff; rate limiter queues excess workers so no sub-task is dropped
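The Partial Failure Handler's threshold rule can be sketched as below. It assumes each outcome is a dict of the form `{"status": "ok" | "timeout" | "error", "result": ...}`; that shape, the 0.8 default and the names are illustrative assumptions matching the "e.g., 80%" example in this document.

```python
class FanOutFailed(Exception):
    """Raised when too few workers succeeded for the result to be usable."""

def apply_partial_failure_policy(outcomes: dict, min_success_ratio: float = 0.8) -> dict:
    """outcomes maps sub-task ID -> {"status": "ok" | "timeout" | "error", "result": ...}."""
    if not outcomes:
        raise FanOutFailed("no sub-tasks were executed")
    succeeded = {sid: o["result"] for sid, o in outcomes.items() if o["status"] == "ok"}
    failed = sorted(sid for sid, o in outcomes.items() if o["status"] != "ok")
    ratio = len(succeeded) / len(outcomes)
    if ratio < min_success_ratio:
        raise FanOutFailed(f"{len(succeeded)}/{len(outcomes)} workers succeeded; "
                           f"minimum is {min_success_ratio:.0%}")
    # Return the partial result together with the failure list so callers can see the gaps.
    return {"results": succeeded, "failed_sub_tasks": failed,
            "partial": bool(failed), "success_ratio": ratio}
```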

8. Security Considerations

Parallel Execution Amplifies Injection Risk

  • Fan-out submits N simultaneous LLM calls with potentially attacker-controlled inputs
  • A single poisoned input document directly affects only one worker, but that worker's output still reaches the aggregator; the aggregation stage must not blindly trust any single worker's output

OWASP LLM Top 10 (2023 edition, v1.1 numbering)

OWASP LLM Risk | Fan-Out/Fan-In Applicability | Mitigation
LLM01 Prompt Injection | Each worker processes potentially untrusted content | Per-worker input sanitisation; content delimiters
LLM04 Model DoS | N parallel calls can exhaust API rate limits | Rate limiter with configurable max concurrency; cost ceiling
LLM08 Excessive Agency | N parallel workers × write-capable tools = N× side-effect amplification | Read-only tools in workers by default; write actions require explicit fan-out permission
LLM09 Overreliance | Aggregated result presented with false consensus confidence | Aggregation metadata includes worker agreement rate; low agreement flags for human review
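The LLM09 mitigation (publish the worker agreement rate and flag low agreement for human review) might be sketched as follows. The 0.7 review threshold is a hypothetical placeholder; the actual value would be set per task class by the owning domain SME.

```python
from collections import Counter

def agreement_metadata(labels: list, review_threshold: float = 0.7) -> dict:
    """Attach the share of workers that agree with the most common label."""
    if not labels:
        return {"label": None, "agreement_rate": 0.0, "needs_human_review": True}
    label, count = Counter(labels).most_common(1)[0]
    rate = count / len(labels)
    # Low agreement means the "consensus" is weak and should not be presented as confident.
    return {"label": label, "agreement_rate": rate, "needs_human_review": rate < review_threshold}
```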

9. Governance Considerations

Aggregation Strategy Governance

  • The aggregation strategy (especially voting thresholds and synthesis prompts) has material impact on output quality and must be owned by domain SMEs
  • Aggregation strategies for regulated decisions (credit, underwriting) require model risk review

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Task Decomposition Specification | AI Platform | On change | Documents how tasks are decomposed; decomposition logic is version-controlled
Aggregation Strategy Register | Domain SME + AI Platform | Per use case | Documents chosen aggregation strategy, threshold values, and justification
Worker Result Archive | Compliance | Per execution (regulated) | Individual worker outputs preserved for audit alongside aggregated result
Partial Failure Threshold Policy | AI Governance Board | Quarterly | Documents acceptable failure thresholds per task class

10. Operational Considerations

SLOs

SLO | Target | Window | Alert
Fan-out completion rate (all workers succeed) | ≥ 97% | 24-hour rolling | < 93% triggers P2; check worker reliability
p95 fan-out wall-clock latency | ≤ 1.5 × max(single-worker p95) | 1-hour rolling | Sustained breach triggers P2; investigate stragglers
Worker success rate per task type | ≥ 98% | 24-hour rolling | < 95% triggers P3
Aggregation synthesis latency (LLM synthesis only) | ≤ 10 s | 1-hour rolling | > 20 s triggers P3
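The completion-rate and worker-success SLOs are linked. Assuming worker failures are independent (an assumption; correlated failures make it worse), the chance that all N workers succeed is p^N, and the chance that at least k succeed is a binomial tail. The sketch below suggests that a 98% per-worker success rate gives only about 82% all-succeed completions at N = 10, so the ≥ 97% completion target appears to rely on per-worker retries or small fan-outs, while an 80% partial-success threshold is met almost always.

```python
from math import comb

def p_all_succeed(p_worker: float, n: int) -> float:
    # Assumes independent worker failures.
    return p_worker ** n

def p_at_least(p_worker: float, n: int, k: int) -> float:
    # Probability that at least k of n independent workers succeed (binomial tail).
    return sum(comb(n, i) * p_worker**i * (1 - p_worker)**(n - i) for i in range(k, n + 1))

print(round(p_all_succeed(0.98, 10), 3))  # 0.817
print(round(p_at_least(0.98, 10, 8), 4))  # probability of meeting an 80% threshold at N = 10
```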

Monitoring

  • Straggler worker detection: flag workers taking more than 3× the median worker latency, since the fan-in waits for the slowest worker
  • Worker result variance: high variance in outputs may indicate ambiguous sub-task specification
  • Aggregation confidence distribution: track worker agreement rates across task types
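The straggler rule above can be expressed as a post-hoc check over completed worker latencies using `statistics.median`. This is a sketch; the function name and the dict-of-latencies input are assumptions, and a live dispatcher would apply the same comparison to the elapsed time of still-running workers.

```python
from statistics import median

def find_stragglers(latencies_s: dict, factor: float = 3.0) -> list:
    """Return worker IDs whose latency exceeds factor x the median worker latency."""
    if not latencies_s:
        return []
    threshold = factor * median(latencies_s.values())
    return sorted(worker for worker, t in latencies_s.items() if t > threshold)
```

The median is used rather than the mean so that one very slow worker does not raise its own threshold.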

11. Cost Considerations

Configuration | Workers | Approx. Cost per Fan-Out (GPT-4o) | Latency Benefit
Small fan-out | 3–5 | $0.05–0.20 | Up to 3–5× faster than sequential
Medium fan-out | 6–10 | $0.20–0.60 | Up to 6–10× faster than sequential
Large fan-out | 11–20 | $0.60–2.00 | Up to 15× faster (API rate limits constrain max concurrency)
With LLM synthesis | N workers + 1 synthesis call | +$0.05–0.20 | Adds the latency of one synthesis call

Optimisations

  • Use smaller, faster models for individual workers; reserve larger model for synthesis aggregation only
  • Cache worker results (by content hash) to avoid reprocessing identical sub-task inputs
  • Tune max concurrency to stay within LLM provider rate limits without queuing overhead
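The content-hash caching optimisation might look like the following in-memory sketch. Including the prompt template version and model name in the key is an assumption that ensures a prompt or model change invalidates old entries; `WorkerResultCache` is a hypothetical name, and caching is only appropriate where reusing a previous (possibly non-deterministic) LLM output for identical input is acceptable.

```python
import hashlib
import json
from typing import Callable

class WorkerResultCache:
    """In-memory cache of validated worker results keyed by a content hash."""

    def __init__(self) -> None:
        self._store = {}
        self.hits = 0

    @staticmethod
    def key(prompt_template: str, model: str, payload: dict) -> str:
        # Template version and model are part of the key, so changing either misses the cache.
        blob = json.dumps({"t": prompt_template, "m": model, "p": payload}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt_template: str, model: str, payload: dict,
                       compute: Callable[[dict], dict]) -> dict:
        k = self.key(prompt_template, model, payload)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = compute(payload)
        self._store[k] = result
        return result
```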

12. Trade-Off Analysis

Option | Latency | Cost | Quality | Complexity | Best For
A: Fan-out with deterministic aggregation (Recommended for structured tasks) | Low | Equal to sequential | High | Medium | Portfolio review, multi-source scan
B: Fan-out with LLM synthesis | Low + synthesis | Higher | Very High | Medium–High | Complex synthesis needed
C: Sequential processing | High (N×) | Equal | High | Low | Small N; dependency between steps
D: Mixture-of-Agents (EAAPL-WRK008) | Low | Higher (different models) | Very High | High | Quality improvement through diversity

Architectural Tensions

Tension | Left Pole | Right Pole | Balance
Parallelism vs. rate limits | Maximum parallelism for minimum latency | Low concurrency to respect API limits | Configure max concurrency per provider; use a token bucket
Fail-all vs. return-partial | Return nothing unless all workers succeed | Return whatever is available | Configurable threshold (e.g., 80%); task-class specific
Deterministic vs. synthesis aggregation | Pure union/voting (fast, deterministic) | LLM synthesis (higher quality, non-deterministic) | Use deterministic aggregation for regulated decisions; synthesis for executive reports

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
Straggler workers (one slow worker blocks fan-in) | Medium | Medium: overall latency spike | Per-worker timeout monitoring | Per-worker timeout; return partial result without the straggler
Correlated worker failures (all workers fail the same way) | Low | High: aggregation receives no valid results | All-workers-failed detection | Fall back to sequential processing or sequential retry
Aggregation bias (synthesis LLM over-weights the first worker result) | Medium | Medium: result quality skewed | Worker agreement rate monitoring | Randomise worker result ordering before synthesis; use structured aggregation
Decomposition producing dependent sub-tasks | Low | High: workers produce incorrect results due to missing context | Integration testing of decomposition logic | Explicit data-independence check in decomposer; test with the N=2 case
API cost explosion (N workers × unexpectedly long context) | Low–Medium | High: cost overrun | Per-task cost ceiling; fan-out cost estimate before dispatch | Pre-estimate total cost before dispatch; abort if above the ceiling
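The "pre-estimate total cost before dispatch; abort if above the ceiling" recovery can be sketched as an upper-bound estimate that assumes every worker consumes its full output-token budget. Token counts are passed in (they would come from the provider's tokenizer), and the `Pricing` values used in any example are placeholders, not quoted provider rates; all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    input_per_million: float   # USD per 1M input tokens (placeholder; check your provider)
    output_per_million: float  # USD per 1M output tokens (placeholder)

class CostCeilingExceeded(Exception):
    pass

def estimate_fan_out_cost(input_tokens_per_worker: list, max_output_tokens: int,
                          pricing: Pricing) -> float:
    """Upper-bound estimate: every worker is assumed to use its full output budget."""
    total_in = sum(input_tokens_per_worker)
    total_out = max_output_tokens * len(input_tokens_per_worker)
    return (total_in * pricing.input_per_million + total_out * pricing.output_per_million) / 1_000_000

def check_budget(estimate_usd: float, ceiling_usd: float) -> None:
    """Abort before dispatch if the estimate exceeds the per-task ceiling."""
    if estimate_usd > ceiling_usd:
        raise CostCeilingExceeded(f"estimated ${estimate_usd:.2f} exceeds ceiling ${ceiling_usd:.2f}")
```

For example, 8 workers with 10,000 input tokens each and a 1,000-token output cap, at placeholder rates of 2.5 and 10.0 per million, give an upper bound of 0.20 + 0.08 = 0.28.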

14. Regulatory Considerations

ISO 42001

  • §8.4: Parallel execution makes completion order non-deterministic; the pattern must ensure that the final aggregated output is reproducible from the recorded worker outputs regardless of arrival order (deterministic aggregation strategies), or is explicitly flagged as synthesis-based.
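One way to meet this requirement, and to apply the "randomise worker result ordering before synthesis" mitigation without losing reproducibility, is to canonicalise results by sub-task ID and then shuffle with a seed derived from the parent task ID. This is a sketch under that assumption; the function names are hypothetical, and replay is reproducible within the same Python runtime.

```python
import hashlib
import random

def canonical_order(worker_outputs: dict) -> list:
    # Completion order varies run to run; sorting by sub-task ID removes that variation.
    return sorted(worker_outputs.items())

def synthesis_order(worker_outputs: dict, parent_task_id: str) -> list:
    # Shuffle to reduce position bias in the synthesis prompt, but seed from the
    # task ID so an auditor can replay exactly the same ordering.
    seed = int(hashlib.sha256(parent_task_id.encode("utf-8")).hexdigest(), 16) % (2**32)
    items = canonical_order(worker_outputs)
    random.Random(seed).shuffle(items)
    return items
```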

NIST AI RMF

  • MANAGE 2.2: Risk of correlated worker failures is a documented failure mode that must be managed; the partial failure handling policy is the control.

Australian Context

  • For APRA-regulated use cases, individual worker outputs must be retained alongside the aggregated result so that the aggregation can be audited and replayed.
  • For consumer-facing decisions (credit, insurance), the aggregation must not produce outcomes that cannot be explained to the affected individual; voting-based aggregation provides the most explainable audit trail.

15. Reference Implementations

AWS

Component | Service
Fan-Out Dispatcher | AWS Step Functions Map state (Distributed mode for more than 40 concurrent iterations)
Workers | AWS Lambda (one invocation per sub-task)
Result Collection | Step Functions state machine synchronises Map outputs
LLM Synthesis | Amazon Bedrock InvokeModel (Claude 3.5 Sonnet)
Rate Limiting | Lambda reserved concurrency + Step Functions Map MaxConcurrency

Azure

Component | Service
Fan-Out Dispatcher | Azure Durable Functions fan-out/fan-in pattern
Workers | Durable activity functions
Result Collection | Durable orchestrator function (Task.WhenAll)
LLM Synthesis | Azure OpenAI Service

On-Premises

Component | Technology
Fan-Out Dispatcher | Python asyncio.gather with a semaphore for concurrency control
Workers | Async coroutines; Ray for large-scale parallelism
Aggregation | Custom Python; LangChain parallel chain
LLM | vLLM with async OpenAI-compatible API

16. Related Patterns

Pattern | ID | Relationship Type | Notes
Mixture of Agents | EAAPL-WRK008 | Specialisation | MoA uses fan-out with diverse models for quality improvement; this pattern uses fan-out for throughput/coverage
Multi-Agent Orchestration | EAAPL-MAG001 | Peer | Orchestration manages agent coordination; fan-out is a specific execution topology within an orchestrated system
Plan-and-Execute | EAAPL-WRK005 | Complementary | Plan-and-Execute uses fan-out to execute parallelisable planned sub-tasks
Sequential Chain | EAAPL-WRK002 | Alternative | Sequential for dependent steps; fan-out for independent steps
Workflow State Machine | EAAPL-WRK012 | Integrates With | State machine governs fan-out state transitions and failure handling

17. Maturity Assessment

Overall Maturity: Proven

Dimension | Score (1–5) | Evidence
Research Foundation | 4 | Map-reduce heritage; ensemble learning literature; LLM parallelism well-documented
Production Deployment | 4 | Deployed in document processing, multi-source search, candidate generation
Framework Support | 4 | LangChain parallel chains; Step Functions Map; Durable Functions fan-out
Aggregation Tooling | 3 | Deterministic aggregation mature; LLM synthesis aggregation best practices still evolving
Observability | 3 | Per-worker observability available; straggler detection tooling maturing

18. Revision History

Version | Date | Author | Changes
1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category