[EAAPL-WRK002] Sequential LLM Chain
Category: Agentic Workflows
Sub-category: Deterministic Pipeline Architecture
Version: 1.0
Maturity: Industry Standard
Tags: prompt-chaining, sequential-pipeline, state-passing, deterministic-workflow, schema-validation
Regulatory Relevance: ISO 42001 §8.4, APRA CPS 230 (auditability)
1. Executive Summary
The Sequential LLM Chain Pattern defines a deterministic multi-step pipeline in which each step's output is validated against a defined schema and passed as structured input to the next step. Unlike the iterative ReAct loop (EAAPL-WRK001), the sequential chain has a fixed, pre-determined execution path: the steps, their order, and the schema contracts between them are defined at design time, not discovered at runtime. This makes the pattern highly predictable, auditable, and well-suited to regulated workflows where every processing step must be documented, version-controlled, and reproducible.
For CIO/CTO audiences: think of this as an assembly line for knowledge work. Each station (LLM step) receives a standardised input, performs a defined transformation, validates its output against a schema, and passes the validated result to the next station. If any station produces invalid output, the pipeline halts and raises a structured error — it does not silently pass bad data downstream. This predictability is exactly what regulated industries require: the same inputs always traverse the same steps in the same order, producing an auditable processing record.
2. Problem Statement
Business Problem
Many enterprise knowledge work processes are naturally sequential: extract data → classify → enrich → validate → format for output. Each stage has clear acceptance criteria. Implementing these as a single large LLM prompt produces unreliable results because the model must juggle all stages simultaneously. Decomposing into sequential steps with explicit handoffs between stages improves reliability and testability.
Technical Problem
A single monolithic prompt performing multiple transformation stages conflates concerns, making it impossible to identify which stage produced an error, test stages independently, or replace one stage with an improved implementation without affecting others. Output quality degrades as prompt complexity increases.
Symptoms of Absence
- Long, complex prompts that are difficult to test, debug, or improve
- Errors in one processing stage silently propagate as corrupted inputs to subsequent stages
- No per-stage quality measurement; overall pipeline quality is opaque
- Cannot replace or upgrade individual stages without rewriting the entire prompt
Cost of Inaction
- Quality: Silent error propagation produces compounding quality degradation across stages
- Maintainability: Monolithic prompts accumulate technical debt and become unmaintainable
- Auditability: Cannot produce per-stage processing evidence for regulated workflows
3. Context
When to Apply
- Workflow has a fixed, pre-determined sequence of transformation steps
- Each step has clear input and output schema requirements
- Steps can be independently tested and replaced
- The complete execution path is known at design time
- Auditability of each processing stage is required
When NOT to Apply
- Execution path depends on intermediate results (use ReAct, EAAPL-WRK001)
- Steps can be parallelised (use Fan-Out/Fan-In, EAAPL-WRK003)
- Task requires dynamic routing between specialist steps (use Router/Dispatcher, EAAPL-WRK004)
- Steps have complex conditional branching logic (use Conditional Routing, EAAPL-WRK011)
Prerequisites
- Defined schema for each step's input and output (JSON Schema or Pydantic models)
- Step-level prompt templates versioned in a prompt registry
- Per-step validation logic
- Pipeline state store for intermediate results
Industry Applicability
| Industry |
Pipeline Example |
Stages |
| Financial Services |
Loan application processing |
Extract → Score → Verify → Recommend → Format |
| Legal |
Contract review |
Extract clauses → Classify obligations → Flag risks → Summarise → Report |
| Healthcare |
Clinical note processing |
Extract symptoms → Map ICD codes → Flag interactions → Generate summary |
| Government |
Grant application assessment |
Extract criteria → Check eligibility → Score merit → Produce recommendation |
| Insurance |
Claims processing |
Extract claim → Classify type → Assess validity → Calculate → Format decision |
4. Architecture Overview
The Sequential Chain establishes a pipeline orchestrator that manages step execution, state passing, and schema validation. Each step is a self-contained unit: a prompt template, an LLM invocation, and an output validator.
Pipeline State Object
A typed state object is passed through the pipeline. Each step reads its required inputs from the state and writes its outputs back to the state. The state object accumulates the outputs of all completed steps, creating a complete processing record. The state schema is defined upfront and version-controlled alongside the pipeline definition.
Step Execution
Each step: (1) reads its input fields from the pipeline state, (2) renders its prompt template with those inputs, (3) invokes the LLM with structured output mode, (4) validates the LLM output against the step's output schema, (5) writes validated outputs to pipeline state. If validation fails, the step raises a structured error with the field-level validation failure details.
Schema Validation at Every Boundary
The schema contract between steps is enforced by validation, not trust. The pipeline orchestrator validates step output before passing it to the next step. This ensures that errors are caught at the step that produced them, not discovered when a downstream step fails with a cryptic error.
Error Handling Strategy
Step failures are handled at two levels: (1) transient failures (LLM timeout, malformed JSON output) trigger a configurable number of retries with exponential backoff; (2) validation failures (schema non-compliance, business rule violation) are terminal unless a correction prompt is configured for that step.
Pipeline Audit Record
On completion (success or failure), the pipeline orchestrator writes a full audit record: pipeline version, step-by-step inputs/outputs, validation results, timestamps, and final status. This record is the primary auditability artefact.
5. Architecture Diagram
flowchart TD
subgraph Input["Pipeline Input"]
A[Raw Input]
end
subgraph Pipeline["Sequential Execution Pipeline"]
B[Step 1: Extract]
V1{Schema Validate}
C[Step 2: Classify]
V2{Schema Validate}
D[Step 3: Enrich]
V3{Schema Validate}
E[Step N: Format]
VN{Schema Validate}
end
subgraph State["Pipeline State"]
S[(State Object)]
end
subgraph Output["Pipeline Output"]
F[Final Output]
G[Error Report]
end
A --> B
B --> V1
V1 -->|valid| C
V1 -->|invalid| G
C --> V2
V2 -->|valid| D
V2 -->|invalid| G
D --> V3
V3 -->|valid| E
V3 -->|invalid| G
E --> VN
VN -->|valid| F
VN -->|invalid| G
B <--> S
C <--> S
D <--> S
E <--> S
6. Components
| Component |
Type |
Responsibility |
Technology Options |
Criticality |
| Pipeline Orchestrator |
Workflow Engine |
Manages step sequencing, state passing, error handling |
LangChain LCEL; LlamaIndex Pipeline; AWS Step Functions; custom Python |
Critical |
| Step Executor |
AI Component |
Renders prompt template; invokes LLM; returns raw output |
OpenAI, Bedrock, Azure OpenAI, on-prem vLLM |
Critical |
| Schema Validator |
Logic Component |
Validates step output against JSON Schema or Pydantic model |
Pydantic v2; JSON Schema Draft 7; AJV (JS) |
Critical |
| Pipeline State Store |
State |
Holds accumulated step outputs; persisted for audit |
Redis (transient); PostgreSQL JSONB (persistent); DynamoDB |
High |
| Prompt Template Registry |
Configuration |
Versioned, parameterised prompt templates per step |
LangChain Hub; custom YAML; Promptfoo |
High |
| Retry Controller |
Resilience |
Retries failed LLM calls with exponential backoff |
Custom; Tenacity (Python); AWS Step Functions retry policy |
High |
| Audit Record Writer |
Governance |
Writes full pipeline execution record on completion |
S3; PostgreSQL; Splunk |
High |
| Step Metrics Emitter |
Observability |
Emits per-step latency, token usage, validation pass/fail |
Prometheus; CloudWatch; Datadog |
Medium |
7. Data Flow
| Step |
Actor |
Action |
Output |
| 1 |
Caller |
Submits raw input document; specifies pipeline version |
{raw_input, pipeline_version: "contract-review-v2.1"} |
| 2 |
Pipeline Orchestrator |
Loads pipeline definition; initialises state object |
Empty state with pipeline metadata |
| 3 |
Step 1 Executor |
Extracts structured data from raw input |
{parties: [...], clauses: [...], effective_date: "2025-07-01"} |
| 4 |
Schema Validator |
Validates Step 1 output against ExtractedContract schema |
PASS — writes to state |
| 5 |
Step 2 Executor |
Classifies each clause by type (obligation, right, limitation, indemnity) |
{classified_clauses: [{id: 1, type: "obligation", text: "..."},...]} |
| 6 |
Schema Validator |
Validates Step 2 output |
PASS — writes to state |
| 7 |
Step 3 Executor |
Identifies risk flags in obligations |
{risk_flags: [{clause_id: 1, risk: "unlimited_liability", severity: "high"}]} |
| 8 |
Schema Validator |
Validates Step 3 output |
PASS — writes to state |
| 9 |
Step 4 Executor |
Generates executive summary and recommendations |
{summary: "...", recommendations: [...]} |
| 10 |
Audit Record Writer |
Writes full execution record |
Audit record with all step inputs/outputs |
Error Flow
| Error |
Detection |
Recovery |
| LLM returns malformed JSON |
JSON parse error |
Retry up to 3 times with explicit JSON instruction; escalate on failure |
| Schema validation failure |
Pydantic ValidationError |
If correction_prompt configured: send output + error to LLM for self-correction; else halt |
| LLM timeout |
HTTP timeout |
Exponential backoff retry; 3 attempts; halt with partial state on final failure |
| Business rule violation (e.g. required field empty) |
Custom validator |
Halt pipeline; return error with field-level details; preserve partial state |
8. Security Considerations
Schema as Security Boundary
- Each step's output schema acts as a type-safe boundary that prevents injection of unexpected fields or types into downstream steps
- Mitigation: Use strict schema validation (no additionalProperties); validate that string fields do not contain instruction-like patterns
OWASP LLM Top 10
| OWASP LLM Risk |
Sequential Chain Applicability |
Mitigation |
| LLM01 Prompt Injection |
Raw input is injected into Step 1 prompt |
Input sanitisation before Step 1; use input delimiters |
| LLM02 Insecure Output Handling |
Step output passed to downstream system without validation |
Mandatory schema validation at every step boundary |
| LLM09 Overreliance |
Downstream steps trust upstream outputs without verification |
Schema validation does not guarantee semantic correctness; human review gate for regulated outputs |
| LLM06 Sensitive Information |
Pipeline state accumulates sensitive data across steps |
State encryption at rest; PII masking before step execution if not required by that step |
9. Governance Considerations
Pipeline Version Control
- Every pipeline definition (step sequence + prompt templates + schemas) is version-controlled and tagged
- Production pipelines are promoted through dev → staging → prod with regression test gate
- Schema changes between steps are breaking changes and require a new pipeline version
Governance Artefacts
| Artefact |
Owner |
Frequency |
Purpose |
| Pipeline Definition Registry |
AI Platform |
On change |
Version-controlled step definitions, schemas, prompt templates |
| Step Validation Report |
AI Operations |
Weekly |
Per-step validation pass/fail rates; identifies degrading steps |
| Pipeline Audit Archive |
Compliance |
Per execution; retained per policy |
Full execution records for regulated workflows |
| Schema Change Impact Assessment |
Architecture Board |
On schema change |
Documents impact of schema changes on downstream steps |
10. Operational Considerations
SLOs
| SLO |
Target |
Window |
Alert |
| Pipeline completion rate (all steps pass) |
≥ 98% |
24-hour rolling |
< 95% triggers P2; check validation failure rates |
| Per-step validation pass rate |
≥ 99% |
24-hour rolling |
< 97% triggers P3; review step prompt or schema |
| p95 pipeline end-to-end latency |
≤ 60s (N=4 steps) |
1-hour rolling |
> 90s triggers P2 |
| Step retry rate |
≤ 1% |
24-hour rolling |
> 3% triggers P3; check LLM provider stability |
Monitoring
- Per-step token usage and latency trending: identify steps becoming more expensive over time
- Validation failure heatmap: which step and which field fails most often
- Pipeline throughput: tasks per hour per pipeline version
11. Cost Considerations
| Pipeline Configuration |
Steps |
Approx. Cost per Pipeline Run (GPT-4o) |
Notes |
| Light pipeline |
2–3 steps |
$0.02–0.08 |
Simple classification and formatting |
| Standard pipeline |
4–6 steps |
$0.08–0.30 |
Typical knowledge work pipeline |
| Complex pipeline |
7–10 steps |
$0.30–0.80 |
Deep analysis with multiple enrichment steps |
| With retries (avg 1.1×) |
Any |
+10% overhead |
Low retry rate for well-calibrated pipelines |
Optimisations
- Use smaller models for extraction and formatting steps; larger models for reasoning steps
- Batch multiple pipeline runs when throughput > latency priority
- Cache identical step inputs to avoid reprocessing duplicate content
12. Trade-Off Analysis
| Option |
Reliability |
Auditability |
Flexibility |
Complexity |
Best For |
| A: Sequential Chain with schema validation (Recommended) |
Very High |
Very High |
Low |
Medium |
Regulated, fixed-path workflows |
| B: Monolithic prompt |
Low |
Very Low |
High |
Low |
Prototyping only |
| C: ReAct loop (EAAPL-WRK001) |
High |
High |
Very High |
Medium |
Dynamic, path-dependent tasks |
| D: Router/Dispatcher (EAAPL-WRK004) |
High |
High |
High |
High |
Multi-type input needing specialist handling |
Architectural Tensions
| Tension |
Left Pole |
Right Pole |
Balance |
| Step granularity |
Many small steps (testable, replaceable) |
Few large steps (lower latency, fewer API calls) |
4–6 steps is the practical sweet spot for most pipelines |
| Schema strictness |
Strict (rejects anything unexpected) |
Lenient (accepts partial output) |
Strict at domain boundaries; lenient for supplementary fields |
| Retries vs. Fast-fail |
Aggressive retry (high completion rate) |
Immediate failure (fast error propagation) |
Retry transient errors (3 attempts); fail immediately on schema violations |
13. Failure Modes
| Failure Mode |
Likelihood |
Impact |
Detection |
Recovery |
| Schema drift (prompt produces output that no longer matches schema) |
Medium |
High — pipeline breaks on schema mismatch |
Schema validation failure alert |
Schema compatibility test in CI/CD; prompt regression tests |
| Silent semantic error (valid schema, wrong content) |
Medium |
High — bad data propagates downstream |
Human audit; end-to-end quality benchmarks |
Quality evaluation step at pipeline end; statistical output monitoring |
| State store unavailable |
Low |
High — pipeline cannot persist intermediate state |
Health check; exception handling |
Retry with backoff; fallback to in-memory state for non-regulated pipelines |
| Model regression (same prompt, worse output quality) |
Low–Medium |
Medium |
Per-step quality benchmark |
Pin model version; canary deployment for model upgrades |
| Step ordering error (design flaw) |
Low |
High — downstream steps receive unexpected input |
Integration tests on pipeline definition |
Schema contract testing validates step interface compatibility |
14. Regulatory Considerations
ISO 42001
- §8.4: The pipeline definition (step sequence, schemas, prompt versions) constitutes the AI system operational specification; must be version-controlled and change-managed.
APRA CPS 230
- The pipeline audit record (full step-by-step processing evidence) supports the operational resilience and auditability requirements for AI systems in material business processes.
- Each pipeline version must be associated with the business process it supports; changes require impact assessment.
Australian Context
- For AFS-licensed entities using sequential chains in decision-making (credit assessment, insurance underwriting), each pipeline step output must be retainable and reproducible to support dispute resolution and ASIC inquiry requirements.
- OAIC guidance on automated decision-making recommends that each transformation stage be documentable — the sequential chain's per-step audit record directly satisfies this.
15. Reference Implementations
AWS
| Component |
Service |
| Pipeline Orchestration |
AWS Step Functions Express Workflows |
| Step Execution |
Amazon Bedrock InvokeModel with structured output |
| Schema Validation |
Lambda function with Pydantic; Bedrock Guardrails |
| State Store |
AWS DynamoDB (per-pipeline execution state) |
| Audit Archive |
S3 with lifecycle policy |
Azure
| Component |
Service |
| Pipeline Orchestration |
Azure Durable Functions (function chaining pattern) |
| Step Execution |
Azure OpenAI Service with response_format: json_object |
| Schema Validation |
Azure Functions with Pydantic validation layer |
| State Store |
Azure Cosmos DB |
| Audit Archive |
Azure Blob Storage with immutable storage policy |
On-Premises
| Component |
Technology |
| Pipeline Orchestration |
LangChain LCEL pipeline; Apache Airflow for scheduled pipelines |
| Step Execution |
vLLM with structured output (outlines grammar) |
| Schema Validation |
Pydantic v2 models |
| State Store |
PostgreSQL with JSONB columns |
| Pattern |
ID |
Relationship Type |
Notes |
| Workflow State Machine |
EAAPL-WRK012 |
Peer |
State machine adds explicit state transitions; sequential chain is a special case with linear transitions |
| Conditional Routing |
EAAPL-WRK011 |
Extends |
Add conditional branching to a sequential chain when steps have variable next-steps |
| Router/Dispatcher |
EAAPL-WRK004 |
Peer |
Dispatcher selects which chain to execute; chains execute sequentially within each branch |
| Parallel Fan-Out/Fan-In |
EAAPL-WRK003 |
Complementary |
Fan-out independent steps from a sequential chain for parallelism |
| Workflow Tracing & Replay |
EAAPL-WRK013 |
Integrates With |
Trace records every step execution for replay and debugging |
17. Maturity Assessment
Overall Maturity: Industry Standard
| Dimension |
Score (1–5) |
Evidence |
| Research Foundation |
4 |
Well-established prompt chaining literature; LCEL, Haystack pipelines widely documented |
| Production Deployment |
5 |
Deployed at scale across all industries; most common LLM application pattern in enterprise |
| Framework Support |
5 |
LangChain LCEL, LlamaIndex, Haystack, Azure Prompt Flow all implement natively |
| Schema Tooling |
4 |
Pydantic + Instructor widely adopted; OpenAI structured output GA; tooling mature |
| Observability |
4 |
Per-step tracing available in LangSmith, Azure Monitor, AWS X-Ray |
18. Revision History
| Version |
Date |
Author |
Changes |
| 1.0 |
2025-06-13 |
Architecture Board |
Initial publication in Agentic Workflows category |