EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-RAG004] Federated Retrieval-Augmented Generation

Category: Artificial Intelligence / Retrieval-Augmented Generation
Sub-category: Distributed and Federated Retrieval
Version: 1.1
Maturity: Emerging
Tags: rag, federated, distributed, data-sovereignty, privacy-preserving, cross-organisation, score-normalisation
Regulatory Relevance: GDPR Chapter V (Cross-border transfers), Privacy Act 1988 Part IIIA, EU AI Act Article 10, APRA CPS234 §55 (third-party data), Australian Data Sovereignty requirements


1. Executive Summary

Federated RAG enables retrieval across multiple organisationally or geographically distributed knowledge bases without requiring data to be centralised in a single repository. Each participating node maintains its own vector index and document store, enforces its own access controls, and responds to retrieval requests without exposing raw documents to the federation coordinator. A central orchestrator fans out queries to participating nodes, collects ranked result sets, normalises scores across heterogeneous indexes, and assembles the final context for the language model.

For CIOs navigating cross-border data regulations, multi-entity corporate structures, or public-private data sharing partnerships, Federated RAG provides a principled architecture for enabling AI-powered knowledge synthesis across organisational boundaries without violating data residency, sovereignty, or confidentiality obligations. The pattern is relevant to government agency networks (where each agency maintains its own data under separate legislative mandates), healthcare provider networks (where patient data cannot leave jurisdictions), joint ventures (where each party contributes knowledge without disclosing the full corpus to the other), and multi-cloud enterprise architectures (where data must remain within specific cloud regions due to contractual or regulatory constraints).


2. Problem Statement

Business Problem

Organisations increasingly need to answer questions that span multiple independent knowledge bases, but the data owners of those bases have legitimate legal, commercial, or regulatory reasons not to share raw content. A government department asked to answer a cross-agency compliance question should not need to copy another agency's classified data into its own systems. A hospital network should not centralise patient records from all member hospitals into a single repository to enable AI search. A joint venture should not require each partner to surrender commercial-in-confidence documents to a shared index.

Technical Problem

Standard centralised RAG requires all documents to be ingested into a single vector index. This is architecturally incompatible with data sovereignty requirements, cross-border transfer restrictions, organisational trust boundaries, and contractual non-disclosure obligations. Naive distributed solutions (running independent RAG systems and aggregating answers) compound the problem: scores from different indexes are not comparable, different indexes may use different embedding models, and there is no principled mechanism for cross-node relevance ranking.

Symptoms

  • Stalled AI initiatives because legal and compliance teams cannot approve centralised data aggregation
  • Separate, siloed AI assistants per business unit with no ability to answer cross-unit questions
  • Manual "copy and paste from multiple systems" workflows for staff who need cross-domain answers
  • Joint venture or consortium participants rejecting shared data infrastructure proposals

Cost of Inaction

  • Inability to leverage AI for cross-organisational knowledge synthesis, leaving manual research workflows in place
  • Missed regulatory intelligence: compliance teams unable to get cross-jurisdiction answers from distributed regulatory corpora
  • Competitive disadvantage versus organisations that have successfully federated knowledge infrastructure

3. Context

When to Apply

  • Cross-organisation AI knowledge sharing where raw data cannot be centralised (government agencies, healthcare networks, industry consortia)
  • Multi-jurisdiction deployments where data must remain within specific geographic boundaries
  • Joint ventures and public-private partnerships where each party retains data sovereignty
  • Large enterprises with independent subsidiaries or business units that have separate data governance regimes
  • Regulatory constraint: the organisation is subject to data localisation requirements (EU GDPR Chapter V, Australian data sovereignty) that prohibit centralised indexing

When NOT to Apply

  • Data can be legally and contractually centralised (use EAAPL-RAG001 for significantly better retrieval quality)
  • Single-tenant deployment with no cross-boundary requirements
  • Latency requirements preclude network round-trips to remote nodes (each remote node adds a 50–200ms round-trip; because fan-out is parallel, the added latency is set by the slowest responding node)
  • All nodes use incompatible embedding models and re-embedding is not feasible (score normalisation quality degrades severely)

Prerequisites

  • Each participating node must expose a standardised retrieval API (REST or gRPC) that accepts a query vector and metadata filters and returns ranked result sets with scores and metadata
  • Score normalisation requires nodes to use compatible (ideally identical) embedding models, or to implement score calibration
  • A federation coordination layer (either a dedicated service or the primary node acting as coordinator)
  • Inter-node network connectivity with appropriate security (mTLS, API keys, VPN tunnels between trusted nodes)
  • Data sharing agreement or federation protocol specifying what metadata each node may expose

Industry Applicability

Industry | Federation Participants | Data Sovereignty Driver | Example Use Case
Government | Federal + State agencies | Legislative jurisdiction separation | Whole-of-government regulatory knowledge assistant
Healthcare | Hospital networks + GP clinics | Patient data cannot leave jurisdiction | Clinical decision support across care network
Financial Services | Group entities + subsidiaries | Separate legal entities; intra-group data transfer rules | Group-wide risk and compliance knowledge base
Education | Universities in national consortium | FERPA/Privacy Act per institution | National research knowledge assistant
Defence & Intelligence | Allied nation agencies | National security classification regimes | Coalition knowledge sharing (lowest-classification tier)

4. Architecture Overview

Federated RAG decomposes into three functional layers: local nodes (which own their data and serve retrieval requests), the federation coordinator (which orchestrates cross-node retrieval and normalises results), and the generation layer (which assembles the federated context and invokes the LLM).

Local Node Architecture

Each local node is a complete, independently operated RAG stack: document ingestion pipeline, chunking engine, embedding model, vector database, and a retrieval API endpoint. The node enforces its own access controls — the coordinator has no ability to bypass the node's ACL enforcement. The node exposes a retrieval endpoint that:

  1. Accepts a query vector (already embedded by the coordinator) and optional metadata filters
  2. Enforces local ACL for the requesting entity (the coordinator's identity, not the end user's)
  3. Returns a ranked result set: [(chunk_text, metadata, score, chunk_id)] — crucially, never raw document bytes
  4. Applies rate limiting per coordinator identity to prevent denial-of-service

The node's governance principle is: the node owner controls what the node reveals. The node may return only metadata and scores (no chunk text) for sensitive documents, requiring the coordinator to resolve context from an approved escrow, or to simply note "relevant classified content exists but cannot be included in context."
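The disclosure rule described above can be illustrated with a minimal Python sketch. The names `Disclosure`, `RetrievalHit`, and `apply_disclosure` are hypothetical and not part of the pattern; the sketch only assumes that a node decides per hit whether chunk text may leave the node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disclosure(Enum):
    FULL_TEXT = "full_text"          # chunk text may leave the node
    METADATA_ONLY = "metadata_only"  # only score and metadata leave the node


@dataclass(frozen=True)
class RetrievalHit:
    chunk_id: str
    score: float
    metadata: dict
    chunk_text: Optional[str]  # None when the node withholds the text


def apply_disclosure(hit: RetrievalHit, level: Disclosure) -> RetrievalHit:
    """Return the hit in the form the node owner is willing to reveal."""
    if level is Disclosure.FULL_TEXT:
        return hit
    # Sensitive document: keep the score and metadata, drop the chunk text.
    return RetrievalHit(hit.chunk_id, hit.score, hit.metadata, None)
```

A coordinator receiving a hit with `chunk_text` set to `None` would then note that relevant content exists but cannot be included in context.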

Federation Coordinator

The coordinator is the query orchestration layer. Upon receiving a user query, the coordinator:

  1. Embeds the query using the shared embedding model
  2. Determines which nodes to query based on the query's topic scope and the user's inter-node authorisation
  3. Fans out the query vector in parallel to all relevant nodes (with a configurable timeout per node)
  4. Collects ranked result sets from responding nodes
  5. Applies score normalisation to make scores from different nodes comparable
  6. Re-ranks the normalised result set to produce a unified top-K list
  7. Assembles context from the returned chunk texts (subject to each node's disclosure level)
  8. Records which nodes were queried, which responded, and which declined (for audit and transparency)
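The fan-out and collection steps above might be sketched as follows, using only the Python standard library. The `NodeClient` type, the `NodeDeclined` exception, and the status strings are assumptions made for illustration; a real client would wrap the node's REST or gRPC retrieval API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (chunk_id, score); simplified for this sketch
NodeClient = Callable[[List[float]], Awaitable[List[Hit]]]


class NodeDeclined(Exception):
    """Hypothetical error a client raises when the node rejects the coordinator."""


async def fan_out(query_vec: List[float], nodes: Dict[str, NodeClient],
                  timeout_s: float = 2.0):
    """Query all nodes in parallel; return (results, status) keyed by node name."""

    async def call(name: str, client: NodeClient):
        try:
            hits = await asyncio.wait_for(client(query_vec), timeout=timeout_s)
            return name, "responded", hits
        except asyncio.TimeoutError:
            return name, "timeout", []
        except NodeDeclined:
            return name, "declined", []

    outcomes = await asyncio.gather(*(call(n, c) for n, c in nodes.items()))
    results = {name: hits for name, status, hits in outcomes if status == "responded"}
    status = {name: st for name, st, _ in outcomes}  # kept for audit and transparency
    return results, status
```

Because each call carries its own timeout, a slow node delays the query by at most `timeout_s` and never blocks results from the other nodes.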

Score Normalisation

Score normalisation is the most technically complex component of Federated RAG. When nodes use the same embedding model, cosine similarity scores are on the same scale and can be compared directly. When models differ (which should be avoided but may be unavoidable), score normalisation is required: each node's score distribution is modelled (mean and standard deviation) using calibration queries, and scores are Z-normalised before cross-node ranking. Reciprocal Rank Fusion (RRF) provides an alternative that is model-agnostic and rank-based, requiring only a ranked list from each node rather than raw similarity scores.
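Both approaches can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the constant `k = 60` is a commonly used RRF default and is an assumption here, and `calib_scores` stands for the scores a node returned on the calibration queries.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rrf_merge(ranked: Dict[str, List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse per-node ranked chunk-id lists; rank 1 is the best hit in each list."""
    fused: Dict[str, float] = {}
    for chunk_ids in ranked.values():
        for rank, chunk_id in enumerate(chunk_ids, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def z_normalise(score: float, calib_scores: List[float]) -> float:
    """Express a raw score in standard deviations from the node's calibration mean."""
    mu = mean(calib_scores)
    sigma = pstdev(calib_scores)
    # A node with no score variance carries no ranking signal; map it to 0.
    return 0.0 if sigma == 0 else (score - mu) / sigma
```

RRF uses only each hit's position in its node's list, so it keeps working when nodes report scores on different scales. Z-normalisation keeps the score magnitudes but depends on calibration statistics that are representative for each node.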

Privacy-Preserving Retrieval Modes

For maximum privacy preservation, the coordinator can operate in a query vector obfuscation mode using techniques from privacy-preserving machine learning: the query vector is perturbed with calibrated noise before being sent to nodes (analogous to differential privacy), preventing nodes from reconstructing the original query. This mode trades some retrieval quality for stronger privacy guarantees — the node cannot infer the full semantic content of the query. This mode is recommended for inter-organisation federation where the coordinator does not fully trust all nodes.
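A minimal sketch of the perturbation step is shown below. The function name `obfuscate` and the noise scale `sigma` are hypothetical, and Gaussian noise followed by re-normalisation is one possible choice. On its own it does not establish a formal differential privacy guarantee, which would require a calibrated privacy analysis.

```python
import math
import random
from typing import List, Optional


def obfuscate(query_vec: List[float], sigma: float,
              rng: Optional[random.Random] = None) -> List[float]:
    """Add Gaussian noise to each dimension, then rescale to unit length."""
    rng = rng or random.Random()
    noisy = [x + rng.gauss(0.0, sigma) for x in query_vec]
    norm = math.sqrt(sum(x * x for x in noisy))
    # Re-normalise so cosine-similarity search at the node still behaves sensibly.
    return noisy if norm == 0 else [x / norm for x in noisy]
```

Larger values of `sigma` hide more of the query's semantics but move the vector further from the true query, which is the quality and privacy trade-off described above.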


5. Architecture Diagram

ARCHITECTURE DIAGRAM
flowchart TD
    subgraph Nodes["Federated Nodes"]
        A[Node A API]
        B[Node B API]
        C[Node C API]
    end
    subgraph Coordinator["Federation Coordinator"]
        D[User Query]
        E[Node Router]
        F[Score Normaliser]
        G[LLM Generation]
    end
    subgraph Audit["Governance"]
        H[Federation Audit Log]
    end
    D --> E
    E -->|fan-out mTLS| A
    E -->|fan-out mTLS| B
    E -->|fan-out mTLS| C
    A -->|ranked results| F
    B -->|ranked results| F
    C -->|declined or results| F
    F --> G --> D
    E --> H
    F --> H
    style D fill:#dbeafe,stroke:#3b82f6
    style E fill:#f0fdf4,stroke:#22c55e
    style A fill:#fef9c3,stroke:#eab308
    style B fill:#fef9c3,stroke:#eab308
    style C fill:#fef9c3,stroke:#eab308
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#d1fae5,stroke:#10b981
    style H fill:#fef9c3,stroke:#eab308

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Local Node Retrieval API | Integration | Serve retrieval requests to the coordinator; enforce local ACL | FastAPI / gRPC service; cloud API Gateway | Critical
Local Vector Database | Storage | Node-local vector index; never exposed directly to coordinator | Weaviate, Qdrant, pgvector, OpenSearch | Critical
Shared Embedding Model | ML Inference | Ensure all nodes embed in the same vector space | Agreed embedding model deployed at each node; identical version | Critical
Federation Coordinator | Orchestration | Fan out queries; collect and normalise results; assemble context | Custom Python service; LangChain with custom retriever; LlamaIndex federation | Critical
Node Router | Business Logic | Select which nodes to query for a given query type | Custom routing rules + topic classifier | High
Score Normaliser | Algorithm | Normalise heterogeneous scores to a comparable scale | RRF (rank-based, model-agnostic) or Z-normalisation (score-based; requires per-node calibration, and is unreliable if a node's model changes without recalibration) | High
Global Re-ranker | Ranking | Re-rank normalised cross-node results | Cross-encoder re-ranker; Cohere Rerank API | Medium
Federation Audit Logger | Compliance | Record which nodes were queried, which responded, which declined | Append-only log store; tamper-evident | High
Inter-Node Security (mTLS) | Security | Authenticate coordinator to each node; encrypt retrieval traffic | mTLS certificates (SPIFFE/SPIRE for zero-trust); API key + TLS fallback | Critical
Query Vector Obfuscator (optional) | Privacy | Add calibrated noise to query vector before sending to untrusted nodes | Custom differential privacy library; RAPPOR-style perturbation | Medium

7. Data Flow

Primary Flow

Step | Actor | Action | Output
1 | Each local node | Ingest documents from local sources; build local vector index | Independent local vector indexes per node
2 | User | Submit query to federation coordinator | Query string + user identity
3 | Query Embedder | Embed query using shared embedding model | Query vector
4 | Node Router | Determine relevant nodes based on query topic scope and user's inter-node authorisation | List of target node endpoints
5 | Federation Coordinator | Fan out query vector in parallel to all target nodes with inter-node auth credentials | Parallel retrieval requests
6 | Each local node | Validate coordinator identity; enforce local ACL; execute local ANN search; return ranked results | [(chunk_text, metadata, score, chunk_id)] per node or DECLINED
7 | Score Normaliser | Apply RRF or Z-normalisation to align scores across nodes | Normalised, unified candidate list with node provenance
8 | Global Re-ranker | Re-rank unified candidate list | Top-N candidates
9 | Context Assembler | Assemble prompt with node provenance labels on each chunk | Assembled prompt
10 | LLM | Generate answer with cross-node citations | Raw response
11 | Response | Return answer with attribution to specific nodes | Answer + [Source: Node A - Doc X] citations
12 | Federation Audit Logger | Record complete query execution: nodes queried, response times, scores, declinations | Immutable audit record
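The context assembly step (step 9) could look roughly like the sketch below. The `Candidate` type and `assemble_context` function are hypothetical names; the label format follows the `[Source: Node A - Doc X]` citation style used in step 11, and the placeholder text for withheld chunks is an assumption.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    node: str                  # node that returned the chunk
    doc_title: str             # taken from the chunk's metadata
    chunk_text: Optional[str]  # None when the node withheld the text


def assemble_context(candidates: List[Candidate]) -> str:
    """Build the context section of the prompt, one labelled block per chunk."""
    blocks = []
    for c in candidates:
        label = f"[Source: {c.node} - {c.doc_title}]"
        body = (c.chunk_text if c.chunk_text is not None
                else "(relevant content exists but is not disclosable)")
        blocks.append(f"{label}\n{body}")
    return "\n\n".join(blocks)
```

Keeping the provenance label attached to every chunk is what lets the LLM cite specific nodes in the final answer.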

Error Flow

Error Condition | Detection | Recovery
Node timeout (node unavailable or slow) | Per-node timeout (configurable, default 2s) | Proceed with results from available nodes; note unavailable node in response metadata
Node declines query (ACL rejection for coordinator identity) | Node returns 403 | Log declination; note in response: "Some sources unavailable due to access restrictions"
Score normalisation failure (node uses different embedding model) | Score distribution anomaly detection | Fall back to RRF (rank-based); flag model mismatch for node operator
All nodes decline or time out | No results assembled | Return "No accessible content found across federated sources"; do not generate from LLM
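The recovery rules in the error flow can be expressed as a small decision function. This is a sketch under assumptions: the per-node status strings (`"responded"`, `"declined"`, `"timeout"`) and the function name `plan_response` are hypothetical, while the user-facing messages are taken from the table above.

```python
from typing import Dict, List, Tuple


def plan_response(status: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (should_generate, notes) from per-node outcomes."""
    if not any(s == "responded" for s in status.values()):
        # Nothing was retrieved, so the LLM must not be invoked.
        return False, ["No accessible content found across federated sources"]
    notes: List[str] = []
    if any(s == "declined" for s in status.values()):
        notes.append("Some sources unavailable due to access restrictions")
    timed_out = sorted(n for n, s in status.items() if s == "timeout")
    if timed_out:
        notes.append("Unavailable nodes: " + ", ".join(timed_out))
    return True, notes
```

Returning the notes alongside the decision makes it straightforward to surface node availability in every response rather than silently excluding nodes.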

8. Security Considerations

Inter-Node Trust Model

The federation coordinator authenticates to each node using mTLS mutual authentication. Each node maintains a whitelist of coordinator certificate fingerprints it trusts. This prevents unauthorised coordinators from querying nodes. Nodes do not trust the coordinator's claim about end-user identity — nodes enforce only the coordinator's identity against the inter-node access policy.
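The node-side fingerprint check might be sketched as follows. It assumes fingerprints are SHA-256 digests of the DER-encoded certificate stored as lowercase or uppercase hex; the function names are hypothetical. In a real node the certificate bytes would come from the TLS layer (for example `ssl.SSLSocket.getpeercert(binary_form=True)` in Python), after the handshake has already verified the certificate chain.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def is_trusted(cert_der: bytes, trusted_fingerprints: Iterable[str]) -> bool:
    """True if the presented certificate matches a trusted coordinator fingerprint."""
    presented = fingerprint(cert_der)
    # compare_digest avoids leaking match position through timing differences.
    return any(hmac.compare_digest(presented, allowed.lower())
               for allowed in trusted_fingerprints)
```

An empty trust list rejects every coordinator, which is the safe default for a newly provisioned node.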

Data Minimisation in Federation Protocol

The retrieval API response should include the minimum data necessary: chunk text (if the node's data sharing agreement permits), score, and metadata (document title, date, classification). Raw document bytes are never returned in the retrieval protocol. For highly sensitive nodes, chunk text may be replaced with a summary or an opaque reference, with the coordinator noting "relevant content exists but is not disclosable."

OWASP LLM Top 10 Mitigations

OWASP LLM Risk | Federation-Specific Concern | Mitigation
LLM01: Prompt Injection | Malicious content injected in one node's documents propagates through federation to the coordinator's LLM | Node-side content sanitisation; coordinator treats all retrieved content as untrusted data
LLM06: Sensitive Information Disclosure | Coordinator may inadvertently reveal to node B that node A has a specific document (via query vectors) | Query vector obfuscation for inter-organisation federation; query vector is not logged at nodes
LLM09: Overreliance | User assumes all nodes were queried; a declined or timed-out node is silently excluded | Explicitly surface node availability status in every response

9. Governance Considerations

Federation Governance Framework

Each federation must have a formal data sharing agreement that specifies: which nodes participate, what metadata each node may expose, the agreed embedding model version, the score normalisation method, audit log sharing obligations, and the dispute resolution process when a node incorrectly declines a coordinator request.

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Federation Data Sharing Agreement | Legal / All Participating Organisations | Per federation; reviewed annually | Legal basis for cross-node retrieval
Node Directory | Federation Coordinator | Continuous | Maintain registry of participating nodes, their data domains, and availability SLAs
Federation Audit Log | Coordinator + Each Node | Per query | Immutable record of inter-node queries for compliance and dispute resolution
Embedding Model Version Agreement | All Nodes | Per model upgrade | Ensure all nodes upgrade embedding model atomically to maintain score comparability
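The Embedding Model Version Agreement lends itself to an automated check against the node directory. The sketch below assumes each node reports its embedding model version as a string; the function name and the version labels in the usage example are hypothetical.

```python
from typing import Dict, List


def find_version_drift(agreed_version: str,
                       node_versions: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose embedding model differs from the agreed one."""
    return sorted(node for node, version in node_versions.items()
                  if version != agreed_version)


# Usage with made-up version labels:
drifted = find_version_drift("embed-v2", {"A": "embed-v2", "B": "embed-v1"})
print(drifted)
```

A non-empty result would be a reason to alert the node operators or to fall back to rank-based fusion until the nodes are aligned again.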

10. Operational Considerations

Monitoring

Metric | Alert Threshold | Notes
Node availability (per node) | < 99% over 1 hour | Alert node operator; degrade response noting unavailable node
Cross-node query latency | P95 > 3 seconds | Check slowest node; consider reducing timeout
Node declination rate | > 20% of queries to a given node | Investigate ACL configuration; may indicate access policy change
Score normalisation quality (RRF rank correlation) | Significant correlation drop | Indicates embedding model version mismatch between nodes
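Two of the thresholds above (latency P95 and declination rate) can be evaluated with a short helper. This is an illustrative sketch: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since the pattern does not say how P95 is computed.

```python
from typing import List


def p95(latencies_s: List[float]) -> float:
    """Nearest-rank 95th percentile; integer arithmetic avoids float rounding."""
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def check_alerts(latencies_s: List[float], declined: int, total: int) -> List[str]:
    """Return the alerts triggered for one node over one monitoring window."""
    alerts = []
    if latencies_s and p95(latencies_s) > 3.0:
        alerts.append("cross-node query latency P95 above 3 seconds")
    if total and declined / total > 0.20:
        alerts.append("node declination rate above 20%")
    return alerts
```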

Service Level Objectives

SLO | Target | Notes
Federated query response | P95 ≤ 4 seconds | Longer than centralised RAG due to network fan-out
Node availability (per node) | ≥ 99.5% | Per federation agreement
Federation coordinator availability | ≥ 99.9% | Coordinator is the single point of failure; deploy multi-AZ

11. Cost Considerations

Cost Drivers

Cost Driver | Notes
Per-node infrastructure | Each node is a full RAG stack; N nodes = N × single-node infrastructure cost
Cross-node network egress | Network egress costs for result transmission between nodes and coordinator
Federation coordinator compute | Fan-out, score normalisation, and re-ranking are compute-intensive at scale
Embedding model agreement enforcement | Coordinating atomic embedding model upgrades across nodes requires change management overhead

Indicative Cost Range

Federation Scale | Monthly Cost Range
3–5 nodes, small corpora | $3,000–$10,000 per node + $2,000–$5,000 coordinator
5–10 nodes, medium corpora | $5,000–$20,000 per node + $5,000–$15,000 coordinator
10+ nodes, large corpora | Custom pricing; infrastructure-as-code essential
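The per-node and coordinator ranges combine into a total monthly range by simple arithmetic. The helper below is a sketch with a hypothetical name; it multiplies the smallest node count by the low per-node figure and the largest node count by the high figure, then adds the coordinator range.

```python
from typing import Tuple


def monthly_cost_range(nodes_min: int, nodes_max: int,
                       node_low: int, node_high: int,
                       coord_low: int, coord_high: int) -> Tuple[int, int]:
    """Total monthly cost range: N x per-node cost plus coordinator cost."""
    low = nodes_min * node_low + coord_low
    high = nodes_max * node_high + coord_high
    return low, high


# 3-5 nodes, small corpora: 3 x 3,000 + 2,000 up to 5 x 10,000 + 5,000
print(monthly_cost_range(3, 5, 3_000, 10_000, 2_000, 5_000))    # (11000, 55000)
# 5-10 nodes, medium corpora: 5 x 5,000 + 5,000 up to 10 x 20,000 + 15,000
print(monthly_cost_range(5, 10, 5_000, 20_000, 5_000, 15_000))  # (30000, 215000)
```

The spread between the low and high totals widens quickly with node count, which is consistent with the low cost-predictability score in the maturity assessment.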

12. Trade-Off Analysis

Centralised vs. Federated RAG

Dimension | Centralised RAG (RAG001) | Federated RAG (RAG004)
Retrieval Quality | Highest (unified index, no score normalisation noise) | Lower (score normalisation introduces noise)
Data Sovereignty | Low (data must be centralised) | High (data never leaves node)
Latency | Lowest | Higher (network fan-out adds 50–200ms)
Operational Complexity | Medium | High (N independent stacks + coordinator)
Cost | Lower at scale | Higher (N × node cost)
Recommended For | Single-org, no sovereignty constraints | Multi-org, data residency requirements

Architectural Tensions

Tension | Trade-off | Recommendation
Node response completeness vs. privacy | Full chunk text in response: best quality; metadata-only: private | Data sharing agreement governs; default to full text for trusted nodes, metadata-only for untrusted
Query timeout per node vs. completeness | Short timeout: fast but may miss slow nodes; long timeout: complete but slow | Async fan-out with 2s timeout; include results from slow nodes in "extended" mode if user requests

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
Embedding model version drift between nodes | Medium | High (score normalisation fails) | Score distribution anomaly detection | Enforce embedding model version in node registration; alert on version mismatch
Coordinator as single point of failure | Low | Critical | Health check monitoring | Multi-AZ coordinator deployment; circuit breaker per node
Node A reveals existence of document to Node B via query pattern | Low | Medium | Privacy audit of query logs at each node | Query vector obfuscation; prohibit node-side query logging beyond request metadata
Federation data sharing agreement expires | Low | High | Automated agreement expiry monitoring | Alert 90, 60, and 30 days before expiry; suspend node access on expiry
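The agreement expiry monitoring in the last row could be automated along these lines. The function name and the returned action labels are hypothetical; the 30, 60, and 90 day windows and the suspension on expiry come from the table above.

```python
from datetime import date


def agreement_action(expiry: date, today: date) -> str:
    """Map the days remaining on a data sharing agreement to an action label."""
    days_left = (expiry - today).days
    if days_left <= 0:
        return "suspend"  # agreement has expired: suspend node access
    for window in (30, 60, 90):
        if days_left <= window:
            return f"alert-{window}"  # tightest window that applies
    return "ok"
```

A scheduled job could run this once a day per node in the node directory and act on any result other than `"ok"`.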

14. Regulatory Considerations

Regulation | Requirement | Federated RAG Response
GDPR Chapter V | Personal data cannot be transferred outside EEA without adequate protections | Source documents and indexes never leave the node: federated retrieval transmits result sets, not raw data. Result sets may still contain personal data if not redacted
Privacy Act 1988 APP 8 | Cross-border disclosure of personal information | Node-level PII redaction (EAAPL-RAG003) applied before result sets leave the node's jurisdiction
Australian Data Sovereignty | Government data must remain in Australia | Node hosted on Australian infrastructure; coordinator queries never cause data to cross border
EU AI Act Article 10 | Data governance for AI training and operation | Data governance remains with each node; federation protocol does not create new data aggregation

15. Reference Implementations

AWS (Multi-Region Federation)

  • Local nodes: OpenSearch k-NN in each AWS region; Lambda-based retrieval API
  • Coordinator: ECS Fargate service in primary region; Step Functions for fan-out
  • Inter-node security: AWS PrivateLink or VPC peering + mTLS; IAM cross-account roles
  • Score normalisation: Lambda function with RRF implementation

Azure (Cross-Tenant Federation)

  • Local nodes: Azure AI Search per tenant; Azure Functions as retrieval API
  • Coordinator: Azure Container Apps; Logic Apps for fan-out orchestration
  • Inter-node security: Azure AD B2B + managed identity; Private Endpoints
  • Score normalisation: Azure Function with RRF

On-Premises + Cloud Hybrid

  • Local nodes: Weaviate or Qdrant on-premises + cloud nodes
  • Coordinator: Kubernetes deployment with custom federation service
  • Inter-node security: WireGuard VPN between on-premises and cloud; mTLS with SPIFFE/SPIRE

16. Related Patterns

Pattern ID | Pattern Name | Relationship
EAAPL-RAG001 | Enterprise RAG | Foundation for each local node; RAG004 federates multiple RAG001 instances
EAAPL-RAG003 | Secure RAG | Applied at each local node before transmitting results; prevents PII leakage through federation
EAAPL-RAG005 | Hybrid RAG | Can be applied within each local node for better per-node retrieval quality
EAAPL-KNW004 | Vector Database Management | Governs each local node's vector database independently

17. Maturity Assessment

Overall Maturity: Emerging — The core retrieval federation mechanics are proven; score normalisation at scale and privacy-preserving query obfuscation are active research areas; production deployments are limited to well-resourced organisations.

Dimension | Score (1–5) | Rationale
Technology Readiness | 3 | Core components available; federation coordination and score normalisation lack turnkey tooling
Tooling Ecosystem | 2 | No mature federation framework; custom orchestration required
Operational Guidance | 2 | Limited production guidance; each deployment is largely custom
Security & Compliance | 3 | mTLS and local ACL enforcement are well-understood; query privacy obfuscation is experimental
Scalability Evidence | 2 | Small-scale federations (3–5 nodes) proven; large federations (20+ nodes) mostly theoretical
Cost Predictability | 2 | N × single-node cost model; coordination overhead highly variable

18. Revision History

Version | Date | Author | Changes
1.0 | 2024-06-01 | EAAPL Working Group | Initial publication
1.1 | 2025-01-15 | EAAPL Working Group | Score normalisation section expanded; RRF formalised; privacy-preserving query obfuscation added