EAAPL-KNW004: Vector Database Management

Pattern ID: EAAPL-KNW004
Status: Proven
Complexity: Medium
Tags: vector-search embedding high-availability medium-complexity
Version: 1.0
Last Updated: 2026-06-12

1. Executive Summary

Vector databases are the retrieval backbone of enterprise AI systems. Mismanaged, they silently degrade: index parameters tuned for a test dataset fail under production data volumes; embedding model upgrades require full re-indexing with no downtime strategy; multi-tenant deployments leak context between namespaces; and backup procedures that were never tested cannot restore under pressure.

This pattern provides the operational framework for managing enterprise vector databases at production scale. It covers selection criteria and a decision matrix across major platforms, production-grade HNSW and IVF index parameter tuning, backup and point-in-time restore with cross-region DR, performance tuning for query latency and recall trade-offs, multi-tenancy and namespace isolation to prevent cross-tenant data leakage, and a comprehensive monitoring framework covering index health, query performance, and recall quality.

For CIOs and CTOs, the key message is: a vector database is not a commodity data store. It requires the same operational rigour as a relational database supporting a critical business application — high-availability configuration, tested DR procedures, performance engineering, and ongoing monitoring. Organisations that treat vector databases as "plug and play" infrastructure experience silent quality degradation and unplanned downtime within 6–12 months of production launch.

2. Problem Statement

2.1 Business Problem

Enterprise AI applications depend on vector database retrieval for accuracy. When retrieval quality degrades — due to stale indexes, suboptimal HNSW parameters, or index fragmentation from frequent updates — AI answer quality degrades proportionally. Business users experience this as the AI "getting worse over time" without any identifiable change to the AI model itself. The root cause is invisible without specific vector database monitoring.

2.2 Technical Problem

Vector databases present several operational challenges that differ from relational databases. HNSW (Hierarchical Navigable Small World) indexes have a recall-versus-latency trade-off governed by parameters (ef, M, ef_construction) that must be tuned for specific dataset characteristics. Settings that are optimal at 100K vectors may be suboptimal at 10M vectors. Incremental upserts fragment HNSW indexes over time, degrading recall. Embedding model upgrades invalidate the entire index, requiring re-embedding and re-indexing, and most platforms offer no native zero-downtime mechanism for this.

2.3 Symptoms

AI answer quality perceived to decline over time despite no model changes
Query latency p99 spikes during index rebuild or heavy upsert periods
Cross-tenant AI answer contamination (one tenant's data appears in another's retrieval results)
DR test reveals vector database cannot be restored within acceptable RTO
Embedding model upgrade requires 12+ hours of downtime for large corpora
No alerting when vector index recall drops below acceptable threshold

2.4 Cost of Inaction

Silent AI quality degradation erodes user trust before the technical root cause is diagnosed
A security incident caused by cross-tenant vector leakage in a multi-tenant deployment carries regulatory and reputational consequences disproportionate to the technical complexity of prevention
Unplanned downtime during a critical business period because backup restoration has never been tested

3. Context

3.1 When to Apply

Any production RAG or semantic search deployment with >100K vectors
Deployments where query latency SLOs must be met contractually or for regulatory purposes
Multi-tenant AI deployments serving multiple business units, clients, or data classifications
Organisations with DR and business continuity requirements that must cover AI infrastructure
Deployments where embedding model upgrades are anticipated within the first 12 months of production

3.2 When NOT to Apply

Prototype or PoC deployments with <10K vectors — managed services with defaults are sufficient
Development/test environments — production operational rigour is not required pre-production
Single-tenant, single-classification deployments with relaxed latency requirements and minimal DR expectations

3.3 Prerequisites

Defined query latency SLO (p50, p95, p99 targets) for the AI applications served
Known or estimated dataset size (current and 12-month projection)
Data classification requirements identifying whether multi-tenancy is needed for security or organisational reasons
DR requirements (RTO/RPO) from the organisation's business continuity programme
Monitoring platform (Prometheus/Grafana or equivalent) already in operation

3.4 Industry Applicability

Industry	Applicability	Primary Use Case
Financial Services	High	RAG for regulatory documents, customer 360 semantic search, fraud vector similarity
Healthcare	High	Clinical knowledge retrieval, drug similarity search, patient record semantic matching
Technology / SaaS	High	Developer documentation search, code similarity, multi-tenant AI product
Retail / CPG	Medium	Product semantic search, personalisation, visual similarity search
Legal	High	Case law similarity, contract clause retrieval, regulatory lookup
Government	Medium	Policy library retrieval, citizen services knowledge base

4. Architecture Overview

The Vector Database Management pattern is organised around five operational disciplines: Selection, Index Management, Multi-tenancy, Backup and DR, and Performance Monitoring.

4.1 Selection Framework

Selecting a vector database is a function of workload characteristics, not vendor marketing. The selection process evaluates eight dimensions against the organisation's specific requirements.

Query latency SLO is the primary filter: if the SLO is sub-10ms p99, only in-memory solutions (Qdrant with in-memory mode, Redis vector search, or pgvector with HNSW) are viable. If 100–500ms p99 is acceptable, managed services (Pinecone, Weaviate Cloud, Amazon OpenSearch) are all candidates.

Dataset size determines whether HNSW (scales to hundreds of millions of vectors with careful tuning) or IVF (better for very large static datasets where rebuild cost is acceptable) is the primary index type. Pinecone and Weaviate handle multi-hundred-million vector counts in managed form; pgvector struggles above 50M vectors in most configurations.

Update frequency is critical: if vectors are added at high rates (thousands per minute), HNSW index fragmentation is a concern, and the solution must support incremental indexing with periodic defragmentation. Pinecone's internal architecture handles this well; self-hosted solutions require explicit management.

Multi-tenancy requirements determine whether namespace isolation, per-tenant collections, or shared collections with metadata filtering are required. See §4.4 for the security implications of each approach.

Managed vs. self-hosted is an operational capability decision. Managed services (Pinecone, Weaviate Cloud, Amazon OpenSearch Serverless) eliminate the infrastructure operations burden at the cost of reduced tuning control and vendor lock-in. Self-hosted (Qdrant, Weaviate, Milvus, pgvector) provide full control at the cost of engineering operations burden.

4.2 Index Management

HNSW Parameters. The three critical HNSW parameters are: M (number of bi-directional links per node — higher M means better recall but more memory and slower build time; production default 16–32), ef_construction (size of the dynamic candidate list during index build — higher means better recall but slower build; production default 128–400), and ef (query-time candidate list size — higher means better recall at the cost of query latency; tuned at query time, default 64–128 for production).

Parameter tuning process: measure recall on a golden query set (see §4.5) at increasing ef values until recall target is met, then measure latency. The intersection of recall target and latency SLO determines the production ef setting. Repeat this process when dataset size doubles, as optimal parameters shift.
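The tuning loop above can be sketched as a small selection function. This is a minimal illustration, assuming the benchmark results have already been collected; the `EfMeasurement` type, the `choose_ef` helper, and the sample numbers are hypothetical and not part of any vector database API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EfMeasurement:
    ef: int                # query-time candidate list size that was tested
    recall: float          # recall@k measured on the golden query set
    p99_latency_ms: float  # p99 query latency observed at this ef


def choose_ef(measurements: Sequence[EfMeasurement],
              recall_target: float,
              latency_slo_ms: float) -> Optional[int]:
    """Return the smallest tested ef that meets both recall target and latency SLO.

    Returns None when no tested value satisfies both constraints, which
    suggests revisiting M / ef_construction or the hardware sizing.
    """
    for m in sorted(measurements, key=lambda m: m.ef):
        if m.recall >= recall_target and m.p99_latency_ms <= latency_slo_ms:
            return m.ef
    return None


# Illustrative numbers only; real values come from benchmarking.
sweep = [
    EfMeasurement(ef=32, recall=0.84, p99_latency_ms=40.0),
    EfMeasurement(ef=64, recall=0.91, p99_latency_ms=70.0),
    EfMeasurement(ef=128, recall=0.95, p99_latency_ms=140.0),
]
print(choose_ef(sweep, recall_target=0.90, latency_slo_ms=500.0))
```

Choosing the smallest passing ef keeps latency headroom; a `None` result is the signal that query-time tuning alone cannot reconcile the two targets.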

HNSW Defragmentation. Frequent upserts (updates and inserts) cause HNSW index fragmentation. Fragmentation degrades recall without increasing latency, making it invisible in standard latency monitoring. A scheduled recall measurement job against the golden query set is the only reliable detection mechanism. When recall drops below threshold, a rebuild (or segment merge for platforms that support it) is required. Plan for periodic rebuild windows — typically off-peak hours — with blue/green index promotion to avoid downtime.
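A blue/green promotion gate of the kind described above might look like the following sketch. It assumes the serving layer resolves a stable alias to a concrete index name; the alias dictionary, the index names, and `promote_if_healthy` are hypothetical stand-ins for whatever alias or routing mechanism the chosen platform provides.

```python
from typing import Dict


def promote_if_healthy(aliases: Dict[str, str],
                       alias: str,
                       candidate_index: str,
                       candidate_recall: float,
                       recall_threshold: float) -> bool:
    """Repoint `alias` to the rebuilt index only if its recall passes the gate."""
    if candidate_recall < recall_threshold:
        return False  # keep serving from the current (blue) index
    # Single switch of the alias; the old index is left in place for rollback.
    aliases[alias] = candidate_index
    return True


aliases = {"kb-live": "kb-index-v1"}
promoted = promote_if_healthy(aliases, "kb-live", "kb-index-v2",
                              candidate_recall=0.93, recall_threshold=0.90)
print(promoted, aliases["kb-live"])
```

Keeping the previous index until the new one has served traffic for a while gives a cheap rollback path, at the cost of the extra memory discussed under capacity planning.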

IVF Index. For very large datasets (>50M vectors) that are largely static, an IVF (Inverted File) index provides better memory efficiency than HNSW. The key parameters are nlist at build time (number of centroids, typically sqrt(N) where N is the vector count) and nprobe at query time (number of centroids to search; higher nprobe improves recall at the cost of latency). New vectors can be assigned to existing centroids, but the centroids are trained on the data present at build time, so recall degrades as the data distribution shifts until the index is retrained and rebuilt. This makes IVF unsuitable for high-update-frequency workloads.
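As a rough illustration of the sqrt(N) heuristic for nlist, the sketch below computes a starting value and the approximate share of the index scanned for a given nprobe. Both helper names are hypothetical, and the scanned-fraction estimate assumes evenly sized clusters, which real data rarely provides.

```python
import math


def suggest_nlist(num_vectors: int) -> int:
    """Starting point for nlist using the sqrt(N) rule of thumb."""
    return max(1, round(math.sqrt(num_vectors)))


def scanned_fraction(nprobe: int, nlist: int) -> float:
    """Approximate share of vectors compared per query, assuming balanced clusters."""
    return min(nprobe, nlist) / nlist


nlist = suggest_nlist(50_000_000)
print(nlist, scanned_fraction(nprobe=32, nlist=nlist))
```

The resulting values are only a starting point; the recall measurement on the golden query set remains the arbiter of the final nlist and nprobe settings.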

4.3 Backup and Restore

Vector databases require purpose-built backup strategies. Standard filesystem snapshots work for self-hosted deployments. Managed services provide snapshot APIs that must be scheduled and validated.

Snapshot schedule: Hourly snapshots during business hours, 4-hourly off-peak, retained for 7 days. Weekly snapshots retained for 3 months. Monthly snapshots retained for 1 year (or per regulatory requirement).
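The retention tiers above can be expressed as a small policy table. This is a sketch under stated assumptions: the tier names are hypothetical, "3 months" is approximated as 90 days, and "1 year" as 365 days; regulatory requirements may call for longer periods.

```python
from datetime import datetime, timedelta

# Retention per snapshot tier (tier names are illustrative).
RETENTION = {
    "intraday": timedelta(days=7),   # hourly and 4-hourly snapshots
    "weekly": timedelta(days=90),    # "3 months" approximated as 90 days
    "monthly": timedelta(days=365),  # extend where regulation requires longer
}


def is_expired(kind: str, taken_at: datetime, now: datetime) -> bool:
    """True when a snapshot of the given tier is past its retention period."""
    return now - taken_at > RETENTION[kind]


now = datetime(2026, 6, 12, 12, 0)
print(is_expired("intraday", now - timedelta(days=8), now))
print(is_expired("weekly", now - timedelta(days=30), now))
```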

Backup validation: Every backup must be validated by restoring to a staging environment and executing the golden query set. A backup that cannot serve golden queries is not a valid backup. Validation frequency: weekly for production-tier backups. An unvalidated backup is not counted toward RPO compliance.

Cross-region replication: For production deployments with a DR requirement, configure asynchronous replication to a secondary region. Replication lag is monitored and included in RPO calculations. Failover procedure is documented, tested quarterly, and requires no manual steps to execute (automated DNS failover).

4.4 Multi-Tenancy and Namespace Isolation

Three multi-tenancy models exist, each with different security properties:

Per-tenant collections (strongest isolation): each tenant has a dedicated collection or index. No cross-tenant query is physically possible. Higher operational overhead (N collections to manage). Preferred for multi-tenant SaaS products where tenants are different legal entities.

Namespace isolation (intermediate): shared collection with namespace partitioning. Queries are scoped to the tenant's namespace. Relies on correct namespace scoping in every query: a missing or incorrect namespace parameter can return data from outside the tenant's partition. Pinecone namespaces and Weaviate's native multi-tenancy provide this model (Qdrant collections, by contrast, correspond to the per-tenant collection model above). Suitable for internal business units where exposure risk is lower.

Shared collection with metadata filtering (weakest isolation): all vectors in one collection, filtered by a tenant_id metadata field. Relies entirely on the application layer including the correct filter in every query. Not recommended for security-sensitive multi-tenancy — a single missing filter in application code exposes all tenants. Acceptable only for intra-team isolation without security classification separation.

API key scoping: each tenant receives an API key scoped to their collection/namespace. The vector database API gateway validates that the API key is authorised for the collection/namespace in the request. A misconfigured API key cannot access another tenant's data even if the application sends a cross-tenant query.
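The gateway-side check described above could be sketched as follows. The key-to-namespace mapping, the key strings, and `authorise_query` are hypothetical; a real deployment would back this with the secrets store or the platform's native key management rather than an in-process dictionary.

```python
from typing import Dict, Set


class NamespaceAccessError(PermissionError):
    """Raised when an API key is not authorised for the requested namespace."""


# Hypothetical scope table: API key -> namespaces it may query.
KEY_SCOPES: Dict[str, Set[str]] = {
    "key-tenant-a": {"tenant-a"},
    "key-tenant-b": {"tenant-b"},
}


def authorise_query(api_key: str, namespace: str) -> None:
    """Reject the request unless the key is scoped to the requested namespace."""
    allowed = KEY_SCOPES.get(api_key, set())  # unknown keys get no access
    if namespace not in allowed:
        raise NamespaceAccessError(
            f"key is not authorised for namespace {namespace!r}"
        )


authorise_query("key-tenant-a", "tenant-a")  # passes silently
try:
    authorise_query("key-tenant-a", "tenant-b")
except NamespaceAccessError as exc:
    print("rejected:", exc)  # would map to an HTTP 403 plus a security log event
```

Because the check runs outside the application, a bug that sends the wrong namespace is stopped at the gateway instead of silently returning another tenant's vectors.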

4.5 Performance Monitoring

The recall measurement on a golden query set is the most important metric in the entire monitoring framework. The golden query set consists of 100–500 manually curated (question, expected_top_k_document_ids) pairs covering all major query categories. Recall at k (e.g., recall@5) is the fraction of a query's expected documents that appear in the top-k retrieved results, averaged across the golden set; with a single expected document per query it reduces to whether that document appears in the top 5. This measurement must be automated and run at least daily.
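A minimal sketch of the golden-set evaluation follows. It treats recall@k as the fraction of expected document IDs found in the top-k results, averaged over queries. The `search` callable is a hypothetical stand-in for the embedding step plus the vector database query, and the document IDs are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[str],
                expected_ids: Sequence[str],
                k: int) -> float:
    """Fraction of expected documents that appear in the top-k retrieved results."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("golden entry has no expected document IDs")
    return len(set(retrieved_ids[:k]) & expected) / len(expected)


def mean_recall_at_k(golden_set: Sequence[Tuple[str, Sequence[str]]],
                     search: Callable[[str, int], List[str]],
                     k: int) -> float:
    """Average recall@k over all (question, expected_ids) pairs."""
    scores = [recall_at_k(search(question, k), expected, k)
              for question, expected in golden_set]
    return sum(scores) / len(scores)


# Stand-in for the real retrieval call, returning canned results per question.
canned = {
    "q1": ["doc-1", "doc-9", "doc-4"],
    "q2": ["doc-7", "doc-2", "doc-3"],
}
golden = [("q1", ["doc-1"]), ("q2", ["doc-2", "doc-8"])]
print(mean_recall_at_k(golden, lambda q, k: canned[q][:k], k=3))
```

Scheduling this job and exporting the result as a time series is what makes fragmentation-driven recall loss visible before users notice it.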

5. Architecture Diagram

flowchart TD
    subgraph Ingestion["Ingestion Layer"]
        A[Vector Upsert Pipeline]
        B{Index Type}
    end
    subgraph Store["Vector Store"]
        C[HNSW Index]
        D[Namespace Isolation]
        E[(Snapshot Store)]
    end
    subgraph Operations["Operations Layer"]
        F[Defrag Scheduler]
        G[Backup Validator]
        H[Performance Monitor]
    end
    A --> B
    B -->|dynamic data| C
    B -->|large static| C
    C --> D
    C --> F
    F -->|recall drop| C
    C --> E
    E --> G
    C --> H
    H --> I[Alert Manager]
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f3e8ff,stroke:#a855f7
    style C fill:#fef9c3,stroke:#eab308
    style D fill:#fef9c3,stroke:#eab308
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f0fdf4,stroke:#22c55e
    style I fill:#fee2e2,stroke:#ef4444

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Vector Database Engine	Storage/Retrieval	Store vectors and metadata; serve ANN queries with configurable recall/latency trade-off	Pinecone, Weaviate, Qdrant, Milvus, pgvector, Amazon OpenSearch, Azure AI Search	Critical
HNSW Index	Index	Approximate nearest neighbour search for dynamic datasets	Native HNSW in all major vector DBs; parameter tuning per §4.2	Critical
IVF Index	Index	ANN for large static datasets	Faiss-based IVF in Milvus; ivfflat in pgvector; HNSW alternatives at extreme scale	High
Multi-tenancy Enforcer	Security	Enforce namespace/collection isolation per tenant	Pinecone namespaces, Qdrant collections, Weaviate multi-tenancy, custom API gateway	High
Tenant API Key Manager	Security	Issue and validate tenant-scoped API keys	HashiCorp Vault dynamic secrets, custom API key store, native DB key management	High
Snapshot Scheduler	Operations	Schedule and execute periodic index snapshots	Kubernetes CronJob, AWS Lambda + EventBridge, Airflow	High
Backup Validator	Operations	Restore snapshot to staging and validate with golden query set	Custom validation job	High
Cross-Region Replica	DR	Asynchronous replica in secondary region	Managed service replication (Pinecone, OpenSearch), custom replication pipeline	Medium
Golden Query Set Evaluator	Monitoring	Daily recall measurement against curated question set	Custom Python evaluation job; Ragas framework	High
Performance Monitoring Stack	Observability	Query latency, index size, memory usage, recall trend	Prometheus + Grafana; Datadog; cloud-native monitoring	High
Index Defragmentation Job	Maintenance	Periodic HNSW rebuild or segment merge to restore recall	Platform-specific rebuild API; blue/green promotion logic	Medium

7. Data Flow

7.1 Primary Data Flow — Vector Ingestion and Query

Step	Actor	Action	Output
1	Embedding Pipeline	Generates vector embeddings for new or updated documents	Embedding vectors with document IDs and metadata
2	Upsert Pipeline	Batches vectors; calls vector DB upsert API per tenant namespace	Vectors written to index with metadata
3	Index Engine	Integrates new vectors into HNSW graph or IVF centroids	Updated index; fragmentation score increases incrementally
4	AI Application	Issues ANN query: query vector + k + metadata filters + tenant namespace	Query request
5	Multi-tenancy Enforcer	Validates tenant API key; scopes query to tenant namespace	Authorised scoped query
6	Index Engine	Executes ANN search with specified ef parameter; applies metadata filter	Top-k results with similarity scores and document IDs
7	AI Application	Receives results; fetches document content from document store using IDs	Retrieved context for LLM
8	Monitoring	Records query latency, result count, namespace ID	Latency and usage metrics

7.2 Error Flow

Error	Detection	Recovery	Escalation
Cross-tenant namespace access attempt	API gateway rejects request; unauthorised namespace in request	Return 403; log security event with tenant ID and requested namespace	P1 security incident; investigate for key compromise
Query latency SLO breach	p99 latency alert	Check index fragmentation; run recall measurement; scale read replicas if fragmentation not the cause	Engineering on-call; capacity review
Recall drop below threshold	Daily golden set evaluation	Trigger HNSW rebuild in staging; validate recall; blue/green promote to production	Alert AI platform team; investigate root cause (fragmentation, new data distribution, or model drift)
Snapshot failure	Snapshot job error log	Retry; alert if 3 consecutive failures	P2 incident; DBA review; RPO compliance risk
Index write failure (upsert error)	Upsert API exception; dead letter queue	Retry with exponential backoff; dead letter queue after max retries	Alert ingestion pipeline team; document store is source of truth for re-index
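The upsert recovery path in the last row (retry with exponential backoff, then dead letter queue) can be sketched as below. The `upsert` callable and the list-based dead letter queue are hypothetical stand-ins for the platform client and a durable queue; the retry count and base delay are illustrative defaults, not recommendations from this pattern.

```python
import time
from typing import Any, Callable, List


def upsert_with_retry(upsert: Callable[[Any], None],
                      batch: Any,
                      dead_letter: List[Any],
                      max_retries: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try the upsert, backing off 0.5s, 1s, 2s, ... between failed attempts.

    After the initial attempt plus `max_retries` retries have all failed,
    the batch is parked in the dead letter queue and False is returned.
    """
    for attempt in range(max_retries + 1):
        try:
            upsert(batch)
            return True
        except Exception:
            if attempt == max_retries:
                break
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    dead_letter.append(batch)
    return False
```

Injecting `sleep` keeps the function testable without real delays; in production the dead letter queue would be durable so that batches can be replayed after a snapshot restore.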

8. Security Considerations

8.1 Authentication and Authorisation

All vector database API calls require authentication. Service accounts (application-to-database) use API keys stored in the secrets vault with 90-day rotation. For managed services, IAM-based authentication (AWS IAM for OpenSearch, Azure AD for AI Search) is preferred over static API keys. Human access to the vector database (admin operations) requires MFA-enabled SSO with role separation: Read-Only, Operator, Administrator.

8.2 Secrets Management

Vector database API keys and connection strings are stored in HashiCorp Vault or cloud-native equivalents. Dynamic secrets are used where supported (Vault database secrets engine generates short-lived credentials per connection). All secrets can be rotated without an application restart: applications use the secrets SDK to fetch fresh credentials on rotation.

8.3 Data Classification

Each tenant namespace or collection carries the data classification of the content within it. The API gateway enforces that applications calling with a classification-scoped token cannot query collections above their classification level. For multi-classification corpora (e.g., Internal and Confidential content in the same index), per-vector metadata classification filtering is enforced — but this is a weaker control than physical namespace separation and is only used when namespace separation is impractical.

8.4 Encryption

All vector data is encrypted at rest using AES-256 with customer-managed keys where supported. Data in transit uses TLS 1.3. Backup snapshots are encrypted with the same CMK as the live index. For self-hosted deployments, disk encryption is mandatory; application-layer encryption of vector payloads is an additional control for Restricted classification data.

8.5 Auditability

All query events are logged: tenant ID, namespace, query vector hash (not the vector itself), result count, latency, and timestamp. Administrative operations (index rebuild, namespace creation/deletion, key rotation) are logged with actor identity. Audit logs are immutable and retained per regulatory requirements.

8.6 OWASP LLM Top 10 Mapping

OWASP LLM Risk	Relevance	Mitigation
LLM01 Prompt Injection	Vectors encoding adversarial content could influence RAG retrieval to inject instructions	Content safety filter on retrieved content before LLM inclusion; monitor for unusual retrieval patterns
LLM02 Insecure Output Handling	Metadata stored with vectors could contain injection payloads	Sanitise all metadata fields before including in LLM context
LLM03 Training Data Poisoning	Adversarial vectors inserted into the index can manipulate retrieval results	Authenticated upsert pipeline only; anomaly detection on new vector distributions
LLM04 Model Denial of Service	High-dimensional queries with large ef values could exhaust compute	Per-query ef cap; rate limiting per tenant; query timeout enforcement
LLM05 Supply Chain Vulnerabilities	Vector database client library vulnerabilities	Dependency scanning in CI/CD; library version pinning; security patch SLA
LLM06 Sensitive Information Disclosure	Cross-tenant vector leakage	Per-tenant collections (strongest); namespace isolation with key scoping; regular penetration testing of isolation boundaries
LLM07 Insecure Plugin Design	Vector DB exposed as a plugin to an AI agent	Read-only plugin interface; no upsert/delete from agent context; namespace-scoped plugin credentials
LLM08 Excessive Agency	AI agent with vector DB write access could insert malicious vectors	Agent access is read-only by default; upsert capability requires explicit elevated permission
LLM09 Overreliance	Applications assume retrieval is always accurate; no recall monitoring	Recall@K monitoring; retrieval confidence scoring surfaced to AI application
LLM10 Model Theft	Vector index encodes semantic properties of proprietary content	Encrypted index; no bulk export API exposed externally; rate limiting prevents full index reconstruction via queries

9. Governance Considerations

9.1 Responsible AI

The vector database is the retrieval mechanism, not a model itself, but its configuration choices affect which content AI applications can access. Index parameter settings that prioritise recall over latency ensure more relevant content is retrieved — this is a positive bias for answer quality but must be balanced against cost. Namespace configurations that are too coarse-grained may allow retrieval of content from one business domain to influence answers in another (unwanted cross-domain context).

9.2 Model Risk Management

The embedding model whose outputs populate the vector index is a model risk management artefact. Its performance characteristics (semantic clustering properties, language coverage, domain tuning) directly affect retrieval quality. Embedding model upgrades require a full re-indexing event — this is a planned change with a pre-production recall validation gate before production promotion.

9.3 Human Approval Gates

Index rebuild operations that affect production retrieval quality are gated on human approval after staging validation. Embedding model upgrades require sign-off from the AI platform team and corpus governance function before production cutover. Namespace/collection provisioning for new tenants or new data classifications requires security team approval.

9.4 Policy Ownership

Vector database configuration policies (index parameters, backup schedules, namespace isolation model) are owned by the AI Platform Engineering team. Tenant provisioning policies (who can have a namespace, what data classification is permitted) are owned by the Data Governance and Security teams. Backup and DR policies are owned by the Infrastructure and Business Continuity teams.

9.5 Traceability

Each query to the vector database is logged with the query vector hash, the tenant namespace, and the result document IDs. The RAG application correlates these with the AI response. Combined with corpus versioning (EAAPL-KNW003), this creates a complete traceability chain from AI answer to retrieved documents to vector index state to source documents.

9.6 Governance Artefacts

Artefact	Owner	Frequency	Location
Vector DB configuration specification	AI Platform Engineering	Per change	IaC repository
Namespace/tenant provisioning register	Security + Data Governance	Updated per tenant onboarding	Access management system
Embedding model card	ML Engineering	Per model version	ML model registry
Index performance baseline report	AI Platform Engineering	Quarterly + per rebuild	Monitoring platform
Backup validation log	Operations	Weekly	Runbook and incident management system
Golden query set	AI Platform Engineering + Domain SMEs	Quarterly refresh	Test suite repository

10. Operational Considerations

10.1 Monitoring and SLOs

Metric	SLO Target	Alerting Threshold	Tool
Query latency p50	≤50ms	>100ms over 5 min	Prometheus + Grafana
Query latency p99	≤500ms	>1,000ms over 5 min	Prometheus + Grafana
Recall@5 on golden query set	≥0.90	<0.85	Daily evaluation job
Index availability	99.9%	<99.5% over 1-hour window	Cloud provider health check
Memory utilisation	<80% of allocated	>90%	Infrastructure metrics
Upsert throughput (during ingestion)	Meets corpus management SLO	Lag >30 min for critical ingestion	Ingestion pipeline metrics
Snapshot success rate	100% per schedule	Any scheduled snapshot failure	Job monitoring alert

10.2 Logging

All API calls (query and upsert) are logged with structured JSON including: timestamp, tenant_id, namespace, operation, latency_ms, result_count, error_code (if applicable). Administrative operations include actor identity. Query vector content is not logged (privacy); vector hash (SHA-256) is logged for deduplication analysis. Log volume scales with query volume — plan for log retention storage accordingly.
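A structured log record with the fields listed above might be built as in this sketch. The function names and the `vector_sha256` field name are hypothetical, and hashing the vector as little-endian float32 bytes is an assumption made here so that identical vectors always hash identically.

```python
import hashlib
import json
import struct
from datetime import datetime, timezone
from typing import Optional, Sequence


def vector_hash(vector: Sequence[float]) -> str:
    """SHA-256 over the vector packed as little-endian float32 values."""
    packed = struct.pack(f"<{len(vector)}f", *vector)
    return hashlib.sha256(packed).hexdigest()


def query_log_record(tenant_id: str,
                     namespace: str,
                     vector: Sequence[float],
                     latency_ms: float,
                     result_count: int,
                     error_code: Optional[str] = None) -> str:
    """Build one JSON log line; the raw query vector is never included."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "namespace": namespace,
        "operation": "query",
        "latency_ms": latency_ms,
        "result_count": result_count,
        "error_code": error_code,
        "vector_sha256": vector_hash(vector),
    }
    return json.dumps(record, sort_keys=True)


print(query_log_record("tenant-a", "tenant-a", [0.1, 0.2, 0.3], 12.5, 5))
```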

10.3 Incident Management

P1: Vector database unavailable or recall drops below 0.70; immediate on-call escalation, 15-minute response SLA. P2: Latency SLO breach or recall between 0.70 and 0.85; 1-hour response. P3: Individual namespace issue or non-critical backup failure; next business day. Post-mortems are required for all P1/P2 incidents involving recall degradation, as these directly affect AI application quality.
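The P1/P2 thresholds above can be encoded as a small classifier for use in alert routing. This is a sketch only: the function name is hypothetical, P3 conditions (namespace-level issues, non-critical backup failures) are not modelled, and a recall of exactly 0.85 is treated as healthy to match the alerting threshold of <0.85.

```python
from typing import Optional


def classify_incident(available: bool,
                      recall: float,
                      latency_slo_breached: bool) -> Optional[str]:
    """Map current health signals to an incident priority, or None if healthy."""
    if not available or recall < 0.70:
        return "P1"  # immediate on-call escalation
    if latency_slo_breached or recall < 0.85:
        return "P2"  # 1-hour response
    return None


print(classify_incident(available=True, recall=0.78, latency_slo_breached=False))
```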

10.4 Disaster Recovery

Scenario	RTO	RPO	Recovery Procedure
Single node failure (self-hosted cluster)	2 min (automatic failover to replica)	0 (synchronous replica)	Automatic; validate with health check query
Full cluster loss (self-hosted)	2 hours	1 hour (last snapshot)	Restore from snapshot; validate with golden query set; update application endpoint
Managed service outage (regional)	4 hours	1 hour (cross-region replica)	Promote DR region replica; update DNS; validate golden query set
Index corruption (silent)	4 hours (detection + restore)	Last validated snapshot	Restore from last validated snapshot; re-run any upserts since snapshot from dead letter queue
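Embedding model unavailability (query time)	N/A (stored document vectors are unaffected)	N/A	Serve query-time embeddings from a fallback deployment of the same embedding model (vectors from a different model are not comparable with the stored index); upsert pipeline blocked until primary model restored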

10.5 Capacity Planning

HNSW memory consumption: approximately (4 × vector_dimensions + 8 × M) × num_vectors bytes. For 1536-dimension OpenAI embeddings with M=16 and 10M vectors: (4 × 1536 + 8 × 16) × 10M ≈ 62.7 GB RAM, before engine and metadata overhead. Plan for 2× headroom during index rebuild (old and new index concurrent). Storage (excluding in-memory index): ~6KB per vector with metadata. SSD-backed storage for hot indices; HDD acceptable for backup snapshots.
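The sizing formula can be turned into a small estimator for capacity reviews. This sketch implements only the approximation stated above (4 bytes per float32 dimension plus 8 bytes per link slot) and the 2× rebuild headroom; the function names are hypothetical, and real engines add their own overhead on top.

```python
def hnsw_memory_bytes(vector_dimensions: int, m: int, num_vectors: int) -> int:
    """Approximate HNSW footprint: (4 * dims + 8 * M) bytes per vector."""
    return (4 * vector_dimensions + 8 * m) * num_vectors


def rebuild_memory_bytes(vector_dimensions: int, m: int, num_vectors: int) -> int:
    """Footprint while old and new index coexist during a rebuild (2x headroom)."""
    return 2 * hnsw_memory_bytes(vector_dimensions, m, num_vectors)


steady = hnsw_memory_bytes(vector_dimensions=1536, m=16, num_vectors=10_000_000)
peak = rebuild_memory_bytes(vector_dimensions=1536, m=16, num_vectors=10_000_000)
print(f"steady state: {steady / 1e9:.1f} GB, rebuild peak: {peak / 1e9:.1f} GB")
```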

11. Cost Considerations

11.1 Cost Drivers

Cost Driver	Description	Typical Range
Vector DB hosting (managed)	Pinecone/Weaviate Cloud/Amazon OpenSearch service cost	$500–$20,000/month depending on pod size and query volume
Vector DB hosting (self-managed)	GPU/CPU server for in-memory index; depends on dataset size	$2,000–$30,000/month for large deployments
Embedding generation (ingestion)	Per-token cost for generating vectors at ingestion	$0.0001–$0.001 per 1,000 tokens
Query-time embedding generation	Per-query embedding generation cost	$0.00002–$0.0002 per query
Cross-region replication transfer	Data transfer cost for async replication	Typically <5% of primary hosting cost
Backup storage	Snapshot storage per retention schedule	$0.02–$0.05 per GB per month

11.2 Scaling Risks

Memory is the binding constraint for HNSW in-memory indexes: dataset size doubling requires proportional memory increase; no cheap way to trade memory for recall beyond index type changes
Query cost for managed services (especially serverless models) can spike unexpectedly during high-traffic periods — implement per-tenant rate limits to prevent runaway cost
Embedding model upgrades require re-embedding the entire corpus: a 10M-document corpus at $0.0001 per 1K tokens costs $500–$5,000 per re-embedding event, which corresponds to an average of 500–5,000 tokens per document
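The re-embedding cost in the last item is simple arithmetic that is worth parameterising, since average document length dominates the result. A minimal sketch, with a hypothetical function name and the token counts per document as assumptions:

```python
def reembedding_cost_usd(num_documents: int,
                         avg_tokens_per_document: float,
                         price_per_1k_tokens_usd: float) -> float:
    """Total cost of re-embedding a corpus at a per-1K-token price."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1000 * price_per_1k_tokens_usd


# 10M documents at $0.0001 per 1K tokens, for two assumed document lengths.
for tokens in (500, 5000):
    cost = reembedding_cost_usd(10_000_000, tokens, 0.0001)
    print(f"{tokens} tokens/doc: ${cost:,.0f}")
```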

11.3 Optimisations

Quantisation (product quantisation or scalar quantisation) reduces memory consumption by 4–16× at a modest recall cost — acceptable for most use cases above 10M vectors
Query caching: identical query vectors produce identical results; cache at the application layer with TTL aligned to update frequency
Smaller embedding models (384-dimension vs 1536-dimension) reduce memory and cost by 4× with moderate recall impact — evaluate against golden query set before committing
Reserved capacity / committed use discounts for managed services (1-year commit typically saves 30–40%)
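The query caching optimisation above could be sketched as an in-process TTL cache. This is an illustration under assumptions: the `QueryCache` class is hypothetical, vectors are hashed as float32 bytes, and the tenant namespace is part of the cache key so that cached results are never shared across tenants.

```python
import hashlib
import struct
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

CacheKey = Tuple[str, int, str]  # (namespace, k, vector hash)


class QueryCache:
    """Application-layer cache of top-k results for identical query vectors."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, List[str]]] = {}

    @staticmethod
    def _key(namespace: str, vector: Sequence[float], k: int) -> CacheKey:
        packed = struct.pack(f"<{len(vector)}f", *vector)
        return (namespace, k, hashlib.sha256(packed).hexdigest())

    def get(self, namespace: str, vector: Sequence[float], k: int) -> Optional[List[str]]:
        key = self._key(namespace, vector, k)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl_s:
            del self._entries[key]  # expired: force a fresh query
            return None
        return results

    def put(self, namespace: str, vector: Sequence[float], k: int,
            results: Sequence[str]) -> None:
        self._entries[self._key(namespace, vector, k)] = (self._clock(), list(results))


cache = QueryCache(ttl_s=60.0)
cache.put("tenant-a", [0.1, 0.2], 5, ["doc-1", "doc-2"])
print(cache.get("tenant-a", [0.1, 0.2], 5))
```

The TTL should be no longer than the interval at which upserts are expected to change results for the namespace; a shorter TTL trades hit rate for freshness.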

11.4 Indicative Cost Ranges

Deployment Scale	Monthly Infrastructure Cost	Notes
Small (1M vectors, single tenant)	$500–$2,000	Managed service, standard tier
Medium (50M vectors, 10 tenants)	$5,000–$20,000	Managed service or self-hosted with HA
Large (500M+ vectors, enterprise-wide)	$30,000–$150,000	Self-hosted with dedicated infrastructure; quantisation required

12. Trade-Off Analysis

12.1 Vector Database Platform Comparison

Platform	Latency	Max Scale	Multi-tenancy	Managed Option	Filtering	Best For
Pinecone	Very low	Very high	Namespaces	Yes (managed only)	Good	High-query-volume managed SaaS; time-to-value priority
Weaviate	Low	High	Collections	Yes (Cloud + self-hosted)	Excellent (GraphQL)	Complex filtering; hybrid vector+keyword search
Qdrant	Very low	High	Collections	Yes (Cloud + self-hosted)	Excellent	Performance-critical self-hosted; rich filtering
pgvector	Medium	Medium (<50M)	Schema/RLS	Via RDS/AlloyDB	Excellent (SQL)	PostgreSQL-native; small-medium datasets; existing PG estate
Amazon OpenSearch	Medium	Very high	Index-level	Yes	Good	AWS-native; also full-text search required
Azure AI Search	Medium	Very high	Index-level	Yes	Excellent	Azure-native; hybrid semantic + BM25 search

12.2 Architectural Tensions

Tension	Option A	Option B	Recommended Resolution
Recall vs. latency	High ef parameter (better recall, higher latency)	Low ef parameter (faster queries, lower recall)	Tune to the query latency SLO; measure recall@k on golden set; recall is the binding quality constraint
Managed vs. self-hosted	Managed service (less control, lower ops burden)	Self-hosted (full control, higher ops burden)	Default to managed unless team has proven vector DB ops capability or dataset scale exceeds managed service limits
Per-tenant collection vs. namespace	Per-tenant collections (strongest isolation, higher ops overhead)	Shared namespaces (simpler, weaker isolation)	Per-tenant collections for external/multi-organisation tenants; namespaces for internal organisational separation
HNSW vs. IVF	HNSW (better for updates, higher memory)	IVF (lower memory, batch-rebuild only)	HNSW for most production workloads; IVF for very large static datasets (>50M vectors, infrequent updates)

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Silent recall degradation (HNSW fragmentation)	High over 6–12 months without maintenance	High — AI answer quality degrades invisibly	Daily recall@k monitoring on golden set	Scheduled HNSW rebuild; blue/green promotion
Cross-tenant data leakage	Low (if namespaces used correctly)	Critical — privacy and regulatory breach	Penetration test; query log cross-namespace analysis	Immediate namespace isolation review; security incident response
Memory exhaustion under load	Medium (if capacity planning not followed)	High — queries fail or slow to unacceptable levels	Memory utilisation alert >90%	Scale vertically (more RAM) or apply quantisation; immediate: shed load
Embedding model API rate limit during bulk ingestion	Medium (bulk loads)	Medium — ingestion pipeline stalls	Upsert queue depth metric	Reduce ingestion batch concurrency; spread ingestion across time window
Backup restore failure (never tested)	High (if no test programme)	Critical — no recovery path in DR scenario	DR test drill failure	Implement weekly validated restore to staging; treat untested backup as no backup

13.1 Cascading Failure Scenarios

Scenario 1: Memory Exhaustion During Index Rebuild. A scheduled HNSW rebuild is triggered while a simultaneous bulk ingestion event occurs. The rebuilding index and new vectors require 2× normal memory. The host runs out of memory; the vector database crashes. The old index is no longer in memory; the new index is not complete. RAG queries fail for all tenants. Recovery: restart from the last stable snapshot; queue the bulk ingestion until after the rebuild window; implement a memory guard on rebuild (pause bulk ingestion during a rebuild, and do not start a rebuild while memory utilisation exceeds 70%).

Scenario 2: Embedding Model Deprecation Without Warning. The embedding model provider deprecates the model with two weeks' notice. The corpus (10M documents) requires full re-embedding, which takes 5 days. During this period, new documents cannot be ingested because they would produce vectors incompatible with the existing index. Query quality in the AI application degrades because new content, including updates to existing documents, is not indexed. Lesson: maintain a secondary embedding model as a fallback; monitor provider deprecation notices; plan re-embedding as a scheduled quarterly option even without external forcing.
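
The figures in Scenario 2 imply a sustained embedding throughput that can be checked before committing to a re-embedding plan. The sketch below is plain arithmetic on the scenario's numbers (10M documents, 5 days); the function names are hypothetical, and it assumes one embedding call per document, which will not hold if documents are chunked.

```python
SECONDS_PER_DAY = 86_400


def required_throughput(doc_count: int, window_days: float) -> float:
    """Documents per second needed to re-embed doc_count documents in window_days."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return doc_count / (window_days * SECONDS_PER_DAY)


def reembedding_days(doc_count: int, docs_per_second: float) -> float:
    """Days needed to re-embed doc_count documents at a sustained rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / SECONDS_PER_DAY


rate = required_throughput(10_000_000, 5)
print(f"required rate: {rate:.2f} documents/second")
print(f"days at half that rate: {reembedding_days(10_000_000, rate / 2):.1f}")
```

Comparing the required rate against the provider's published rate limits shows whether the re-embedding fits inside the deprecation notice period or needs a fallback model.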

14. Regulatory Considerations

Regulation	Relevant Clause	Requirement	How Vector DB Management Addresses It
APRA CPS 230	§36–§38 (Operational Continuity)	Material systems have documented and tested recovery procedures	Backup schedule, validated restore procedure, RTO/RPO targets documented and tested quarterly
APRA CPS 234	§16 (Access Controls)	Access to information assets controlled based on need	Tenant API keys scoped to specific namespaces/collections; MFA for admin access
Australian Privacy Act 1988	APP 11 (Security)	Reasonable steps to protect personal information	Encryption at rest and in transit; namespace isolation prevents cross-tenant PII exposure; audit log of all access
EU AI Act	Article 10 (Data Governance)	Data used in AI systems subject to governance	Namespace-level data classification; governed ingestion pipeline; provenance metadata on all vectors
EU GDPR	Article 17 (Right to Erasure)	Ability to delete individual's data on request	Per-document vector deletion by ID; namespace deletion for bulk erasure; snapshot management for historical data
ISO/IEC 42001	§8.3 (AI System Operations)	Operational monitoring and management of AI systems	Recall monitoring, latency SLOs, backup validation, and incident management documented
NIST AI RMF	MANAGE 2.2 (AI Risk Response)	Implement risk response plans for identified AI risks	HNSW defragmentation schedule, DR procedures, and embedding model contingency plan address identified vector DB risks
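
The GDPR Article 17 row depends on every vector carrying provenance metadata that links it back to its source document. The sketch below illustrates per-document erasure against a toy in-memory store; `InMemoryVectorStore` and its methods are hypothetical stand-ins, not the API of any vector database, and a real deployment would also need to handle snapshots that still contain the deleted vectors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryVectorStore:
    """Toy stand-in for a vector database: namespace -> {vector_id: metadata}."""
    namespaces: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def upsert(self, namespace: str, vector_id: str, metadata: dict) -> None:
        self.namespaces.setdefault(namespace, {})[vector_id] = metadata

    def delete_by_document(self, namespace: str, document_id: str) -> List[str]:
        """Delete every vector derived from document_id; return the deleted IDs."""
        vectors = self.namespaces.get(namespace, {})
        doomed = sorted(
            vid for vid, meta in vectors.items()
            if meta.get("document_id") == document_id
        )
        for vid in doomed:
            del vectors[vid]
        return doomed

    def delete_namespace(self, namespace: str) -> int:
        """Bulk erasure: drop a whole namespace; return how many vectors it held."""
        return len(self.namespaces.pop(namespace, {}))


store = InMemoryVectorStore()
store.upsert("tenant-a", "v1", {"document_id": "doc-1"})
store.upsert("tenant-a", "v2", {"document_id": "doc-1"})
store.upsert("tenant-a", "v3", {"document_id": "doc-2"})
deleted = store.delete_by_document("tenant-a", "doc-1")
print(deleted)
```

Returning the deleted IDs gives the caller something concrete to write to the audit log as evidence that the erasure request was fulfilled.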

15. Reference Implementations

15.1 AWS

Component	AWS Service
Vector database	Amazon OpenSearch Service with vector engine (k-NN)
Embedding generation	Amazon Bedrock Titan Embeddings
Backup	OpenSearch automated snapshots to S3
Cross-region DR	OpenSearch cross-cluster replication
Secrets management	AWS Secrets Manager
Monitoring	Amazon CloudWatch + Managed Grafana
Multi-tenancy	Index-per-tenant with IAM fine-grained access

15.2 Azure

Component	Azure Service
Vector database	Azure AI Search with vector search
Embedding generation	Azure OpenAI Embeddings
Backup	Azure AI Search index backup (preview) + custom export to Blob Storage
Cross-region DR	Zone-redundant + paired region deployment
Secrets management	Azure Key Vault
Monitoring	Azure Monitor + Grafana
Multi-tenancy	Index-per-tenant with Microsoft Entra ID (formerly Azure AD) RBAC

15.3 GCP

Component	GCP Service
Vector database	Vertex AI Vector Search (formerly Matching Engine)
Embedding generation	Vertex AI Embeddings
Backup	Vertex AI index export to Cloud Storage
Secrets management	Google Cloud Secret Manager
Monitoring	Cloud Monitoring + Grafana

15.4 On-Premises

Component	Technology
Vector database	Qdrant self-hosted (Kubernetes) or Milvus cluster
Embedding generation	Sentence Transformers on GPU servers; Ollama for local models
Backup	Custom snapshot scripts + MinIO S3-compatible storage
DR	Qdrant distributed mode with cross-datacenter replication
Secrets management	HashiCorp Vault
Monitoring	Prometheus + Grafana

16. Related Patterns

Pattern ID	Pattern Name	Relationship Type	Notes
EAAPL-KNW003	AI Knowledge Corpus Management	Upstream	Corpus management governs content; vector DB management governs the storage and retrieval infrastructure
EAAPL-KNW001	Enterprise Knowledge Graph	Complementary	EKG provides structured traversal; vector DB provides semantic similarity retrieval — hybrid RAG uses both
EAAPL-KNW006	Corpus Quality Assurance	Supporting	Quality-gated documents are stored in the vector DB; recall monitoring in vector DB surfaces corpus quality issues
EAAPL-RAG001	Retrieval Augmented Generation	Consumer	The RAG pattern is the primary consumer of the vector database's query API
EAAPL-INF002	AI Infrastructure High Availability	Parent	Vector DB HA configuration is an application of the broader AI infrastructure HA pattern
EAAPL-SEC003	Multi-Tenant AI Data Isolation	Specialisation	SEC003 defines the general multi-tenant isolation model; KNW004 applies it to the vector DB context

17. Maturity Assessment

Overall Maturity Label: Proven

Dimension	Score (1–5)	Rationale
Technology readiness	5	Multiple production-proven managed and self-hosted vector databases at enterprise scale; mature tooling ecosystem
Organisational capability	3	Requires specific vector DB operational knowledge not yet widely available; most teams learn from incidents
Standards availability	2	No standardised vector database query language; backup/restore APIs are vendor-specific
Vendor ecosystem	5	Rich and competitive ecosystem; all major cloud providers have native offerings; multiple OSS alternatives
Case evidence	5	Extensively deployed across financial services, technology, and media at very large scale
Regulatory alignment	3	Data governance controls applicable; specific vector DB audit requirements not yet standardised by regulators
Overall	3.8 / 5	Proven technology with strong vendor ecosystem; primary gaps are operational knowledge and lack of cross-vendor standards
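
The overall score is consistent with an unweighted mean of the six dimension scores. The check below assumes that is how the figure was derived; the table does not state a weighting scheme.

```python
# Dimension scores copied from the maturity table above.
scores = {
    "Technology readiness": 5,
    "Organisational capability": 3,
    "Standards availability": 2,
    "Vendor ecosystem": 5,
    "Case evidence": 5,
    "Regulatory alignment": 3,
}

# Unweighted mean: 23 / 6.
overall = sum(scores.values()) / len(scores)
print(f"{overall:.1f} / 5")
```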

18. Revision History

Version	Date	Author	Changes
1.0	2026-06-12	EAAPL Editorial Board	Initial publication — covers selection framework, HNSW/IVF tuning, backup and DR, multi-tenancy isolation models, performance monitoring including recall@k

