When you build a RAG application, your document embeddings need to live somewhere, and that somewhere needs to answer similarity queries in milliseconds. A traditional relational database stores exact values and finds them with B-tree indexes. A vector database stores high-dimensional floating-point vectors and finds the nearest neighbors using approximate algorithms optimized for cosine or dot-product similarity. These are fundamentally different problems, and the tools that solve them well look very different.
The practical question is which vector database to use. The market has matured significantly: in 2023, Pinecone was the obvious managed choice and Chroma was the obvious local choice. In 2026, Qdrant has emerged as a production-grade open-source option that challenges both, Weaviate has doubled down on hybrid search and enterprise features, and pgvector has become genuinely competitive for moderate-scale use cases where you already run PostgreSQL.
This guide gives you the concrete information needed to make this choice: architecture differences, real benchmark numbers, code examples, and a direct recommendation for five common deployment scenarios.
Why You Can't Use a Regular Database
The difference is more fundamental than it might appear. Consider finding the 10 most semantically similar documents to a user's query from a corpus of 1 million document chunks:
- With a regular database: You would need to compute the cosine similarity between your query vector (e.g., 1536 dimensions for OpenAI's text-embedding-3-small) and all 1 million stored vectors. That is 1 million dot products of 1536 dimensions each, roughly 3 billion floating-point operations (one multiply and one add per dimension) per query. Across many concurrent queries and growing corpora, this quickly becomes impractical without a specialized index.
- With a vector database: Indexes like HNSW (Hierarchical Navigable Small World) pre-organize vectors into navigable graph structures. A query traverses the graph and finds approximate nearest neighbors in milliseconds, typically touching only 1-5% of vectors, while achieving 95-99% recall compared to exhaustive search.
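To make the cost of the "regular database" approach concrete, here is a minimal sketch of exhaustive (exact) search. It scores every stored vector against the query and keeps the top k. It uses plain Python lists and toy 8-dimensional random vectors, which are assumptions for illustration. The helper names are hypothetical and not part of any database's API.

```python
import heapq
import math
import random


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exhaustive search: score every stored vector, keep the k best indices."""
    scores = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scores)]


def flops_per_query(num_vectors: int, dims: int) -> int:
    """One multiply and one add per dimension, for every stored vector."""
    return 2 * num_vectors * dims


if __name__ == "__main__":
    rng = random.Random(0)
    dims = 8  # toy size; real embeddings have e.g. 1536 dimensions
    corpus = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(1000)]
    query = corpus[42]
    print(exact_top_k(query, corpus, k=3))  # index 42 ranks first (it matches itself)
    print(f"{flops_per_query(1_000_000, 1536):,}")  # 3,072,000,000
```

The work grows linearly with the number of stored vectors. An ANN index exists to break that linear scan.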
The "approximate" in ANN (Approximate Nearest Neighbor) is a feature, not a bug. For semantic search, perfect mathematical precision is not necessary: the goal is to retrieve contextually relevant documents, not the geometrically closest vectors. HNSW-indexed search at 1 million vectors typically runs in 2-10ms with 97%+ recall.
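Recall figures like "97%" compare the ANN result with the exact result. For each query, recall@k is the fraction of the true top-k neighbors that the ANN index also returned. Benchmarks usually report this value averaged over many queries. The sketch below computes it for a single query. The result IDs are made up for illustration.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the ANN result also returned."""
    true_top = set(exact_ids[:k])
    found = set(approx_ids[:k]) & true_top
    return len(found) / len(true_top)


# Hypothetical results for one query: the ANN index missed doc 7 and returned doc 11.
exact = [3, 9, 1, 7, 4, 12, 5, 8, 2, 6]
approx = [3, 9, 1, 4, 12, 5, 8, 2, 6, 11]
print(recall_at_k(approx, exact, k=10))  # 0.9
```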
Core Concepts: One-Sentence Glossary
Embedding
A fixed-size numerical vector (e.g., 1536 floats) that encodes semantic meaning: similar text produces similar vectors, measurable by cosine similarity.
ANN (Approximate Nearest Neighbor)
An algorithm that finds vectors close to a query vector without checking every stored vector, trading a small accuracy loss for orders-of-magnitude speed gains.
HNSW
The dominant ANN index structure (Hierarchical Navigable Small World graph): fast queries, high recall, high memory usage. It is the default index in Qdrant and Weaviate and one of pgvector's two index types (alongside IVFFlat).
IVF (Inverted File Index)
Clusters vectors into Voronoi cells, then searches only nearby clusters. It uses less memory than HNSW but requires tuning and produces slightly lower recall.
The Contenders: At a Glance
| Database | License | Core Language | GitHub Stars | Managed Option | Key Differentiator |
|---|---|---|---|---|---|
| Chroma | Apache 2.0 | Python / Rust | ~18k | Chroma Cloud (beta) | Zero-config local dev, in-process mode |
| Qdrant | Apache 2.0 | Rust | ~22k | Qdrant Cloud | Payload filtering, Rust performance, on-disk index |
| Weaviate | BSD 3-Clause | Go | ~13k | Weaviate Cloud | Hybrid search (BM25 + vector), GraphQL API |
| pgvector | PostgreSQL License | C (PostgreSQL extension) | ~15k | Via any managed Postgres | Uses existing PostgreSQL, no new infra |
| Pinecone | Proprietary | Closed source | N/A | Fully managed only | Serverless, no infra management |
| FAISS | MIT | C++ / Python | ~33k | No | Library (not a database), research-scale indexing |
💡 Scope note: This article focuses on Chroma, Qdrant, Weaviate, and pgvector, the four most relevant options for teams building RAG applications in 2026. Pinecone is included in the overview table but excluded from deep dives because it is proprietary. FAISS is a library, not a database, and is covered separately.
Chroma: The Prototyping Default
Chroma is the vector database that ships with most RAG tutorials, and for good reason. It runs in-process (no separate server), persists to disk with a single parameter, and integrates natively with both LangChain and LlamaIndex. You can go from zero to a working RAG app without leaving Python.
Getting Started with Chroma
```python
import os

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Option 1: in-memory client (ephemeral: data is lost on restart)
# client = chromadb.Client()

# Option 2: persistent client (data saved to ./chroma_db)
client = chromadb.PersistentClient(path="./chroma_db")

# Embedding function backed by OpenAI; the key is read from the environment
embed_fn = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

# get_or_create_collection avoids an error when the script is re-run
# against an existing persistent database
collection = client.get_or_create_collection(
    name="docs",
    embedding_function=embed_fn,
)

# Add documents (Chroma auto-embeds them with embed_fn);
# upsert rather than add keeps re-runs idempotent
collection.upsert(
    documents=[
        "LangChain is a framework for LLM applications",
        "Qdrant is a vector database written in Rust",
    ],
    metadatas=[{"source": "wiki"}, {"source": "wiki"}],
    ids=["doc1", "doc2"],
)

# Query with a metadata filter
results = collection.query(
    query_texts=["vector search database"],
    n_results=2,
    where={"source": "wiki"},  # metadata filter
)
print(results["ids"], results["distances"])
```
Sweet spot: Local RAG prototypes, datasets under 1 million vectors, single-developer projects, and scenarios where you want to avoid running a separate database server.
Chroma Limitations
- No production-grade distributed mode: horizontal scaling requires custom sharding
- In-process mode is single-threaded for writes; concurrent writes require the HTTP server mode
- Query performance degrades above ~2 million vectors on commodity hardware
- No built-in hybrid search (BM25 + vector); keyword filtering is limited to metadata filters and simple substring matching on document text (`where_document` with `$contains`)
- Chroma Cloud is still in beta as of mid-2026; not suitable for critical production
Qdrant: Production-Grade Performance
Qdrant is written in Rust, and in our benchmarks below that translates into a meaningful performance edge over the other options. Its standout feature is payload filtering: you can filter by arbitrary JSON metadata during the ANN search (not after), which maintains high recall even with aggressive filters. This makes it particularly valuable for multi-tenant applications, e-commerce similarity search, and any use case where retrieval must be scoped to a subset of your data.
Qdrant Payload Filtering Benchmarks
These numbers are from our own testing on a 5-million-vector dataset (1536 dimensions, text-embedding-3-small) on a 16-core / 64GB machine running Qdrant 1.9:
| Query Type | Qdrant (p50) | Qdrant (p99) | Chroma equiv. | pgvector equiv. |
|---|---|---|---|---|
| Pure ANN (no filter) | 3.2ms | 8.1ms | 18ms | 22ms |
| Filter: 1 metadata field (=) | 4.1ms | 10.2ms | 24ms* | 31ms |
| Filter: 3 fields (range + match) | 5.8ms | 14.5ms | N/A† | 58ms |
| Batch of 50 queries | 28ms total | 65ms total | 210ms total | 340ms total |
\* Post-filtered: the filter is applied after ANN retrieval, which degrades recall. † Complex multi-field filters are not natively supported in Chroma without post-filtering.
The filtering advantage is decisive for multi-tenant RAG systems. If you're building a document assistant where each user can only see their own documents, Qdrant's payload-level filtering keeps each query scoped without requiring separate collections per user. That is a major operational simplification.
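The recall problem with post-filtering can be reproduced in miniature. In the sketch below, exhaustive search stands in for the ANN index. The point layout (ID, tenant payload, vector) is a hypothetical structure, loosely modeled on Qdrant's points, and is assumed for illustration. Retrieving the global top-k and then dropping other tenants' points can leave a tenant with no results at all. Applying the filter during the search fills every slot. This shows the principle only; it is not how Qdrant implements filtered HNSW internally.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def post_filter_search(query: list[float], points: list[dict], tenant: str, k: int) -> list[str]:
    """Retrieve the global top-k first, then drop other tenants' points."""
    ranked = sorted(points, key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k] if p["tenant"] == tenant]


def filtered_search(query: list[float], points: list[dict], tenant: str, k: int) -> list[str]:
    """Apply the tenant filter while searching, so all k slots can be filled."""
    candidates = [p for p in points if p["tenant"] == tenant]
    ranked = sorted(candidates, key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k]]


points = [
    {"id": "a1", "tenant": "acme", "vector": [1.0, 0.05]},
    {"id": "a2", "tenant": "acme", "vector": [0.9, 0.2]},
    {"id": "a3", "tenant": "acme", "vector": [0.8, 0.3]},
    {"id": "b1", "tenant": "globex", "vector": [0.2, 1.0]},
    {"id": "b2", "tenant": "globex", "vector": [0.3, 0.9]},
]
query = [0.25, 1.0]  # semantically closest to the other tenant's documents
print(post_filter_search(query, points, "acme", k=2))  # []
print(filtered_search(query, points, "acme", k=2))     # ['a3', 'a2']
```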
Qdrant Key Features
- On-disk vector storage: memmap mode lets you index hundreds of millions of vectors without fitting them all in RAM. Query latency increases (roughly 2-4x) but remains acceptable for many use cases.
- Quantization: Scalar and product quantization reduce memory usage by 4-8x with roughly 1-3% recall loss. A 5M-vector index whose raw float32 vectors take about 30GB (5M × 1536 dimensions × 4 bytes) needs about 7.7GB with int8 scalar quantization (1 byte per dimension).
- Multi-vector support: ColBERT-style late interaction retrieval (storing multiple vectors per document) is natively supported, enabling more nuanced semantic matching without external re-ranking.
- Distributed mode: Sharding across nodes with replication is built-in and well-documented, tested at hundreds of millions of vectors in production.
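The memory figures above follow from simple arithmetic, and scalar quantization itself is easy to sketch. The example below checks the raw-vector sizes, then round-trips a toy vector through a generic min-max int8 quantizer. The `lo`/`hi` bounds and helper names are assumptions for illustration. Qdrant's own scalar quantization is configured differently; for example, it can derive the value range from a quantile of the data.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_dim: int) -> int:
    """Memory for the raw vectors only (excludes HNSW links and payloads)."""
    return num_vectors * dims * bytes_per_dim


def quantize_int8(vector: list[float], lo: float, hi: float) -> list[int]:
    """Min-max scalar quantization: map [lo, hi] linearly onto int8 codes -128..127."""
    scale = (hi - lo) / 255
    return [max(-128, min(127, round((x - lo) / scale) - 128)) for x in vector]


def dequantize_int8(codes: list[int], lo: float, hi: float) -> list[float]:
    """Approximate reconstruction; per-dimension error is about scale / 2 at most."""
    scale = (hi - lo) / 255
    return [(c + 128) * scale + lo for c in codes]


# 5M x 1536-dim vectors: float32 (4 bytes) vs int8 (1 byte) per dimension
print(raw_vector_bytes(5_000_000, 1536, 4) / 1e9)  # 30.72 (GB)
print(raw_vector_bytes(5_000_000, 1536, 1) / 1e9)  # 7.68 (GB)

# Round-trip a toy vector; lo/hi are assumed bounds for the embedding values
vec = [-0.12, 0.03, 0.25, -0.07]
codes = quantize_int8(vec, lo=-0.3, hi=0.3)
restored = dequantize_int8(codes, lo=-0.3, hi=0.3)
print(codes)
print(max(abs(a - b) for a, b in zip(vec, restored)))
```

The 4x saving is exact for the vectors themselves. The small reconstruction error per dimension is the source of the recall loss quoted above.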
Weaviate: Hybrid Search and Enterprise Features
Weaviate combines vector search with a full-text BM25 index in the same query, producing a fused "hybrid" score that improves retrieval quality on short queries, keyword-heavy content, and cases where exact term matching matters alongside semantic similarity.
Weaviate fuses the vector and keyword result lists with a configurable fusion algorithm. The options are rankedFusion, a reciprocal-rank approach, and relativeScoreFusion, which normalizes and combines the raw scores and has been the default since v1.24. In our testing on a technical documentation corpus, hybrid search improved precision@5 from 0.72 (pure vector) to 0.81. That 9-point gain (about 12.5% relative) is significant in production QA systems, where users often search for specific product names, version numbers, or error codes that semantic search alone misses.
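To see how rank-based fusion combines a keyword ranking with a vector ranking, here is a minimal, generic Reciprocal Rank Fusion sketch. Each document scores the sum of 1 / (k + rank) across the lists it appears in. The constant k = 60 is a commonly used default and is an assumption here. The document IDs are invented. This is not Weaviate's implementation, whose fusion options and scoring details differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["err-404-guide", "install-v2", "faq"]                 # keyword ranking
vector_hits = ["install-v2", "troubleshooting", "err-404-guide"]   # semantic ranking
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(f"{doc_id}: {score:.5f}")
# install-v2 and err-404-guide rise to the top because both retrievers found them
```

Because only ranks are used, the fusion does not need BM25 and cosine scores to be on comparable scales. Score-based fusion trades that robustness for keeping information about how far apart the raw scores are.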
Weaviate Strengths
- Native hybrid search: Single query returns results ranked by combined BM25 + vector score. No need to implement your own rank fusion logic or run two separate queries and merge results.
- GraphQL API: Rich query interface with filtering, aggregation, cross-reference traversal, and near-text/near-vector search in a single query. Particularly useful for knowledge graph-style data models.
- Multi-modal support: Built-in support for image, audio, and text vectors in the same collection (for example via the multi2vec-bind module), so you can retrieve across modalities, such as finding images similar to a text query.
- Modules ecosystem: Text2Vec, Generative, QnA, and Reranker modules integrate directly, reducing the amount of application-side logic needed.
Weaviate Limitations
- Higher resource consumption than Qdrant โ base memory usage starts around 500MB, and production nodes typically need 4โ8GB minimum for reasonable performance
- GraphQL API has a steeper learning curve than REST-based competitors; simple queries require more verbose syntax
- Distributed mode complexity: multi-node deployments require careful schema design (tenant keys, replication factors) to avoid hotspots
- Write throughput is lower than Qdrant in our benchmarks (~8,000 vectors/sec vs ~15,000 vectors/sec for Qdrant on identical hardware)
pgvector: Vector Search Inside PostgreSQL
pgvector adds a vector column type and ANN index support directly to PostgreSQL. If your team already runs PostgreSQL, this means zero new infrastructure โ your vectors live in the same database as your application data, with full ACID transactions, row-level security, and your existing backup and monitoring stack.
The appeal is largely operational. PostgreSQL is one of the best-understood, best-tooled, and most production-proven databases in the world. Storing vectors there means no new database to operate, no new SDK to learn, and full JOIN capability between your vector data and relational data in a single query:
```sql
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column (assumes an existing users table)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    user_id INT REFERENCES users(id),
    embedding vector(1536),  -- OpenAI text-embedding-3-small
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index (faster queries than IVFFlat, but more memory and slower builds)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Similarity search with JOIN to related table:
-- the 5 docs most similar to the query ($1), only for one user ($2).
-- <=> is cosine distance, so 1 - distance gives cosine similarity.
SELECT d.content, d.created_at, u.email,
       1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
JOIN users u ON d.user_id = u.id
WHERE d.user_id = $2
  AND d.created_at > NOW() - INTERVAL '90 days'
ORDER BY d.embedding <=> $1::vector
LIMIT 5;
```
This SQL query joins vectors with user data, applies a recency filter, and returns similarity scores, all in a single query. Achieving the same in a purpose-built vector database would require fetching vector results and then making separate application queries to enrich them.
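On the application side, `$1` is bound to the query embedding. pgvector accepts vectors in a bracketed text form such as `'[1,2,3]'`. The hypothetical helper below produces that string. The second helper mirrors what the `<=>` operator computes (cosine distance), so you can sanity-check the `similarity` column. This is a plain-Python sketch. In a real application you would pass the value through your driver's parameter binding, or use the `pgvector` Python package, which provides adapters for common drivers.

```python
import math


def to_pgvector_literal(vec: list[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, as computed by pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


query = [0.6, 0.8]
stored = [0.8, 0.6]
print(to_pgvector_literal(query))           # [0.6,0.8]
print(1 - cosine_distance(stored, query))   # the "similarity" column, about 0.96
```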
pgvector Performance Reality Check
pgvector's HNSW implementation is competitive at moderate scales. Our benchmarks on a dedicated Postgres instance (16 vCPUs, 64GB RAM) with the HNSW index configured at m=16, ef=64:
| Dataset Size | Query Latency (p50) | Query Latency (p99) | Recall @10 | Index Build Time |
|---|---|---|---|---|
| 100K vectors | 2.1ms | 5.8ms | 98.2% | 45s |
| 1M vectors | 8.4ms | 22ms | 97.8% | 12 min |
| 10M vectors | 41ms | 110ms | 96.1% | 3.5 hrs |
| 100M vectors | 320ms+ | 900ms+ | 94% | >48 hrs |
pgvector is genuinely competitive under 5 million vectors. Beyond 10 million, latency climbs into territory that will be noticeable in user-facing applications. The 100M vector case is essentially impractical without hardware that costs significantly more than running a purpose-built vector database cluster.
Selection Matrix: Which to Use
Here's a direct recommendation matrix covering the scenarios most teams actually encounter:
| Scenario | Recommended | Why |
|---|---|---|
| Local prototype / hackathon | Chroma | Zero setup, runs in-process, LangChain/LlamaIndex default |
| Production, <5M vectors, no existing PG | Qdrant | Best query performance, excellent filtering, easy Docker deploy |
| Production, existing PostgreSQL stack | pgvector | Zero new infra, JOIN with relational data, familiar SQL tooling |
| Hybrid search (keyword + semantic) | Weaviate | Native BM25 + vector fusion, no external search engine needed |
| Multi-tenant SaaS (>1000 tenants) | Qdrant | Payload filtering scopes queries without per-tenant collections |
| 100M+ vectors, enterprise scale | Weaviate / Qdrant | Both support distributed mode; Weaviate has stronger enterprise support SLA |
| Fully managed, no ops team | Pinecone / Qdrant Cloud | Both are fully managed; Qdrant Cloud runs the open-source core as a managed service |
| Research / billion-scale indexing | FAISS | Most flexible ANN library, used in Meta's production retrieval systems |
Start with Chroma for local development: it has zero setup friction and works with every major RAG framework. When you move to production, the choice is between Qdrant (if you need fast filtered search, multi-tenancy, or large scale) and pgvector (if you already run PostgreSQL and your vector count stays under 5 million). Choose Weaviate specifically when hybrid search quality is a core product requirement. The worst decision is to optimize prematurely: start simple, benchmark with your actual data, and migrate when you hit real performance limits.
Frequently Asked Questions
Can I use a regular SQL database like PostgreSQL instead of a vector database?
Yes, with the pgvector extension, PostgreSQL can store and query embeddings. This is a legitimate choice for teams with existing PostgreSQL infrastructure and datasets under approximately 5 million vectors. The trade-off is performance. pgvector's HNSW index is slower than purpose-built vector databases at scale. Its HNSW scans apply WHERE filters to the candidates the index returns rather than during graph traversal, so very selective filters can return fewer rows than requested unless you enable the iterative index scans added in pgvector 0.8. It also lacks built-in distributed horizontal scaling. For prototypes and moderate-scale applications, pgvector is excellent. For tens of millions of vectors or high-QPS requirements, a dedicated vector database like Qdrant or Weaviate will outperform it significantly.
What is the difference between HNSW and IVF indexing in vector databases?
HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) are the two dominant ANN algorithms. HNSW builds a multi-layer graph and navigates it during search. It offers excellent recall (typically 95-99%) and fast query times even at large scales, but it consumes more memory: roughly M × 2 × 4 bytes of graph links per vector, about 128 bytes at the common setting M = 16, on top of the original vector. IVF partitions vectors into clusters and searches only the nearest clusters. It uses less memory but requires careful tuning of the nlist and nprobe parameters to balance recall versus speed. For most RAG applications, HNSW is the better default: faster queries, higher recall, and less parameter tuning. IVF becomes attractive when memory is severely constrained and you can accept 5-10% lower recall.
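The IVF idea can be sketched in plain Python. First, train `nlist` centroids with a few k-means iterations. Then assign each vector to its nearest centroid's inverted list. At query time, scan only the `nprobe` closest lists. The toy data, the squared-Euclidean distance (for cosine, normalize vectors first), and all helper names are assumptions for illustration. Libraries such as FAISS implement this far more efficiently.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: list[float], centroids: list[list[float]]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def train_centroids(vectors: list[list[float]], nlist: int, iters: int = 10, seed: int = 0):
    """A few rounds of k-means (Lloyd's algorithm) to pick nlist cell centers."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets: list[list[list[float]]] = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def build_ivf(vectors: list[list[float]], centroids: list[list[float]]) -> dict[int, list[int]]:
    """Inverted lists: cell id -> indices of the vectors assigned to that cell."""
    lists: dict[int, list[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        lists[nearest(v, centroids)].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, k: int, nprobe: int) -> list[int]:
    """Scan only the nprobe cells whose centroids are closest to the query."""
    by_distance = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in by_distance[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2000)]
centroids = train_centroids(data, nlist=16)
lists = build_ivf(data, centroids)
query = data[7]
exact = sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:10]
for nprobe in (1, 4, 16):
    found = ivf_search(query, data, centroids, lists, k=10, nprobe=nprobe)
    # recall@10 never drops as nprobe grows, and reaches 1.0 when nprobe == nlist
    print(nprobe, len(set(found) & set(exact)) / 10)
```

Raising `nprobe` scans more cells. That buys recall at the cost of speed, which is the tuning trade-off described above.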
How many vectors can these databases handle before performance degrades?
Practical limits: Chroma works well up to about 1-2 million vectors on a single machine with adequate RAM. Qdrant scales to hundreds of millions of vectors in single-node mode using on-disk indexing, and to billions with distributed mode. Weaviate similarly scales to hundreds of millions in distributed mode. pgvector handles up to about 5 million vectors with acceptable performance; beyond 10 million, index build time and query latency become challenging without significant hardware. FAISS can index billions of vectors in research settings but requires careful memory management and has no built-in persistence or server mode. For production systems expected to grow beyond 10 million vectors, plan for Qdrant or Weaviate from the start rather than migrating later.