Vector Databases Compared 2026: Chroma vs Qdrant vs Weaviate vs pgvector


When you build a RAG application, your document embeddings need to live somewhere — and that somewhere needs to answer similarity queries in milliseconds. A traditional relational database stores exact values and finds them with B-tree indexes. A vector database stores high-dimensional floating-point vectors and finds the nearest neighbors using approximate algorithms optimized for cosine or dot-product similarity. These are fundamentally different problems, and the tools that solve them well look very different.

The practical question is which vector database to use. The market has matured significantly: in 2023, Pinecone was the obvious managed choice and Chroma was the obvious local choice. In 2026, Qdrant has emerged as a production-grade open-source option that challenges both, Weaviate has doubled down on hybrid search and enterprise features, and pgvector has become genuinely competitive for moderate-scale use cases where you already run PostgreSQL.

This guide gives you the concrete information needed to make this choice: architecture differences, benchmark numbers from our own testing, code examples, and direct recommendations for eight common deployment scenarios.

Why You Can't Use a Regular Database

The difference is more fundamental than it might appear. Consider finding the 10 most semantically similar documents to a user's query from a corpus of 1 million document chunks:

With a regular database: You would need to compute the cosine similarity between your query vector (e.g., 1536 dimensions for OpenAI's text-embedding-3-small) and all 1 million stored vectors. That's 1 million dot products across 1536 dimensions = roughly 3 billion floating-point operations per query. At scale this becomes completely impractical without special indexing.
With a vector database: Indexes like HNSW (Hierarchical Navigable Small World) pre-organize vectors into navigable graph structures. A query traverses the graph and finds approximate nearest neighbors in milliseconds — typically touching only 1–5% of vectors — while achieving 95–99% recall compared to exhaustive search.

The "approximate" in ANN (Approximate Nearest Neighbor) is a feature, not a bug. For semantic search, perfect mathematical precision is not necessary: the goal is to retrieve contextually relevant documents, not the geometrically closest vectors. HNSW-indexed search at 1 million vectors typically runs in 2–10ms with 97%+ recall.
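To make the baseline concrete, here is a minimal sketch of the exhaustive search that ANN recall is measured against, plus the FLOP count from the example above. The function names and the tiny random corpus are illustrative assumptions, not part of any database's API.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exhaustive_top_k(query, vectors, k):
    """Score every stored vector against the query; return the ids of the k best."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def brute_force_flops(num_vectors, dims):
    """One multiply and one add per dimension per stored vector."""
    return 2 * num_vectors * dims

random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(32)] for _ in range(500)]
query = [random.gauss(0, 1) for _ in range(32)]
exact = exhaustive_top_k(query, corpus, k=10)

# Simulate an ANN index that missed one of the ten true neighbors.
missed = next(i for i in range(len(corpus)) if i not in exact)
approx = exact[:9] + [missed]

print(recall_at_k(approx, exact))          # 0.9
print(brute_force_flops(1_000_000, 1536))  # 3072000000
```

A recall of 0.9 here means nine of the ten true neighbors came back; the 95–99% figures quoted above are the same measurement averaged over many queries.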

Core Concepts: One-Sentence Glossary

Embedding

A fixed-size numerical vector (e.g., 1536 floats) that encodes semantic meaning — similar text produces similar vectors, measurable by cosine similarity.

ANN (Approximate Nearest Neighbor)

An algorithm that finds vectors close to a query vector without checking every stored vector, trading a small accuracy loss for orders-of-magnitude speed gains.

HNSW

The dominant ANN index structure (Hierarchical Navigable Small World graph): fast queries, high recall, high memory usage. It is the default index in Chroma, Qdrant, and Weaviate, and one of the two index types (alongside IVFFlat) that pgvector offers.

IVF (Inverted File Index)

Clusters vectors into Voronoi cells, then searches only nearby clusters — lower memory than HNSW but requires tuning and produces slightly lower recall.
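The IVF idea can be sketched in a few lines. This toy version assumes the centroids are already given (real implementations learn them with k-means), and the `nprobe` name follows the convention used by FAISS-style libraries. It shows why probing too few cells costs recall: a true neighbor sitting just across a cell boundary is never examined.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the cell of its nearest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        cells[nearest].append(vid)
    return cells

def ivf_search(query, vectors, centroids, cells, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in probe_order[:nprobe] for vid in cells[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k]

# Hand-picked toy data: two cells, four vectors on a line.
centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = [[1.0, 0.0], [4.0, 0.0], [6.0, 0.0], [9.0, 0.0]]
cells = build_ivf(vectors, centroids)

query = [4.9, 0.0]  # true 2 nearest neighbors are ids 1 and 2
print(ivf_search(query, vectors, centroids, cells, k=2, nprobe=1))  # [1, 0]
print(ivf_search(query, vectors, centroids, cells, k=2, nprobe=2))  # [1, 2]
```

With `nprobe=1` the search never looks in the second cell, so it returns id 0 instead of the closer id 2. Raising `nprobe` restores the correct answer at the cost of scanning more vectors, which is the tuning trade-off described above.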

The Contenders: At a Glance

Database	License	Core Language	GitHub Stars	Managed Option	Key Differentiator
Chroma	Apache 2.0	Python / Rust	~18k	Chroma Cloud (beta)	Zero-config local dev, in-process mode
Qdrant	Apache 2.0	Rust	~22k	Qdrant Cloud	Payload filtering, Rust performance, on-disk index
Weaviate	BSD 3-Clause	Go	~13k	Weaviate Cloud	Hybrid search (BM25 + vector), GraphQL API
pgvector	PostgreSQL License	C (PostgreSQL extension)	~15k	Via any managed Postgres	Uses existing PostgreSQL — no new infra
Pinecone	Proprietary	Closed source	N/A	Fully managed only	Serverless, no infra management
FAISS	MIT	C++ / Python	~33k	No	Library (not a database), research-scale indexing

💡 Scope note: This article focuses on Chroma, Qdrant, Weaviate, and pgvector — the four most relevant options for teams building RAG applications in 2026. Pinecone is included in the overview table but excluded from deep dives as it's proprietary. FAISS is a library, not a database, and is covered separately.

Chroma: The Prototyping Default

🎨 ChromaDB Best for Local Development

Chroma is the vector database that ships with most RAG tutorials — and for good reason. It runs in-process (no separate server), persists to disk with a single parameter, and integrates natively with both LangChain and LlamaIndex. You can go from zero to a working RAG app without leaving Python.

Getting Started with Chroma

        # pip install chromadb
        import chromadb
        from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

        # Option 1: in-memory client (ephemeral; data is lost on restart)
        # client = chromadb.Client()

        # Option 2: persistent client (data saved to disk)
        client = chromadb.PersistentClient(path="./chroma_db")

        # Create a collection with OpenAI embeddings
        embed_fn = OpenAIEmbeddingFunction(
            api_key="sk-...",
            model_name="text-embedding-3-small"
        )

        # get_or_create_collection keeps the script re-runnable against the same path
        collection = client.get_or_create_collection(
            name="docs",
            embedding_function=embed_fn
        )

        # Add documents (auto-embeds using the embedding function).
        # upsert rather than add, so re-running does not trip over existing ids.
        collection.upsert(
            documents=["LangChain is a framework for LLM applications",
                       "Qdrant is a vector database written in Rust"],
            metadatas=[{"source": "wiki"}, {"source": "wiki"}],
            ids=["doc1", "doc2"]
        )

        # Query with metadata filter
        results = collection.query(
            query_texts=["vector search database"],
            n_results=2,
            where={"source": "wiki"}  # Metadata filter
        )

Sweet spot: Local RAG prototypes, datasets under 1 million vectors, single-developer projects, and scenarios where you want to avoid running a separate database server.

Chroma Limitations

No production-grade distributed mode — horizontal scaling requires custom sharding
In-process mode is single-threaded for writes; concurrent writes require the HTTP server mode
Query performance degrades above ~2 million vectors on commodity hardware
No built-in hybrid search (BM25 + vector); keyword filtering is metadata-based only
Chroma Cloud is still in beta as of mid-2026; not suitable for critical production

Qdrant: Production-Grade Performance

⚡ Qdrant Best for Production Filtered Search

Qdrant is written in Rust, and in our testing it was the fastest of the four databases covered here (see the benchmarks below). Its standout feature is payload filtering: you can filter by arbitrary JSON metadata during the ANN search (not after), which maintains high recall even with aggressive filters. This makes it particularly valuable for multi-tenant applications, e-commerce similarity search, and any use case where retrieval must be scoped to a subset of your data.

Qdrant Payload Filtering Benchmarks

These numbers are from our own testing on a 5-million-vector dataset (1536 dimensions, text-embedding-3-small) on a 16-core / 64GB machine running Qdrant 1.9:

Query Type	Qdrant (p50)	Qdrant (p99)	Chroma equiv.	pgvector equiv.
Pure ANN (no filter)	3.2ms	8.1ms	18ms	22ms
Filter: 1 metadata field (=)	4.1ms	10.2ms	24ms*	31ms
Filter: 3 fields (range + match)	5.8ms	14.5ms	N/A†	58ms
Batch of 50 queries	28ms total	65ms total	210ms total	340ms total

* Post-filter (re-rank after retrieval, degrades recall). † Complex multi-field filters not natively supported without post-filtering in Chroma.

The filtering advantage is decisive for multi-tenant RAG systems. If you're building a document assistant where each user can only see their own documents, Qdrant's payload-level filtering keeps each query scoped without requiring separate collections per user — a major operational simplification.
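The difference between filtering after retrieval and filtering during it is easy to see on a toy dataset. The sketch below uses an exhaustive scan and made-up tenant names purely for illustration; it is not how Qdrant implements filtering inside its HNSW traversal, only a demonstration of why post-filtering loses results.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def post_filter_search(query, points, tenant, k):
    """Take the global top-k first, then drop points from other tenants."""
    ranked = sorted(points, key=lambda p: sq_dist(query, p["vector"]))
    return [p["id"] for p in ranked[:k] if p["tenant"] == tenant]

def filtered_search(query, points, tenant, k):
    """Restrict candidates to the tenant before ranking, so k results survive."""
    allowed = [p for p in points if p["tenant"] == tenant]
    ranked = sorted(allowed, key=lambda p: sq_dist(query, p["vector"]))
    return [p["id"] for p in ranked[:k]]

# Hypothetical multi-tenant corpus: "beta" documents crowd the query's neighborhood.
points = [
    {"id": "a1", "tenant": "acme", "vector": [0.0, 0.1]},
    {"id": "b1", "tenant": "beta", "vector": [0.0, 0.2]},
    {"id": "b2", "tenant": "beta", "vector": [0.1, 0.2]},
    {"id": "a2", "tenant": "acme", "vector": [0.9, 0.9]},
    {"id": "a3", "tenant": "acme", "vector": [1.0, 1.0]},
]
query = [0.0, 0.0]

print(post_filter_search(query, points, "acme", k=3))  # ['a1']
print(filtered_search(query, points, "acme", k=3))     # ['a1', 'a2', 'a3']
```

Post-filtering asked for three results and returned one, because two of the global top three belonged to another tenant. In Qdrant, the equivalent restriction is expressed as a `filter` with a `must` condition on the payload key, and it is applied while the index is being searched.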

Qdrant Key Features

On-disk vector storage: memmap mode lets you index hundreds of millions of vectors without fitting them all in RAM. Query latency increases (~2–4x) but remains acceptable for many use cases.
Quantization: Scalar and product quantization reduce memory usage by 4–8x with ~1–3% recall loss. A 5M-vector index that requires about 30GB in full float32 (5M × 1536 dimensions × 4 bytes) needs about 7.7GB of vector storage with int8 scalar quantization.
Multi-vector support: ColBERT-style late interaction retrieval (storing multiple vectors per document) is natively supported, enabling more nuanced semantic matching without external re-ranking.
Distributed mode: Sharding across nodes with replication is built-in and well-documented, tested at hundreds of millions of vectors in production.
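The memory figures in the quantization bullet follow from simple arithmetic, sketched below together with a toy 8-bit scalar quantizer. The quantizer uses a per-vector min/max range as a simplifying assumption; Qdrant's actual implementation is not reproduced here, and the function names are illustrative.

```python
def index_size_gb(num_vectors, dims, bytes_per_component):
    """Raw vector storage in decimal gigabytes (graph overhead not included)."""
    return num_vectors * dims * bytes_per_component / 1e9

def quantize_8bit(vector):
    """Map floats onto 256 evenly spaced levels between the vector's min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]

print(index_size_gb(5_000_000, 1536, 4))  # 30.72  (float32)
print(index_size_gb(5_000_000, 1536, 1))  # 7.68   (one byte per component)

original = [-1.0, 0.0, 0.5, 1.0]
codes, lo, scale = quantize_8bit(original)
restored = dequantize_8bit(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes, max_error)
```

Each component is stored in one byte instead of four, which is where the 4x saving comes from. The reconstruction error per component is bounded by half the step size, which is why the recall loss stays small.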

Weaviate: Hybrid Search and Enterprise Features

🕸️ Weaviate Best for Hybrid Search

Weaviate combines vector search with a full-text BM25 index in the same query, producing a fused "hybrid" score that improves retrieval quality on short queries, keyword-heavy content, and cases where exact term matching matters alongside semantic similarity.

Weaviate's hybrid search fuses the vector and keyword result lists with a rank-fusion step. Reciprocal Rank Fusion (RRF) is the classic form of this, and Weaviate also offers a score-based fusion variant. In our testing on a technical documentation corpus, hybrid search improved precision@5 from 0.72 (pure vector) to 0.81, a 12.5% relative improvement (9 points absolute). That gain is significant in production QA systems where users often search for specific product names, version numbers, or error codes that semantic search alone misses.
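For readers who have not seen Reciprocal Rank Fusion before, here is a minimal sketch of the general technique. The constant `k=60` is the value commonly used in the RRF literature; Weaviate's own constants and its weighting between the two result lists are not reproduced here, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["doc_semantic", "doc_both", "doc_other"]        # from ANN search
keyword_hits = ["doc_both", "doc_error_code", "doc_semantic"]  # from BM25

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_both', 'doc_semantic', 'doc_error_code', 'doc_other']
```

The document that ranks well in both lists rises to the top, while the exact-match hit that vector search missed entirely (`doc_error_code`) still makes it into the fused results. Because only ranks are used, the two scoring scales never need to be normalized against each other.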

Weaviate Strengths

Native hybrid search: Single query returns results ranked by combined BM25 + vector score. No need to implement your own rank fusion logic or run two separate queries and merge results.
GraphQL API: Rich query interface with filtering, aggregation, cross-reference traversal, and near-text/near-vector search in a single query. Particularly useful for knowledge graph-style data models.
Multi-modal support: Built-in support for image, audio, and text vectors in the same collection (via the multi2vec-bind module, which wraps Meta's ImageBind model). It can retrieve across modalities, for example finding images similar to a text query.
Modules ecosystem: Text2Vec, Generative, QnA, and Reranker modules integrate directly, reducing the amount of application-side logic needed.

Weaviate Limitations

Higher resource consumption than Qdrant — base memory usage starts around 500MB, and production nodes typically need 4–8GB minimum for reasonable performance
GraphQL API has a steeper learning curve than REST-based competitors; simple queries require more verbose syntax
Distributed mode complexity: multi-node deployments require careful schema design (tenant keys, replication factors) to avoid hotspots
Write throughput is lower than Qdrant in our benchmarks (~8,000 vectors/sec vs ~15,000 vectors/sec for Qdrant on identical hardware)

pgvector: Vector Search Inside PostgreSQL

🐘 pgvector Best for Existing PostgreSQL Teams

pgvector adds a vector column type and ANN index support directly to PostgreSQL. If your team already runs PostgreSQL, this means zero new infrastructure — your vectors live in the same database as your application data, with full ACID transactions, row-level security, and your existing backup and monitoring stack.

The appeal is largely operational. PostgreSQL is one of the best understood, best tooled, and most production-proven databases in the world. Storing vectors there means no new database to operate, no new SDK to learn, and full JOIN capability between your vector data and relational data in a single query:

        -- pgvector: SQL with vector similarity search

        -- Enable the extension once per database
        CREATE EXTENSION IF NOT EXISTS vector;

        -- Create table with vector column (assumes a users table already exists)
        CREATE TABLE documents (
            id SERIAL PRIMARY KEY,
            content TEXT,
            user_id INT REFERENCES users(id),
            embedding vector(1536),  -- OpenAI text-embedding-3-small
            created_at TIMESTAMPTZ DEFAULT NOW()
        );

        -- Create HNSW index (faster queries, more memory)
        CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64);

        -- Similarity search with JOIN to related table
        -- Find the 5 docs most similar to the query, only for a specific user
        -- $1 = query embedding, $2 = user id; <=> is cosine distance
        SELECT d.content, d.created_at, u.email,
            1 - (d.embedding <=> $1::vector) AS similarity
        FROM documents d
        JOIN users u ON d.user_id = u.id
        WHERE d.user_id = $2
            AND d.created_at > NOW() - INTERVAL '90 days'
        ORDER BY d.embedding <=> $1::vector
        LIMIT 5;

This SQL query joins vectors with user data, applies a recency filter, and returns similarity scores, all in a single statement. Achieving the same with a purpose-built vector database typically means fetching the vector results first and then running separate application queries to enrich them, unless the needed fields are duplicated into the vector store's metadata.
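On the application side, the `$1` parameter needs to arrive in a form PostgreSQL can cast to `vector`. One option, sketched here under the assumption that your driver has no pgvector adapter registered, is to send the embedding as pgvector's text literal and let the `::vector` cast in the query handle it. The helper names and the user id are hypothetical; the second function mirrors what the `<=>` operator computes so the `1 - distance` similarity in the query is easy to sanity-check.

```python
import math

def to_pgvector_literal(embedding):
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def cosine_distance(a, b):
    """Mirror of what the <=> operator returns: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

query_embedding = [0.25, -0.5, 1.0]  # stand-in for a real 1536-dimension embedding
params = (to_pgvector_literal(query_embedding), 42)  # ($1, $2); 42 is a made-up user id

print(params[0])                                     # [0.25,-0.5,1.0]
print(1 - cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 1.0
```

The `params` tuple is what you would hand to your driver's execute call alongside the SQL text. Identical vectors have distance 0 and therefore similarity 1, matching the `similarity` column the query returns.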

pgvector Performance Reality Check

pgvector's HNSW implementation is competitive at moderate scales. The table below shows our benchmarks on a dedicated Postgres instance (16 vCPUs, 64GB RAM) with the HNSW index configured at m=16, ef=64:

Dataset Size	Query Latency (p50)	Query Latency (p99)	Recall @10	Index Build Time
100K vectors	2.1ms	5.8ms	98.2%	45s
1M vectors	8.4ms	22ms	97.8%	12 min
10M vectors	41ms	110ms	96.1%	3.5 hrs
100M vectors	320ms+	900ms+	94%	>48 hrs

pgvector is genuinely competitive under 5 million vectors. Beyond 10 million, latency climbs into territory that will be noticeable in user-facing applications. The 100M vector case is essentially impractical without hardware that costs significantly more than running a purpose-built vector database cluster.

Selection Matrix: Which to Use

Here's a direct recommendation matrix covering the scenarios most teams actually encounter:

Scenario	Recommended	Why
Local prototype / hackathon	Chroma	Zero setup, runs in-process, LangChain/LlamaIndex default
Production, <5M vectors, no existing PG	Qdrant	Best query performance, excellent filtering, easy Docker deploy
Production, existing PostgreSQL stack	pgvector	Zero new infra, JOIN with relational data, familiar SQL tooling
Hybrid search (keyword + semantic)	Weaviate	Native BM25 + vector fusion, no external search engine needed
Multi-tenant SaaS (>1000 tenants)	Qdrant	Payload filtering scopes queries without per-tenant collections
100M+ vectors, enterprise scale	Weaviate / Qdrant	Both support distributed mode; Weaviate has stronger enterprise support SLA
Fully managed, no ops team	Pinecone / Qdrant Cloud	Both are fully managed; Pinecone is serverless, while Qdrant Cloud is managed hosting of the open-source core
Research / billion-scale indexing	FAISS	Most flexible ANN library, used in Meta's production retrieval systems

Bottom Line

Start with Chroma for local development — it has zero setup friction and works with every major RAG framework. When you move to production, the choice is between Qdrant (if you need fast filtered search, multi-tenancy, or large scale) and pgvector (if you already run PostgreSQL and your vector count stays under 5 million). Choose Weaviate specifically when hybrid search quality is a core product requirement. The worst decision is to prematurely optimize — start simple, benchmark with your actual data, and migrate when you hit real performance limits.

Frequently Asked Questions

Can I use a regular SQL database like PostgreSQL instead of a vector database?

Yes, with the pgvector extension, PostgreSQL can store and query embeddings. This is a legitimate choice for teams with existing PostgreSQL infrastructure and datasets under approximately 5 million vectors. The trade-off is performance: pgvector's HNSW index is slower than purpose-built vector databases at scale, and it lacks some advanced features such as filter-aware index traversal (its WHERE clauses are applied around the ANN index scan rather than inside it) and built-in distributed horizontal scaling. For prototypes and moderate-scale applications, pgvector is excellent. For tens of millions of vectors or high-QPS requirements, a dedicated vector database like Qdrant or Weaviate will outperform it significantly.

What is the difference between HNSW and IVF indexing in vector databases?

HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) are the two dominant ANN algorithms. HNSW builds a multi-layer graph and navigates it during search, offering excellent recall (typically 95–99%) and fast query times even at large scales, but consuming more memory (typically 50–100 bytes per vector plus the original vector). IVF partitions vectors into clusters and searches only the nearest clusters, using less memory but requiring careful tuning of the nlist and nprobe parameters to balance recall versus speed. For most RAG applications, HNSW is the better default: faster queries, higher recall, and less parameter tuning required. IVF becomes attractive when memory is severely constrained and you can accept 5–10% lower recall.

How many vectors can these databases handle before performance degrades?

Practical limits: Chroma works well up to about 1–2 million vectors on a single machine with adequate RAM. Qdrant scales to hundreds of millions of vectors in single-node mode using on-disk indexing, and to billions with distributed mode. Weaviate similarly scales to hundreds of millions in distributed mode. pgvector handles up to about 5 million vectors with acceptable performance; beyond 10 million, index build time and query latency become challenging without significant hardware. FAISS can index billions of vectors in research settings but requires careful memory management and has no built-in persistence or server mode. For production systems expected to grow beyond 10 million vectors, plan for Qdrant or Weaviate from the start rather than migrating later.

What I actually use: Chroma for local development, Qdrant for anything beyond that. I've built with Weaviate and Milvus too, but both felt over-engineered for a project the size of AI_Guide. Qdrant's filtering performance is the practical reason I chose it for production — once you're combining semantic search with metadata filters (like "tools with 10k+ stars in the agent category"), Chroma's performance degrades and Qdrant stays fast. The thing nobody tells you when comparing vector databases: the operational complexity difference matters more than the benchmark numbers at most scales. Chroma requires zero ops; Qdrant requires minimal ops; Milvus requires dedicated ops attention.

— Nolan (yuzc), maintainer of AI Nav