AI Architect Academy

Vector database comparison: choosing a store for RAG

Short answer

There is no single best vector database for RAG. The right store depends on five axes: whether you want it managed or self-hosted, whether you'd rather keep vectors inside your existing Postgres (pgvector) or run a dedicated engine, the scale you need, how much filtering and hybrid search you'll do, and how much cost and lock-in you can tolerate. For most applications, pgvector in a database you already run, or a managed option like Pinecone, is enough; a dedicated vector DB earns its place at large scale or when its search features carry their weight.

This page is about choosing the store. For retrieval as an architectural concept (how the agent decides when to retrieve), see agentic RAG; that page points here when it's time to pick a database.

What a vector database does

A vector database stores embeddings — numeric vectors that represent the meaning of text, images, or other data — and answers nearest-neighbour queries: given a query vector, return the stored items closest to it. In a RAG system that's how you fetch the passages most relevant to a question before handing them to the model. The hard part isn't storage; it's doing approximate nearest-neighbour (ANN) search fast over millions of vectors, usually with an index such as HNSW, while still letting you filter by metadata.

That's the whole job: index embeddings, search them quickly, filter the results. Everything below is about where that job is best run for your system, not which product wins in the abstract.

How to choose: the decision axes

Pick a vector store by reasoning down a short list of axes, not by chasing a benchmark leaderboard. The axes that actually decide it:

  • Managed vs self-hosted: a managed service takes on the operational burden (scaling, upgrades, backups) in exchange for a subscription; self-hosting an open-source store keeps data and cost under your control, but you operate it yourself. This is usually the first fork.
  • Dedicated engine vs pgvector-in-Postgres: if you already run Postgres, the pgvector extension lets vectors live beside your relational data, with one system to operate and transactional consistency for free. A dedicated engine is a second system to run, justified when its scale or search features pay for the extra moving part.
  • Scale: thousands to low millions of vectors is comfortable almost anywhere; hundreds of millions to billions is where purpose-built engines and their indexing and sharding start to matter.
  • Filtering & hybrid search: most real RAG needs metadata filters (tenant, date, source) and often hybrid search that blends vector similarity with keyword/BM25 matching. How well a store handles filtered and hybrid queries varies, and it's frequently the deciding feature.
  • Cost & lock-in: managed pricing scales with vectors and queries; self-hosting trades that for infrastructure you own. Open-source stores keep an exit open; a proprietary managed-only service is the most convenient and the most locked-in.
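
To illustrate what "blending" vector and keyword results can mean, the sketch below uses reciprocal rank fusion (RRF), one common way to merge two ranked lists. It is an assumption for illustration, not a description of how any particular store implements hybrid search. The document IDs are made up, and `k = 60` is simply a conventional default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by more than one retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a keyword (BM25-style) search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only needs ranks, not comparable scores, which is why it is a convenient fallback when a store has no built-in hybrid query and you have to fuse two result lists yourself.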

Provenance matters here too: whatever store you pick, keep the source and metadata for every chunk so an answer can be traced back. An index you can't audit is one you can't ship.
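
One way to keep that trail is to store a small provenance record beside every embedding. The sketch below is only an assumption about what such a record might hold: the `ChunkRecord` fields, the example URI, and the citation format are all hypothetical, and you would map them onto whatever metadata or payload mechanism your chosen store offers.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChunkRecord:
    """Metadata stored beside each embedding so an answer can be traced back."""
    chunk_id: str
    text: str
    source_uri: str    # where the chunk came from (hypothetical field)
    char_start: int    # offset of the chunk within the source document
    char_end: int
    tenant: str        # also usable as a metadata filter
    ingested_at: str   # ISO 8601 timestamp


def citation(record: ChunkRecord) -> str:
    """Build a human-readable pointer back to the exact source span."""
    return f"{record.source_uri}#chars={record.char_start}-{record.char_end}"


record = ChunkRecord(
    chunk_id="policy-0007",
    text="Refunds are issued within 14 days of a returned item arriving.",
    source_uri="https://example.com/docs/refund-policy",
    char_start=1200,
    char_end=1262,
    tenant="acme",
    ingested_at="2024-01-15T09:30:00Z",
)

payload = asdict(record)  # a plain dict, ready to store as chunk metadata
print(citation(record))
```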

The main vector databases compared

The six stores below cover the space most teams choose from. Positioning reflects each project's own stated deployment model and focus (see sources); the "best for" column is a neutral read of where each fits, not a ranking. No benchmarks or pricing are asserted — those change and are best checked live.

Vector store | Managed / OSS | Best for
pgvector | Open-source Postgres extension (self-host or inside managed Postgres) | Teams already on Postgres who want vectors beside relational data, one system to run.
Pinecone | Managed / cloud only (proprietary, serverless) | Wanting a fully managed, API-first store with no infrastructure to operate.
Weaviate | Both (BSD-3-Clause OSS + Weaviate Cloud) | Object + vector storage with built-in hybrid search and reranking in one query layer.
Qdrant | Both (Apache-2.0 OSS, Rust + Qdrant Cloud) | High-performance, large-scale vector search where you may want to self-host or go hybrid/private.
Chroma | Both (Apache-2.0 OSS + Chroma Cloud) | Lightweight, developer-friendly prototyping that can grow from laptop to cloud.
Milvus | Both (Apache-2.0 OSS + Zilliz Cloud managed) | Cloud-native search at very large (billion-vector) scale.

On the common head-to-heads: Pinecone vs Weaviate vs Qdrant mostly comes down to managed-only convenience (Pinecone) versus open-source with a managed option and richer self-host control (Weaviate, Qdrant) — and whether built-in hybrid search (Weaviate) or self-hosted performance (Qdrant) is the feature you're optimising for. There's no universal winner; there's a best fit for your constraints.

Do you even need a dedicated vector DB?

Often, no. For a great many applications, vectors fit comfortably in a database you already operate — pgvector turns Postgres into a vector store, keeping one system, one backup story, and transactional consistency between your embeddings and the rows they describe. The engineering community has made this argument loudly: for typical workloads you probably don't need a separate vector database (see sources).
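
As a rough sketch of what "pgvector turns Postgres into a vector store" looks like in practice, the snippet below holds the SQL you might run, kept as Python strings so it stays self-contained. The `chunks` table, its columns, and the 3-dimensional vector size are hypothetical. The `vector` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine-distance operator come from pgvector; HNSW indexes assume a reasonably recent pgvector release. Actually executing the statements through a Postgres driver is left out.

```python
# Schema: embeddings live in the same table as the rows they describe.
# Table and column names are hypothetical; vector(3) is a toy dimension.
CREATE_SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    tenant    text NOT NULL,
    source    text NOT NULL,
    body      text NOT NULL,
    embedding vector(3)
);
"""

# Approximate nearest-neighbour index using cosine distance.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Filtered similarity query: <=> is pgvector's cosine-distance operator.
# The %s placeholders assume a driver that uses that parameter style.
SEARCH_SQL = """
SELECT id, body, source
FROM chunks
WHERE tenant = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format a Python list as pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Parameters you would pass alongside SEARCH_SQL.
params = ("acme", to_vector_literal([0.1, 0.2, 0.3]), 5)
print(params)
```

Because the metadata filter and the similarity ordering sit in one SQL statement, the tenant filter, the join to your relational rows, and the backup story all stay inside the database you already operate.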

A dedicated vector database earns its place when one of the axes above forces it — scale beyond what your Postgres comfortably indexes, hybrid-search or reranking features you'd otherwise rebuild, or an operational preference for a managed service that owns ANN tuning for you. Reach for it when a concrete requirement points there, not by default. Adding a second datastore "because RAG" is the same over-engineering reflex an architect's job is to resist.
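
The reasoning above can be sketched as a small decision function. Treat it as a thinking aid under loud assumptions, not a rule: the `Requirements` fields are hypothetical, and the 100-million-vector threshold is an illustrative stand-in for "beyond what your Postgres comfortably indexes", which in reality depends on your hardware and workload.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs capturing the decision axes for one system."""
    vector_count: int
    already_runs_postgres: bool
    needs_builtin_hybrid_search: bool
    wants_managed_service: bool


# Assumption for illustration only; measure your own Postgres before deciding.
LARGE_SCALE_THRESHOLD = 100_000_000


def suggest_store(req: Requirements) -> str:
    """Return the simplest option unless a concrete requirement forces more."""
    reasons = []
    if req.vector_count >= LARGE_SCALE_THRESHOLD:
        reasons.append("scale")
    if req.needs_builtin_hybrid_search:
        reasons.append("hybrid search")
    if reasons:
        # A concrete requirement points at a dedicated engine.
        return "dedicated vector database (" + ", ".join(reasons) + ")"
    if req.already_runs_postgres:
        return "pgvector in existing Postgres"
    if req.wants_managed_service:
        return "managed vector service"
    return "simplest self-hosted option, such as pgvector"


typical = Requirements(
    vector_count=2_000_000,
    already_runs_postgres=True,
    needs_builtin_hybrid_search=False,
    wants_managed_service=False,
)
print(suggest_store(typical))
```

Note the order of the checks: the dedicated engine is only suggested when a named requirement triggers it, which mirrors the "not by default" stance of this section.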

Where the store sits relative to the agent is itself an architectural decision — see agentic AI architecture for how the retrieval layer connects to the rest of the system, and the Model Context Protocol for exposing retrieval as a reusable tool.

Frequently asked questions

What is the best vector database for RAG?

There isn't one universal best — it depends on your constraints. For most applications pgvector (vectors inside Postgres) or a managed service like Pinecone is enough. Dedicated engines such as Weaviate, Qdrant, or Milvus become the better pick at large scale or when you need their hybrid-search, reranking, or self-hosting characteristics. Choose by the decision axes — managed vs self-host, scale, filtering, cost — not by a leaderboard.

Do you need a vector database for RAG?

Not necessarily. RAG needs somewhere to store embeddings and run nearest-neighbour search, but that can be a Postgres database you already run via the pgvector extension. A dedicated vector database is worth adding when scale, hybrid search, or operational preference calls for it; otherwise it's an extra system to run for no concrete gain.

Pinecone vs Weaviate vs Qdrant — which?

Pinecone is managed-only and proprietary: the least operational work, the most lock-in. Weaviate and Qdrant are open-source with managed options, so you can self-host or run hybrid/private; Weaviate leans into built-in hybrid search and reranking, Qdrant into high-performance self-hostable search in Rust. Choose Pinecone for zero-ops, Weaviate for integrated hybrid search, Qdrant for performance and self-host control.

Is pgvector good enough?

For a large share of applications, yes. pgvector adds vector storage and similarity search to PostgreSQL, so embeddings live beside your relational data with one system to operate and transactional consistency. It's the pragmatic default if you already run Postgres; you graduate to a dedicated engine when scale or specific search features outgrow it, not before.

What is the difference between a vector database and a regular database?

A regular (relational) database is optimised for exact matches and structured queries over rows and columns. A vector database is optimised for similarity search over high-dimensional embeddings — find the items closest in meaning to a query vector — using approximate nearest-neighbour indexes like HNSW. pgvector blurs the line by adding vector search to a relational database, so you don't always need a separate system.

How do you choose a vector database?

Reason down the decision axes: managed vs self-hosted, dedicated engine vs pgvector-in-Postgres, the scale you need, how much metadata filtering and hybrid search you'll do, and cost and lock-in tolerance. Start from the simplest option that meets your scale and feature needs — often pgvector or a managed service — and only add a dedicated store when a concrete requirement forces it.

Sources & provenance

No benchmarks or pricing are asserted; both change frequently and should be checked against each vendor's live docs before you commit. Corrections: hello@aiarch.dev.
