Key takeaways
- A vector database stores high-dimensional vectors (embeddings) and supports fast similarity search over them.
- Where a SQL database finds rows by exact match on values, a vector database finds rows by proximity in a continuous space — “show me items closest in meaning to this query”.
- They are the storage layer for RAG, recommender systems, semantic search, and image/audio/code similarity.
- Popular options include Pinecone, Weaviate, Qdrant, Milvus, Chroma, and the pgvector extension for PostgreSQL.
- The workhorse algorithm for fast similarity search is HNSW — hierarchical navigable small world graphs — along with variants like IVF and product quantization.
What is a vector, really?
When an embedding model reads a piece of text — a sentence, a paragraph, a document — it outputs a list of numbers, typically 384, 768, or 1,536 of them. That list is a vector. The trick is that the model is trained so that semantically similar texts produce vectors that are close together in that high-dimensional space, while unrelated texts end up far apart.

The same idea applies to images (CLIP produces image-and-text vectors in the same space), code (code embeddings let you find functions similar to a query), and audio. Anywhere you can produce a consistent embedding, you can store it in a vector database and query by similarity. For more on how embeddings are learned, see our embeddings primer.
How similarity search works
The core operation of a vector database is nearest-neighbour search: given a query vector, find the K vectors closest to it. “Close” is usually measured by cosine similarity, dot product, or Euclidean distance, depending on how the embedding model was trained.
For a small collection, you can compute the distance to every stored vector and sort — brute-force search. This does not scale. A million-vector index with brute-force search becomes slow; scanning a billion vectors per query is impractical. Vector databases solve this with approximate nearest-neighbour (ANN) algorithms that trade a small amount of accuracy for orders-of-magnitude speed gains.
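For small collections the brute-force approach is easy to sketch. Here is a minimal NumPy version using cosine similarity; the names, sizes, and random "embeddings" are illustrative, not from any particular library:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Brute-force nearest-neighbour search: cosine similarity against every row."""
    # Normalise so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                       # one similarity score per stored vector
    return np.argsort(-scores)[:k]       # indices of the k most similar vectors

rng = np.random.default_rng(0)
store = rng.normal(size=(10_000, 384))              # 10k toy "embeddings"
query = store[42] + 0.01 * rng.normal(size=384)     # slightly perturbed copy of row 42
print(top_k_cosine(query, store, 3))                # row 42 should rank first
```

Every query touches all 10,000 rows — fine here, but the cost grows linearly with collection size, which is exactly what ANN indexes avoid.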
HNSW: the dominant ANN index
Hierarchical navigable small world graphs build a multi-layer graph where search starts at the top layer with long hops and descends through progressively denser layers with shorter hops. In practice, HNSW typically reaches over 95% recall at query rates thousands of times faster than brute force. It is the default algorithm in Qdrant, Milvus, Weaviate, and pgvector.
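Full HNSW is involved, but its core move — a greedy walk on a proximity graph toward the query — can be sketched in a few lines. This single-layer, single-candidate version is a deliberate simplification for illustration; real HNSW adds the layer hierarchy and keeps a beam of candidates:

```python
import math

def greedy_search(graph, points, query, entry):
    """Greedy walk on a proximity graph: hop to whichever neighbour is
    closer to the query, and stop when no neighbour improves (a local minimum)."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current  # no neighbour is closer: stop here
        current = best

# Five 2-D points on a line, each linked to its immediate neighbours.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, points, query=(3.2, 0), entry=0))  # walks 0 → 1 → 2 → 3
```

The walk visits only a handful of nodes rather than the whole collection; HNSW's upper layers exist to make those first hops long, so the walk reaches the right neighbourhood quickly.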
IVF and product quantization
Inverted file (IVF) indexes partition the vector space into cells and search only a few cells per query. Product quantization (PQ) compresses each vector from, say, 1,536 floats to a much smaller code, trading accuracy for memory efficiency. The FAISS library from Meta pioneered many of these techniques and remains a reference implementation.
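The memory savings from product quantization are easiest to see with shapes. The sketch below uses random codebooks purely for illustration — production systems (including FAISS) learn them with k-means — and compresses a 1,536-float vector into 8 one-byte codes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, ksub = 1536, 8, 256      # dimension, number of subvectors, centroids per subvector
dsub = d // m                  # each subvector is 192 floats

# Illustrative random codebooks; real systems train these with k-means.
codebooks = rng.normal(size=(m, ksub, dsub))

def pq_encode(vec):
    """Compress d floats to m one-byte codes: nearest centroid per subvector."""
    codes = np.empty(m, dtype=np.uint8)
    for i in range(m):
        sub = vec[i * dsub:(i + 1) * dsub]
        codes[i] = np.argmin(np.linalg.norm(codebooks[i] - sub, axis=1))
    return codes

def pq_decode(codes):
    """Reconstruct an approximate vector by concatenating the chosen centroids."""
    return np.concatenate([codebooks[i][codes[i]] for i in range(m)])

vec = rng.normal(size=d)
codes = pq_encode(vec)         # 8 bytes instead of 1,536 * 4 bytes of float32
approx = pq_decode(codes)      # lossy reconstruction
```

That is a 768x reduction in storage per vector (6,144 bytes of float32 down to 8 bytes), at the cost of reconstruction error — which is why PQ is usually paired with a re-ranking pass over exact vectors for the top candidates.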
Vector DB vs. SQL database
SQL databases are designed for structured, exact-match data. You query by WHERE clauses that test equality, range, or relational joins. Vector databases are built for the opposite — approximate, semantic, high-dimensional proximity queries. Most real systems use both. Product metadata (name, price, category) lives in a SQL table; the product’s embedded description lives in a vector index with a foreign key back to the SQL row.
The pgvector extension for PostgreSQL blurs this line by giving a relational database first-class vector support. For many small-to-medium workloads, pgvector is the simplest choice — no extra service to run, familiar SQL, and good enough performance up to tens of millions of vectors.
What people actually use vector databases for
Retrieval-augmented generation
The most common use case today. The user's query is embedded, the vector database returns the most relevant documents, and those documents are placed in the LLM's context window before generation. See our RAG explainer for the full picture.
Semantic search
Search engines that understand intent rather than keywords. “Comfortable running shoes for flat feet” retrieves the right products even when the product descriptions never use that exact phrase. Internal company search, e-commerce search, and legal document search have all moved toward vector-backed retrieval.
Recommendation systems
Find items similar to what a user has liked before. Spotify’s “Discover Weekly” uses embeddings of songs and listening histories; Netflix uses embeddings of shows and users. At scale, these are vector-retrieval problems.
Duplicate and near-duplicate detection
Find news articles that are paraphrases of each other, catch plagiarism, detect reposted content, cluster support tickets. Any “is this similar to something we already have?” question is a vector-database query.
Multimodal search
Search images by text, find audio by description, match code to intent. Models like CLIP, OpenCLIP, and ImageBind produce embeddings that let different modalities share a space.
Choosing a vector database
The options roughly split by deployment model.
- Managed SaaS: Pinecone, Weaviate Cloud, Qdrant Cloud. Zero operational overhead, highest cost per vector.
- Self-hosted open source: Milvus, Qdrant (both have strong self-hosted stories), Weaviate, Chroma. More ops work, full control, usually cheaper at scale.
- Embedded libraries: FAISS, Annoy, hnswlib, DiskANN. Run in-process, no service; best for offline and single-machine use.
- Extensions to existing databases: pgvector (PostgreSQL), vector search in Elasticsearch/OpenSearch, MongoDB Atlas Vector Search. Simplest when you already use the parent system.
For teams just starting with RAG, pgvector or Chroma are good zero-friction options. At larger scale — hundreds of millions of vectors, strict latency requirements, heavy hybrid search — dedicated vector databases tend to pay off.
Operational concerns
Once you deploy, reality gets messier than benchmarks. Embedding models change — a new version produces different vectors, so the index has to be rebuilt. Hybrid search (combining vector and keyword) consistently outperforms pure vector search, but requires indexing both. Metadata filtering (“only return documents from this user’s workspace”) can slow ANN indexes significantly if the database does not handle pre-filtering well. Re-ranking the top-K results with a cross-encoder often wins more precision than tuning the index itself.
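A common way to combine vector and keyword results is reciprocal rank fusion (RRF), which merges two ranked lists without having to calibrate their incompatible scores. A minimal sketch — the constant 60 is the conventional default from the original RRF paper, and the document IDs are made up:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score each doc by the sum of 1 / (k + rank)
    over every ranked list it appears in, then sort by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc7", "doc2", "doc9"]     # from the ANN index
keyword_hits = ["doc2", "doc5", "doc7"]    # from BM25 / full-text search
print(rrf_fuse([vector_hits, keyword_hits]))
# → ['doc2', 'doc7', 'doc5', 'doc9']
```

Documents that appear near the top of both lists (doc2, doc7) float above documents that only one retriever liked — which is the behaviour that makes hybrid search outperform either retriever alone.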
Frequently asked questions
How do I know if I need a vector database?
If you are building search or retrieval and care about matching meaning rather than exact words, you probably need one. Signs that a plain SQL or full-text search engine is not enough: users phrase queries differently than the indexed documents; results need to cluster by topic; you want to match across languages; you need similarity rather than containment. If your queries are precise WHERE-clauses on structured fields, vectors add cost without value.
Is pgvector good enough, or do I need Pinecone?
For most teams, pgvector is good enough under about 10 million vectors with moderate query rates. It keeps everything in one database, which is a huge operational win. Once you need sharded billion-vector indexes, real-time updates at scale, or extreme latency requirements, dedicated vector databases start to justify their cost. The honest answer for most startups and mid-size companies is: start with pgvector, move later only if you hit a wall.
How expensive are vector databases?
Cost varies wildly by deployment model. A small Chroma or Qdrant instance runs on a single VM for tens of dollars a month. Managed SaaS like Pinecone typically charges per million vectors stored and per query, and costs can scale from hundreds to tens of thousands of dollars monthly at production scale. For cost-sensitive workloads, self-hosting with HNSW on commodity hardware is usually cheapest. For more on the broader AI industry, see our AI industry coverage.