Vector Databases — Core Concepts

The Problem That Created a Billion-Dollar Market

In 2022, building an AI app that could search your company’s documents was basically impossible at scale. You could use keyword search (bad — misses synonyms, context, meaning), or you could send every document to a language model and ask “is this relevant?” one by one (also bad — slow and expensive).

Vector databases exist because neither of those worked. By 2024, Pinecone had raised $138 million and Weaviate was at a $100M+ valuation. A new category of database appeared in roughly 18 months.

What’s a Vector, Actually?

An embedding model takes any piece of data — a sentence, an image, an audio clip — and outputs a fixed-length array of decimal numbers. This array is the vector.

"The quick brown fox" → [0.23, -0.41, 0.87, 0.12, ..., 0.55]  (1536 numbers)
"A fast red dog"      → [0.26, -0.38, 0.84, 0.19, ..., 0.51]  (1536 numbers)

The key insight: semantically similar content gets numerically similar vectors. OpenAI’s text-embedding-3-small model outputs 1536-dimensional vectors. Every sentence you feed it lands somewhere in that 1536-dimensional space — and sentences with similar meaning land close together.

This isn’t hand-programmed. The model learned this structure by training on billions of text examples. It discovered, on its own, that king − man + woman ≈ queen in vector space.
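The analogy arithmetic can be tried on a toy scale. The sketch below uses hand-picked 2-dimensional vectors (an assumption made purely for illustration; a real model learns its own axes and uses hundreds of dimensions) and the hypothetical helper `analogy` to show what "king − man + woman lands near queen" means numerically.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hand-picked toy vectors, NOT output of a real model.
# Axis 0 loosely means "royalty", axis 1 loosely means "maleness".
vocab = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.0, 0.5],
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return min(candidates, key=lambda w: sq_dist(target, vocab[w]))

result = analogy("king", "man", "woman")
print(result)
```

With real embeddings the match is approximate rather than exact, which is why the original claim uses ≈ and not =.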

How Similarity Search Works

Given a query vector, the goal is to find the N closest vectors in the database. A common measure is cosine similarity, which is the cosine of the angle between two vectors. Pointing in the same direction means an angle of 0° (similarity 1); completely unrelated means an angle of 90° (similarity 0).
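As a minimal sketch, cosine similarity is the dot product divided by the product of the two vector lengths. The 4-dimensional vectors below are made up for illustration and are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors standing in for three sentence embeddings.
fox = [0.23, -0.41, 0.87, 0.12]
dog = [0.26, -0.38, 0.84, 0.19]
tax = [0.90, 0.30, -0.10, -0.60]

sim_close = cosine_similarity(fox, dog)  # nearly parallel vectors
sim_far = cosine_similarity(fox, tax)    # pointing in very different directions
print(round(sim_close, 3), round(sim_far, 3))
```

Because the formula divides by both lengths, only direction matters; scaling a vector does not change its score.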

The naive approach: compare your query against every single stored vector. If you have 10 million documents, that’s 10 million distance calculations per search. For most applications, that’s too slow.
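The naive approach fits in a few lines, which is part of why it is a useful baseline. This sketch assumes an in-memory dict of vectors (the ids and values are invented) and scores every entry on each query, so the work grows linearly with the number of stored vectors.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query, vectors, k):
    """Exact search: score every stored vector and keep the k best (score, id) pairs."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

# Hypothetical store: document id -> vector.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}

top = brute_force_search([1.0, 0.05, 0.0], store, k=2)
top_ids = [doc_id for _, doc_id in top]
print(top_ids)
```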

Approximate Nearest Neighbor (ANN)

Production vector databases usually default to approximate search rather than exact search. The trade-off: you might miss the single closest vector, but a well-tuned index typically still returns 95-99% of the true nearest neighbors, in milliseconds instead of seconds.

The most popular algorithm today is HNSW (Hierarchical Navigable Small World graphs). Think of it like a road system: highways for long-distance travel, local roads for precision. Introduced in a 2016 paper by Yury Malkov and Dmitry Yashunin, HNSW became the backbone of nearly every production vector database.

Other approaches include IVF (Inverted File Index) — which clusters vectors into buckets and only searches the most relevant buckets — and Flat search (exact, used only for small datasets or when accuracy is non-negotiable).
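To make the bucket idea concrete, here is a toy IVF-style sketch. It assumes the centroids are already known (a real index would learn them, typically with k-means) and uses a hypothetical `nprobe` parameter for how many buckets to scan. The data is chosen so that scanning one bucket misses a true neighbor, which is exactly the approximation trade-off.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        buckets[nearest].append(doc_id)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))[:nprobe]
    candidates = [doc_id for i in probe for doc_id in buckets[i]]
    candidates.sort(key=lambda doc_id: sq_dist(query, vectors[doc_id]))
    return candidates[:k]

# Invented 2-D data: two clusters, with "c" and "f" sitting near the boundary.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {
    "a": [0.0, 1.0], "b": [1.0, 0.0], "c": [4.9, 4.9],
    "d": [9.0, 10.0], "e": [10.0, 9.0], "f": [5.2, 5.2],
}
buckets = build_ivf(vectors, centroids)
query = [5.1, 5.1]

fast = ivf_search(query, vectors, centroids, buckets, k=2, nprobe=1)   # one bucket only
exact = ivf_search(query, vectors, centroids, buckets, k=2, nprobe=2)  # all buckets
print(fast, exact)
```

Here the single-bucket search finds "f" but never looks at "c", which lives just across the boundary. Raising `nprobe` recovers it at the cost of scanning more vectors.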

What Makes a Vector Database Different from pgvector?

Common misconception: “I’ll just use the pgvector extension for Postgres.” That works — until it doesn’t.

Feature              | pgvector                | Purpose-built vector DB
Scale                | ~1M vectors comfortably | 100M+ vectors
Query speed          | Milliseconds to seconds | Single-digit milliseconds
Index types          | IVFFlat, HNSW           | Multiple + tuning options
Filtering            | SQL (flexible)          | Metadata filters (fast)
Operational overhead | Shared with your DB     | Separate service

For a small app or prototype, pgvector is fine. For production search on millions of items, purpose-built tools like Pinecone or Weaviate are faster — because the entire system is optimized for nothing else.

The Standard Architecture: RAG

Most people encounter vector databases through Retrieval Augmented Generation (RAG). The pattern:

  1. Take your documents (PDFs, Notion pages, support tickets)
  2. Chunk them into ~500-token pieces
  3. Embed each chunk with a model (OpenAI, Cohere, local model)
  4. Store the vectors in a database like Chroma, Pinecone, or Qdrant
  5. At query time: embed the user’s question, find the 5 closest chunks, stuff them into the LLM’s context window
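The five steps above can be sketched end to end in plain Python. Everything here is a stand-in: `toy_embed` is a hypothetical hashed bag-of-words function that only captures word overlap (a real pipeline would call an embedding model), words stand in for tokens, and a Python list stands in for the vector database.

```python
import math
import zlib

DIM = 64  # toy dimension; real embedding models use hundreds or thousands

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,?!")
        if word:
            vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def chunk(text, size):
    """Step 2: split text into pieces of at most `size` words (words stand in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents, chunk_size):
    """Steps 1-4: chunk each document, embed each chunk, keep (vector, text) pairs."""
    return [(toy_embed(c), c) for doc in documents for c in chunk(doc, chunk_size)]

def retrieve(index, question, k):
    """Step 5, first half: embed the question and return the k best-scoring chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Step 5, second half: place the retrieved chunks in the text sent to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within five days.",
    "Passwords can be reset from the settings page.",
]
index = build_index(docs, chunk_size=50)
question = "How are refunds processed?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

The LLM call itself is left out on purpose: the point is that the model only ever sees whatever `retrieve` hands it.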

This is how products like Notion AI, Intercom Fin, and dozens of “chat with your docs” apps work. The LLM doesn’t store the knowledge — it just reasons over whatever the vector database retrieves.

Filtering: The Hidden Complexity

You never want just similarity. You want “find me similar documents that are from 2024 and belong to this customer and are marked as resolved.”

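This is called metadata filtering, and it’s harder than it sounds. If you filter before the ANN search (pre-filtering), the index no longer matches the data being searched: a graph index can lose the links it needs to reach good candidates, and a very selective filter can push the engine back toward a slow scan of whatever survives. If you filter after (post-filtering), you might discard most of your results and return fewer than you asked for.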
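A small sketch makes the post-filtering failure visible. It uses exact search as a stand-in for an ANN index and invented items with a `year` field, so pre-filtering looks painless here; in a real system its cost comes from the index not being built for the filtered subset.

```python
def top_k(query, items, k):
    """Exact nearest neighbours by squared Euclidean distance (stand-in for an ANN index)."""
    return sorted(items, key=lambda it: sum((a - b) ** 2 for a, b in zip(query, it["vec"])))[:k]

def post_filter(query, items, k, predicate):
    """Search first, then drop non-matching hits: can return fewer than k results."""
    return [it for it in top_k(query, items, k) if predicate(it)]

def pre_filter(query, items, k, predicate):
    """Filter first, then search only the survivors."""
    return top_k(query, [it for it in items if predicate(it)], k)

# Invented data: the three closest items to the query are all from 2023.
items = [
    {"id": 1, "vec": [0.0, 0.0], "year": 2023},
    {"id": 2, "vec": [0.1, 0.0], "year": 2023},
    {"id": 3, "vec": [0.2, 0.0], "year": 2023},
    {"id": 4, "vec": [0.9, 0.0], "year": 2024},
    {"id": 5, "vec": [1.0, 0.0], "year": 2024},
]

def is_2024(item):
    return item["year"] == 2024

query = [0.0, 0.0]
post_ids = [it["id"] for it in post_filter(query, items, 3, is_2024)]
pre_ids = [it["id"] for it in pre_filter(query, items, 3, is_2024)]
print(post_ids, pre_ids)
```

Post-filtering comes back empty because every one of the top 3 hits fails the filter, even though two matching documents exist.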

Modern vector databases handle this differently. Qdrant uses payload-indexed filtering during search. Weaviate uses a hybrid inverted + vector approach. Pinecone has a metadata index alongside vectors. None of them have perfectly solved the filtering-plus-similarity problem — it’s still an active area.

Common Misconceptions

“Vector search replaces keyword search.” No. Keyword search is still better for exact terms, product SKUs, error codes. Most production systems use hybrid search — combining BM25 (keyword ranking) with vector similarity and merging results. Weaviate and Elasticsearch support this natively.
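One common way to merge the two result lists is reciprocal rank fusion (RRF). This is an illustrative choice, not necessarily what any particular product does internally. The sketch assumes each search returns an ordered list of document ids (the ids are invented) and uses the conventional constant k=60.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_hits = ["sku-123", "doc-b", "doc-a"]  # e.g. from BM25
vector_hits = ["doc-a", "doc-b", "doc-c"]     # e.g. from vector similarity
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

RRF only looks at ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.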

“More dimensions = better search.” Not necessarily. Higher-dimensional vectors can capture more nuance, but they also take more memory and compute. OpenAI reported that its text-embedding-3-large model, shortened to 256 dimensions, still outperforms the older 1536-dimensional text-embedding-ada-002 on the MTEB benchmark, because training improved.
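The memory side of that trade-off is simple arithmetic. The sketch below assumes 4-byte float32 values and counts only the raw vectors, ignoring index structures and metadata; the 10-million-vector figure is just an example size.

```python
def raw_vector_size_gb(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone, in decimal gigabytes (float32 by default)."""
    return num_vectors * dims * bytes_per_value / 1e9

size_1536 = raw_vector_size_gb(10_000_000, 1536)
size_256 = raw_vector_size_gb(10_000_000, 256)
print(f"{size_1536:.2f} GB vs {size_256:.2f} GB")
```

At 10 million vectors, that works out to about 61.44 GB for 1536 dimensions versus about 10.24 GB for 256, a 6x difference before any index overhead.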

“You need a vector database for AI.” For a demo? No. For millions of documents with real latency requirements? Yes.

One Thing to Remember

Vector databases are the long-term memory for AI applications — but they don’t store facts, they store positions in meaning-space. The whole system only works as well as the embedding model doing the translating.

