Semantic Search in Python — Core Concepts

Semantic search retrieves documents based on meaning rather than keyword overlap. In Python, this involves converting text to vector embeddings, indexing those vectors, and finding the closest matches to a query vector. It powers search features in products from Google to Notion to Spotify.

How it works

The process has three stages:

  1. Embed documents — run each document through an embedding model to get a vector (a list of numbers, typically 768-3072 dimensions) that represents its meaning.
  2. Index vectors — store vectors in a structure optimized for nearest-neighbor search.
  3. Query — embed the search query with the same model, find the closest document vectors, return the corresponding documents.
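
The three stages above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production setup: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into buckets, so it captures only word overlap, not meaning), and `build_index` and `search` are illustrative names. The shape of the pipeline is what carries over when a real model is swapped in.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; real models use hundreds or thousands


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero on empty text
    return [x / norm for x in vec]


def build_index(docs: list[str]) -> list[tuple[str, list[float]]]:
    """Stages 1 and 2: embed every document and keep (document, vector) pairs."""
    return [(doc, toy_embed(doc)) for doc in docs]


def search(index, query: str, k: int = 3) -> list[tuple[float, str]]:
    """Stage 3: embed the query the same way, then rank documents by dot product."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


docs = [
    "how to reset my password",
    "shipping times for international orders",
    "refund policy for damaged items",
]
index = build_index(docs)
for score, doc in search(index, "reset password", k=2):
    print(f"{score:.3f}  {doc}")
```

The list-of-pairs "index" is a brute-force scan over every document. A real system would replace `toy_embed` with a trained model and the scan with a nearest-neighbor structure.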

Similarity metrics

The most common similarity and distance measures:

  • Cosine similarity — measures the angle between two vectors. Most popular for text search because it is not affected by vector magnitude.
  • Dot product — similar to cosine but affected by magnitude. Faster to compute and equivalent to cosine when vectors are normalized.
  • Euclidean distance — straight-line distance in vector space. Less common for text but used in some clustering applications.

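For normalized (unit-length) embeddings, which many modern models produce by default, cosine similarity and dot product give identical rankings. Euclidean distance also ranks results in the same order, because for unit vectors the squared distance equals 2 minus twice the dot product.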
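
A small worked example may make the relationship between the three measures concrete. The vectors and document names below are made up for illustration. Document "b" is deliberately long but points in a different direction from the query, so it wins on raw dot product and loses on cosine similarity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


query = [1.0, 2.0, 0.5]
docs = {
    "a": [2.0, 4.0, 1.0],    # same direction as the query, twice as long
    "b": [9.0, 1.0, 30.0],   # large magnitude, different direction
    "c": [1.0, 1.5, 1.0],    # close in direction, similar length
}

# Raw (unnormalized) vectors: dot product rewards magnitude, cosine does not.
by_cosine = sorted(docs, key=lambda n: cosine_similarity(query, docs[n]), reverse=True)
by_raw_dot = sorted(docs, key=lambda n: dot(query, docs[n]), reverse=True)

# After normalizing to unit length, all three measures agree on the ranking.
unit_query = normalize(query)
unit_docs = {name: normalize(vec) for name, vec in docs.items()}
by_unit_dot = sorted(unit_docs, key=lambda n: dot(unit_query, unit_docs[n]), reverse=True)
by_unit_euclid = sorted(unit_docs, key=lambda n: euclidean_distance(unit_query, unit_docs[n]))

print(by_cosine)       # ['a', 'c', 'b']
print(by_raw_dot)      # ['b', 'a', 'c']
print(by_unit_dot)     # ['a', 'c', 'b']
print(by_unit_euclid)  # ['a', 'c', 'b']
```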

Embedding model choices

The embedding model determines search quality more than anything else. Current strong options:

  • OpenAI text-embedding-3-small/large — good quality, easy to use, pay per token.
  • Cohere embed-v3 — excellent multilingual support.
  • sentence-transformers (local) — models like bge-large-en-v1.5 or e5-large-v2 run on your hardware. Free but require GPU for fast inference.

Match the model to your content. A model trained on English academic text will not perform well on Spanish product reviews.

Index and storage options

For small datasets (roughly under 100k documents), brute-force in-memory search with numpy or a flat FAISS index is sufficient. For larger datasets or production workloads, the main options are:

  • FAISS — Meta’s library for efficient similarity search. Runs locally, supports GPU acceleration.
  • Chroma — easy-to-use embedded vector database. Good for prototyping and small-to-medium datasets.
  • Qdrant / Weaviate / Pinecone — dedicated vector databases with filtering, scaling, and operational features.
  • Elasticsearch with vectors — if you already use Elasticsearch, its kNN search adds semantic capabilities.
  • pgvector — PostgreSQL extension for teams that want to keep everything in Postgres.

Hybrid search

Pure semantic search misses exact matches, and pure keyword search misses meaning. Hybrid search combines both.

Retrieve results from a keyword index (typically BM25) and from a vector index, then merge the two ranked lists. Reciprocal rank fusion (RRF) is a simple merging strategy that works well: each result earns a score from its rank in each list, the scores are summed, and results that appear in both lists rise toward the top.
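
Reciprocal rank fusion is short enough to sketch directly. The version below assumes each retriever returns an ordered list of document ids. The function name, the example ids, and the two hit lists are hypothetical; the constant `k=60` is the value conventionally used with RRF and can be tuned.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; each list adds 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_7", "doc_2", "doc_9"]  # e.g. from a BM25 index
vector_hits = ["doc_2", "doc_4", "doc_7"]   # e.g. from a vector index

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_2', 'doc_7', 'doc_4', 'doc_9']
```

Because RRF uses only ranks, it does not need the BM25 scores and the vector similarities to be on comparable scales, which is a large part of why it is a convenient default.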

Common misconception

A common assumption is that semantic search replaces keyword search entirely. It does not. Semantic search struggles with exact identifiers such as product codes, error numbers, and proper nouns the embedding model did not see during training. The best production search systems run both approaches in parallel and merge the results.

Practical tips

  • Chunk long documents before embedding — a single vector for a 50-page document captures too little detail.
  • Use metadata filtering to narrow search scope before computing similarity.
  • Benchmark retrieval quality with a test set of queries and known relevant documents.
  • Monitor search quality in production by logging low-confidence results and user feedback signals (clicks, reformulations).
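
Two of these tips, chunking and benchmarking, are easy to sketch. The helpers below are illustrative only: `chunk_words` splits on whitespace with a fixed overlap (real pipelines often split on sentences or tokens instead), and `recall_at_k` is one simple retrieval metric among several. The default sizes are arbitrary assumptions, not recommendations.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def recall_at_k(retrieved, relevant, k):
    """Fraction of the known relevant ids that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Ten words, windows of four words that overlap by one.
sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']

# One test query: the system returned d1, d5, d3; d1, d3 and d8 are relevant.
print(recall_at_k(["d1", "d5", "d3"], {"d1", "d3", "d8"}, k=3))
```

Averaging `recall_at_k` over a fixed set of test queries gives a single number to compare when changing the embedding model, the chunk size, or the index settings.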

The one thing to remember: Semantic search in Python combines embedding models with vector indexing to find documents by meaning — and the best production systems pair it with keyword search to handle both conceptual and exact-match queries.

