Search Ranking Algorithms in Python — Core Concepts
Search ranking determines the order of results returned for a query. A search engine that finds 10,000 matching documents is useless if the best one is buried at position 5,000. Ranking algorithms score each document’s relevance and sort accordingly.
TF-IDF: the foundation
Term Frequency (TF) measures how often a query term appears in a document. More occurrences suggest higher relevance.
Inverse Document Frequency (IDF) measures how rare a term is across all documents. Common words like “the” get low weight; rare words like “Kubernetes” get high weight.
TF-IDF(term, doc) = TF(term, doc) × IDF(term)
IDF(term) = log(total_documents / documents_containing_term)
A document’s total score for a multi-word query is the sum of TF-IDF scores for each query term.
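The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming raw counts for TF, whitespace tokenization, and a toy three-document corpus; the function name `tf_idf_scores` and the sample data are made up for the example.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each tokenized document: sum over query terms of TF x IDF."""
    total_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if doc_freq[term] == 0:
                continue  # term never appears in the corpus, so it adds nothing
            idf = math.log(total_docs / doc_freq[term])
            score += tf[term] * idf
        scores.append(score)
    return scores

# Toy corpus (illustrative only).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "kubernetes schedules the pods".split(),
]
scores = tf_idf_scores(["kubernetes", "the"], docs)
print(scores)
```

Because "the" appears in every document, its IDF is log(3/3) = 0 and it contributes nothing, so only the document containing "kubernetes" gets a non-zero score.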
BM25: the industry standard
BM25 (Best Matching 25) improves on TF-IDF with two key refinements:
- Saturation: term frequency has diminishing returns. A word appearing 10 times isn’t 10x more relevant than one appearing once. BM25 uses a saturating curve, controlled by parameter k1, in which each additional occurrence adds less and the term’s contribution approaches a fixed ceiling.
- Document length normalization: longer documents naturally contain more term occurrences. Parameter b controls how much to penalize long documents.
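import math

def bm25_score(query_terms, doc_tf, doc_length, avg_doc_length,
               doc_freq, total_docs, k1=1.5, b=0.75):
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)    # occurrences of the term in this document
        df = doc_freq.get(term, 0)  # number of documents containing the term
        # Smoothed IDF; the "+ 1" inside the log keeps it non-negative
        idf = math.log((total_docs - df + 0.5) / (df + 0.5) + 1)
        # Saturating term frequency with document length normalization
        tf_norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_length / avg_doc_length))
        score += idf * tf_norm
    return score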
Tuning k1 and b:
- k1 = 1.2-2.0: higher values give more weight to term frequency
- b = 0.75: the default works well; lower values reduce length normalization
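To see what the two parameters do, it can help to isolate the term-frequency part of the BM25 formula and feed it a few values. The sketch below uses a hypothetical helper, `tf_component`, with made-up document lengths; it leaves out the IDF factor.

```python
def tf_component(tf, doc_length, avg_doc_length, k1=1.5, b=0.75):
    """Term-frequency part of BM25, without the IDF factor."""
    length_norm = 1 - b + b * doc_length / avg_doc_length
    return (tf * (k1 + 1)) / (tf + k1 * length_norm)

# Saturation: for an average-length document the value climbs toward k1 + 1
# but never reaches it, however often the term repeats.
for tf in (1, 2, 5, 10, 100):
    print(tf, round(tf_component(tf, doc_length=100, avg_doc_length=100), 3))

# Length normalization: same term count, document twice the average length.
short = tf_component(3, doc_length=100, avg_doc_length=100)
long_b075 = tf_component(3, doc_length=200, avg_doc_length=100, b=0.75)
long_b0 = tf_component(3, doc_length=200, avg_doc_length=100, b=0.0)
print(short, long_b075, long_b0)
```

With b = 0.75 the longer document scores lower for the same term count; with b = 0 length is ignored and both documents score the same.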
Using the rank_bm25 library:
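from rank_bm25 import BM25Okapi

# `documents` is assumed to be a list of strings; whitespace splitting stands in for a real tokenizer
corpus = [doc.split() for doc in documents]
bm25 = BM25Okapi(corpus)

query = "python async performance"
scores = bm25.get_scores(query.split())
# Indices of the 10 highest-scoring documents, best first
top_indices = sorted(range(len(scores)), key=lambda i: -scores[i])[:10]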
Field-weighted scoring
Real search applications have multiple fields (title, body, tags). Matches in the title are usually more important than matches in the body.
final_score = w_title × BM25(title) + w_body × BM25(body) + w_tags × BM25(tags)
Typical weights: title (3x), tags (2x), body (1x). The exact values depend on your domain — experiment and measure.
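A minimal sketch of the weighted sum, assuming the per-field BM25 scores have already been computed elsewhere. The helper name `combine_field_scores` and the sample scores are hypothetical; the weights are the ones mentioned above.

```python
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}

def combine_field_scores(field_scores, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field relevance scores; missing fields count as 0."""
    return sum(w * field_scores.get(field, 0.0) for field, w in weights.items())

# Hypothetical per-field BM25 scores for two documents.
doc_a = {"title": 1.2, "body": 0.4}               # matches mostly in the title
doc_b = {"title": 0.0, "tags": 0.5, "body": 2.1}  # matches mostly in the body

score_a = combine_field_scores(doc_a)
score_b = combine_field_scores(doc_b)
print(score_a, score_b)
```

Here the title match in `doc_a` outranks the stronger body match in `doc_b`, which is the behavior the weights are meant to produce. Whether that is right for a given domain is something to check against relevance judgments.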
Beyond text: boosting signals
Production ranking blends text relevance with other signals:
- Freshness — newer content ranks higher for time-sensitive queries
- Popularity — click-through rate, purchase count, or view count
- Authority — PageRank-style link analysis or domain reputation
- Personalization — user’s past behavior influences ranking
These are combined as additive or multiplicative boosts on top of the text relevance score.
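One way to picture a multiplicative boost is sketched below. The exponential half-life decay for freshness and the log-damped click count for popularity are illustrative assumptions, as are the function name `boosted_score` and its default parameters; real systems pick and tune these shapes per domain.

```python
import math

def boosted_score(text_score, age_days, clicks,
                  half_life_days=30.0, popularity_weight=0.1):
    """Multiply text relevance by a freshness decay and a popularity boost."""
    # Freshness: 1.0 for brand-new content, 0.5 after one half-life.
    freshness = 0.5 ** (age_days / half_life_days)
    # Popularity: log damping keeps very popular items from dominating.
    popularity = 1.0 + popularity_weight * math.log1p(clicks)
    return text_score * freshness * popularity

fresh_unknown = boosted_score(2.0, age_days=0, clicks=0)
old_unknown = boosted_score(2.0, age_days=30, clicks=0)
fresh_popular = boosted_score(2.0, age_days=0, clicks=100)
print(fresh_unknown, old_unknown, fresh_popular)
```

An additive variant would instead add weighted freshness and popularity terms to `text_score`, which lets a strong signal lift a document even when its text score is close to zero.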
Learning to Rank (LTR)
Instead of hand-tuning weights, train a machine learning model to learn the optimal ranking function from labeled data.
Three approaches:
- Pointwise — predict a relevance score for each document independently
- Pairwise — predict which of two documents is more relevant (RankNet, LambdaRank)
- Listwise — optimize the entire ranked list directly (LambdaMART, ListNet)
LambdaMART (gradient-boosted trees trained with LambdaRank gradients to optimize NDCG) is one of the most widely used approaches in production. XGBoost and LightGBM both support LTR objectives.
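To make the pairwise idea concrete, the sketch below turns graded relevance labels for one query into (better, worse) training pairs, which is the kind of input a pairwise model learns from. It shows only the data preparation step, not a trained ranker; the function name `preference_pairs` and the sample judgments are hypothetical.

```python
from itertools import combinations

def preference_pairs(labeled_docs):
    """Turn (doc_id, relevance) judgments for one query into (better, worse) pairs."""
    pairs = []
    for (doc_a, rel_a), (doc_b, rel_b) in combinations(labeled_docs, 2):
        if rel_a > rel_b:
            pairs.append((doc_a, doc_b))
        elif rel_b > rel_a:
            pairs.append((doc_b, doc_a))
        # Equal labels express no preference, so the pair is skipped.
    return pairs

# Hypothetical judgments for a single query: higher label = more relevant.
judgments = [("d1", 2), ("d2", 0), ("d3", 1), ("d4", 0)]
pairs = preference_pairs(judgments)
print(pairs)
```

Pairs are only formed within a single query, since relevance labels from different queries are not comparable.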
Evaluation metrics
| Metric | What it measures |
|---|---|
| NDCG@K | Quality of ranking considering position (graded relevance) |
| MAP@K | Mean over queries of average precision in the top K (binary relevance) |
| MRR | Mean over queries of the reciprocal rank of the first relevant result |
| Precision@K | Fraction of top-K results that are relevant |
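The metrics in the table can be sketched for a single query as follows. Each function takes the relevance labels of the results in ranked order. Several choices here are assumptions: NDCG uses linear gain (some implementations use 2^rel - 1) and builds the ideal ordering from the labels in the list, and average precision is normalized by the number of relevant results found in the top K, which is one of several common conventions. MAP and MRR are the means of the per-query values over a query set.

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k results with relevance > 0."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0

def average_precision_at_k(relevances, k):
    """Mean of precision values at each relevant position within the top k."""
    hits, total = 0, 0.0
    for rank, r in enumerate(relevances[:k], start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded labels for one query's results, in ranked order (illustrative).
ranked = [0, 2, 1, 0, 1]
print(precision_at_k(ranked, 3), reciprocal_rank(ranked),
      average_precision_at_k(ranked, 5), ndcg_at_k(ranked, 3))
```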
Common misconception
People think more sophisticated algorithms always produce better rankings. In practice, a well-tuned BM25 with thoughtful field weighting often outperforms a poorly trained neural ranker. Start simple, measure carefully, and add complexity only when metrics justify it.
One thing to remember: great search ranking starts with BM25 as a strong baseline, then layers on domain-specific signals and evaluation — the algorithm matters less than understanding what “relevant” means for your users.