Python Fuzzy Matching with FuzzyWuzzy — Core Concepts

FuzzyWuzzy is a Python library that uses sequence matching to score how similar two strings are on a scale from 0 to 100. It wraps Python’s difflib.SequenceMatcher with convenient functions tailored for common real-world matching problems.

For the underlying theory, see Python String Similarity Algorithms.

Installation Note

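FuzzyWuzzy runs on pure-Python difflib by default and uses python-Levenshtein, if installed, for speed. That optional dependency is GPL-licensed. The modern replacement is rapidfuzz, which is faster, is MIT-licensed, and offers a largely compatible API (check its documentation for differences such as default string preprocessing and return values). The concepts below apply to both.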

The Four Scoring Functions

Simple Ratio

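Compares two strings directly using SequenceMatcher and returns an integer score from 0 to 100: twice the number of matching characters, divided by the combined length of both strings.

“New York Mets” vs “New York Meats” scores 96, because 13 characters match out of 27 in total (2 × 13 / 27 ≈ 0.96).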

This works well when both strings are roughly the same length and structure.
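
As a rough illustration of what simple ratio computes, the sketch below calls difflib.SequenceMatcher from the standard library directly. The helper name simple_ratio is made up for this example; in the library the corresponding call is fuzz.ratio, and its result may differ slightly depending on the backend installed.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Illustrative helper (not the library function): 0-100 similarity."""
    # SequenceMatcher.ratio() is 2 * matches / (len(a) + len(b)), in [0, 1].
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio("New York Mets", "New York Meats"))  # 96
print(simple_ratio("New York Mets", "New York Mets"))   # 100
```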

Partial Ratio

Finds the best substring match. It slides the shorter string along the longer one and returns the highest score.

“Yankees” vs “New York Yankees” scores 100 with partial ratio, because “Yankees” appears intact within the longer string. Simple ratio scores only about 61 for the same pair, because the 9 unmatched characters of “New York ” count against it.

Best for: Matching when one string is a subset of the other.
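
The sliding-window idea can be sketched with the standard library alone. This is a brute-force approximation under the assumption that every same-length window is tried; partial_ratio here is a hypothetical helper, and the library's fuzz.partial_ratio may choose candidate windows differently.

```python
from difflib import SequenceMatcher


def partial_ratio(a: str, b: str) -> int:
    """Illustrative helper: best score of the shorter string against any
    same-length window of the longer string."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # sketch choice: an empty string matches nothing
    width = len(shorter)
    best = max(
        SequenceMatcher(None, shorter, longer[start:start + width]).ratio()
        for start in range(len(longer) - width + 1)
    )
    return round(100 * best)


query, record = "Yankees", "New York Yankees"
print(partial_ratio(query, record))                               # 100
print(round(100 * SequenceMatcher(None, query, record).ratio()))  # 61
```

The second print shows the plain ratio for the same pair, which is penalized for the length difference.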

Token Sort Ratio

Splits both strings into words, sorts them alphabetically, then compares. This neutralizes word order differences.

“John Smith Jr.” vs “Jr. Smith John” scores 100, because once case and punctuation are normalized and the words are sorted, both strings become the same sequence: “john jr smith”.

Best for: Names, titles, or phrases where word order varies.
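
A minimal sketch of the sort-then-compare step, using only the standard library. The helper names are invented for illustration, and the regex-based normalization is an assumption that only approximates the preprocessing behind the library's fuzz.token_sort_ratio.

```python
import re
from difflib import SequenceMatcher


def normalize_sorted(text: str) -> str:
    """Lowercase, keep word characters only, and sort the tokens."""
    return " ".join(sorted(re.findall(r"\w+", text.lower())))


def token_sort_ratio(a: str, b: str) -> int:
    """Illustrative helper: compare the sorted-token forms of both strings."""
    matcher = SequenceMatcher(None, normalize_sorted(a), normalize_sorted(b))
    return round(100 * matcher.ratio())


print(normalize_sorted("John Smith Jr."))                    # john jr smith
print(token_sort_ratio("John Smith Jr.", "Jr. Smith John"))  # 100
```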

Token Set Ratio

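The most forgiving scorer. It splits both strings into sets of unique words, separates the shared words from the words found in only one string, and returns the best score among three comparisons: the shared words against each string's full word set, and the two full word sets against each other.

“Los Angeles Lakers basketball” vs “Lakers Los Angeles” scores 100 because every word of the shorter string appears in the longer one. The shared words alone reproduce the shorter string exactly, so the extra word “basketball” does not lower the score.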

Best for: Records with inconsistent detail levels — one entry has extra context the other doesn’t.
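
The set-based comparison can be sketched as follows. This is an approximation built on difflib with made-up helper names, not the library's own code; the library call is fuzz.token_set_ratio, and its preprocessing may differ from the simple regex assumed here.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(a: str, b: str) -> int:
    """Illustrative helper: best of three comparisons built from the
    shared tokens and each string's leftover tokens."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    shared = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = f"{shared} {only_a}".strip()  # shared words + a's extras
    combined_b = f"{shared} {only_b}".strip()  # shared words + b's extras
    return max(
        _ratio(shared, combined_a),
        _ratio(shared, combined_b),
        _ratio(combined_a, combined_b),
    )


print(token_set_ratio("Los Angeles Lakers basketball", "Lakers Los Angeles"))  # 100
```

Because one string's words are a subset of the other's, one of the three comparisons pits the shared words against themselves, which is why the score reaches 100.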

Choosing the Right Scorer

Situation                        Scorer         Why
Two similar-length full names    Simple ratio   Direct comparison works
Short query vs long record       Partial ratio  Finds the needle in the haystack
Same words, different order      Token sort     Order-independent
Extra words in one string        Token set      Ignores extras
Not sure                         Token set      Most forgiving default

Extracting Best Matches

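Beyond comparing two strings, FuzzyWuzzy provides process.extract() to find the best matches from a list of choices. You supply a query and a list, and it returns the top matches (five by default) as (choice, score) pairs, best first. When you only need the single best match, process.extractOne() returns just that one.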

This is the primary API for searching — you rarely compare strings one pair at a time in practice.
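
To make the mechanics concrete, here is a standard-library sketch of the score, sort, and truncate pattern behind this kind of search. The functions score and extract below are hypothetical stand-ins written for this example, and the team list is invented sample data; the library's process.extract accepts its own scorers and may preprocess strings differently.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Case-insensitive 0-100 similarity (stand-in for a library scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract(query, choices, scorer=score, limit=5):
    """Illustrative helper: return up to `limit` (choice, score) pairs,
    best first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


teams = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
for choice, points in extract("new york jets", teams, limit=2):
    print(choice, points)
```

Passing the scorer as a parameter mirrors the idea that the same search routine can be paired with whichever of the four scorers suits the data.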

Threshold Selection

The right threshold depends on your domain:

  • Names and addresses: 85-90 (typos are small)
  • Product catalog matching: 75-85 (abbreviations and variations are common)
  • Free-text descriptions: 60-75 (paraphrasing causes bigger differences)

Start with 85, test against labeled examples, and adjust. Too high misses valid matches. Too low floods you with false positives.
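
One way to test a threshold against labeled examples is to measure precision and recall at several cutoffs. The sketch below assumes a plain difflib-based scorer and three made-up labeled pairs; in practice you would substitute the scorer you actually use and a much larger sample from your own data.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Case-insensitive 0-100 similarity (stand-in for a library scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def evaluate(labeled_pairs, threshold):
    """Return (precision, recall) when scores >= threshold count as matches."""
    tp = fp = fn = 0
    for a, b, is_match in labeled_pairs:
        predicted = score(a, b) >= threshold
        if predicted and is_match:
            tp += 1  # correctly accepted
        elif predicted:
            fp += 1  # accepted, but not a real match
        elif is_match:
            fn += 1  # real match that was rejected
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up labeled examples: (string_a, string_b, is_true_match)
labeled_pairs = [
    ("Jon Smith", "John Smith", True),
    ("Acme Corp", "Acme Corporation", True),
    ("Jane Doe", "John Roe", False),
]

for threshold in (85, 70, 50):
    precision, recall = evaluate(labeled_pairs, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

Falling recall signals a threshold that is too high (valid matches are missed); falling precision signals one that is too low (false positives get through).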

Common Misconception

“FuzzyWuzzy handles all fuzzy matching needs.” FuzzyWuzzy is excellent for short strings — names, addresses, product titles. For document-level similarity, it’s the wrong tool. TF-IDF cosine similarity or embedding-based approaches work better for paragraphs and beyond. Always match the tool to the text length.

One Thing to Remember

FuzzyWuzzy offers four scorers for four scenarios — simple ratio for similar strings, partial for subsets, token sort for reordered words, and token set for uneven detail — pick the one that fits your data’s messiness.


See Also

  • Python Regex Lookahead Lookbehind Learn how Python regex can peek ahead or behind without grabbing text — like checking what's next in line without stepping forward.
  • Python Regex Named Groups Learn how Python regex named groups let you label the pieces you capture — like putting name tags on your search results.
  • Python Regex Patterns Discover how Python regex patterns work like a secret code for finding hidden text treasures in any document.
  • Python Regular Expressions Learn how Python can find tricky text patterns fast, like spotting every phone number hidden in a messy page.
  • Python String Similarity Algorithms Discover how Python measures how alike two words are — like a spelling teacher who counts your mistakes instead of just saying wrong.