Python Regex Patterns — Core Concepts
Regular expressions let you define text-matching rules using a compact pattern language. Python’s re module compiles these patterns into pattern objects that you can use to search, match, and replace text in strings.
While Python Regular Expressions covers the re module’s API, this article focuses on the patterns themselves — the syntax you actually write inside re.compile().
The Pattern Alphabet
Every regex pattern is built from a small set of symbols.
Literal characters match themselves. The pattern cat matches the exact sequence c-a-t.
Metacharacters have special meaning:
| Symbol | Meaning | Example match |
|---|---|---|
| . | Any character except newline | a.c → “abc”, “a3c” |
| \d | Any digit (0-9) | \d\d → “42” |
| \w | Word character (letter, digit, underscore) | \w+ → “hello_2” |
| \s | Whitespace (space, tab, newline) | \s+ → a run of spaces |
| \b | Word boundary | \bcat\b → “cat” but not “catalog” |
Capital versions negate: \D means non-digit, \W means non-word character, \S means non-whitespace.
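As a rough illustration of the table above, the following sketch runs a few of these metacharacters through re.findall and re.sub. The sample strings are made up for the example.

```python
import re

text = "cat 42 catalog hello_2"

# \d matches a single digit, so \d\d needs two digits in a row.
digits = re.findall(r"\d\d", text)

# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)

# \bcat\b matches "cat" only as a whole word, so "catalog" is skipped.
whole_cat = re.findall(r"\bcat\b", text)

# . matches any single character except a newline.
dot_matches = re.findall(r"a.c", "abc a3c a\nc")

# \D is the negation of \d: remove everything that is not a digit.
only_digits = re.sub(r"\D", "", "tel: 555-1234")

print(digits, words, whole_cat, dot_matches, only_digits)
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the regex engine sees them.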
Character Classes
Square brackets define custom character sets.
- [aeiou] matches any single vowel
- [0-9] matches any digit (same as \d)
- [^abc] matches anything except a, b, or c
- [a-zA-Z] matches any English letter
You can combine ranges: [a-zA-Z0-9_] is equivalent to \w for ASCII text.
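A small sketch of these classes in action, using invented sample strings. The last two lines illustrate the "for ASCII text" caveat: by default \w also matches non-ASCII letters in Python 3 string patterns, while the re.ASCII flag restricts it to [a-zA-Z0-9_].

```python
import re

# [aeiou] matches any single vowel.
vowels = re.findall(r"[aeiou]", "regex")

# [^abc] matches any single character except a, b, or c.
not_abc = re.findall(r"[^abc]", "abcde")

# Ranges can be combined inside one class.
tokens = re.findall(r"[a-zA-Z0-9_]+", "user_1 + user_2")

# "na\u00efve caf\u00e9" is "naive cafe" written with accented letters.
sample = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", sample)                # accents count as word characters
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)  # accents split the words

print(vowels, not_abc, tokens, unicode_words, ascii_words)
```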
Quantifiers
Quantifiers control how many times a piece must repeat.
| Quantifier | Meaning |
|---|---|
| * | Zero or more |
| + | One or more |
| ? | Zero or one |
| {3} | Exactly 3 |
| {2,5} | Between 2 and 5 |
| {3,} | 3 or more |
By default, quantifiers are greedy — they grab as much text as possible. Add ? after a quantifier to make it lazy (minimal matching): .*? stops at the first opportunity.
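The difference between greedy and lazy matching is easiest to see side by side. The snippet below is a minimal sketch with a made-up input string; it also exercises the optional and counted quantifiers from the table.

```python
import re

markup = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last </b> it can reach.
greedy = re.findall(r"<b>.*</b>", markup)

# Lazy: .*? stops at the first </b> that completes the match.
lazy = re.findall(r"<b>.*?</b>", markup)

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "color colour")

# {2,5} takes between 2 and 5 digits, as many as it can (greedy).
runs = re.findall(r"\d{2,5}", "1 22 333333")

print(greedy, lazy, spellings, runs)
```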
Anchors
Anchors don’t match characters — they match positions.
- ^ matches the start of a string (or line in multiline mode)
- $ matches the end
- \b matches a word boundary
Use anchors when you need the pattern to match the whole input, not just a substring. ^\d{5}$ ensures the entire string is exactly five digits.
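Here is a short sketch of the five-digit check, with the name zip_pattern chosen only for this example. It also shows one subtlety worth knowing: $ matches just before a trailing newline as well, so re.fullmatch is the stricter option when the whole string must match.

```python
import re

zip_pattern = re.compile(r"^\d{5}$")

is_zip = zip_pattern.search("12345") is not None        # exactly five digits
too_long = zip_pattern.search("123456") is not None     # six digits
embedded = zip_pattern.search("zip 12345") is not None  # digits inside other text

# $ also matches just before a trailing newline, so this still matches.
trailing_newline = zip_pattern.search("12345\n") is not None

# re.fullmatch requires the pattern to cover the entire string.
strict = re.fullmatch(r"\d{5}", "12345\n") is not None

# In multiline mode, ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", "one two\nthree four", flags=re.MULTILINE)

print(is_zip, too_long, embedded, trailing_newline, strict, line_starts)
```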
Groups and Alternation
Parentheses create groups that let you:
- Extract parts of a match: (\d{3})-(\d{4}) captures the two halves of a number like 555-1234 separately
- Apply quantifiers to sequences: (ha)+ matches “ha”, “haha”, “hahaha”
- Alternate with the pipe |: (cat|dog) matches either word
Named groups make extracted data clearer: (?P<year>\d{4})-(?P<month>\d{2}) gives each capture a label.
Non-capturing groups (?:...) group without capturing, which is useful when you need grouping for alternation but don’t care about extracting the match.
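The sketch below walks through capturing, named, and non-capturing groups on invented sample strings. Note how re.findall returns only the captured part when the pattern contains a capturing group, which is one practical reason to reach for (?:...).

```python
import re

# Numbered groups: each pair of parentheses captures one piece.
phone = re.search(r"(\d{3})-(\d{4})", "call 555-1234")
parts = phone.groups()

# A quantifier applied to a group repeats the whole sequence.
laugh = re.fullmatch(r"(ha)+", "hahaha") is not None

# Named groups are available by label.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-03")
fields = date.groupdict()

# With a capturing group, findall returns only the group contents...
captured = re.findall(r"(cat|dog)s", "cats and dogs")

# ...with a non-capturing group, it returns the full match.
full = re.findall(r"(?:cat|dog)s", "cats and dogs")

print(parts, laugh, fields, captured, full)
```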
Lookahead and Lookbehind
These assert what comes before or after your match without consuming characters.
- (?=...) positive lookahead: \d+(?= dollars) matches digits followed by “ dollars”
- (?!...) negative lookahead: \d+(?! cents) matches digits NOT followed by “ cents” (because of backtracking it can still match a shorter run, such as the “5” in “50 cents”; \b\d+\b(?! cents) avoids this)
- (?<=...) positive lookbehind: (?<=\$)\d+ matches digits preceded by “$”
- (?<!...) negative lookbehind: (?<!\\)\n matches newlines not preceded by a backslash
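The four assertions can be tried out as follows. This is a sketch on made-up strings; it also demonstrates a common surprise with negative lookahead, where backtracking lets \d+ give up a digit so that the assertion passes.

```python
import re

prices = "20 dollars, 50 cents"

# Positive lookahead: digits followed by " dollars" (the suffix is not consumed).
dollars = re.findall(r"\d+(?= dollars)", prices)

# Negative lookahead: \d+ backtracks from "50" to "5", which is followed
# by "0", not " cents", so "5" is reported as a match.
naive_not_cents = re.findall(r"\d+(?! cents)", prices)

# Word boundaries force \d+ to cover the whole number before the check.
strict_not_cents = re.findall(r"\b\d+\b(?! cents)", prices)

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost: $45 or 45 euros")

# Negative lookbehind: split on newlines that are not preceded by a backslash.
lines = re.split(r"(?<!\\)\n", "a\\\nb\nc")

print(dollars, naive_not_cents, strict_not_cents, after_dollar, lines)
```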
Common Misconception
“Regex can parse any structured format.” Regex works on flat text patterns. It cannot reliably handle nested structures like HTML or JSON. For those, use a proper parser. Regex is the right tool for extracting simple, predictable patterns — not for replacing a full grammar.
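To make the limitation concrete, here is a minimal sketch comparing a brace-matching pattern with Python's standard json module on a small invented document. The pattern has no notion of nesting depth, so it stops at the first closing brace.

```python
import json
import re

raw = '{"user": {"name": "Ada", "tags": ["x", "y"]}, "active": true}'

# A lazy brace pattern ends at the first "}", cutting the outer object short.
regex_guess = re.search(r"\{.*?\}", raw).group()

# A real parser tracks nesting and returns the full structure.
data = json.loads(raw)
name = data["user"]["name"]

print(regex_guess)
print(name)
```

Switching to a greedy .* would happen to work for this one input, but it would over-match as soon as the text contained two separate objects.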
One Thing to Remember
Regex patterns are assembled from a small toolkit — character classes, quantifiers, anchors, and groups — and nearly every pattern you’ll ever need is a combination of these four pieces.
See Also
- Python Fuzzy Matching Fuzzywuzzy: Find out how Python's FuzzyWuzzy library matches messy, misspelled text, like a friend who understands you even when you mumble.
- Python Regex Lookahead Lookbehind: Learn how Python regex can peek ahead or behind without grabbing text, like checking what's next in line without stepping forward.
- Python Regex Named Groups: Learn how Python regex named groups let you label the pieces you capture, like putting name tags on your search results.
- Python Regular Expressions: Learn how Python can find tricky text patterns fast, like spotting every phone number hidden in a messy page.
- Python String Similarity Algorithms: Discover how Python measures how alike two words are, like a spelling teacher who counts your mistakes instead of just saying wrong.