Python fnmatch — Deep Dive

fnmatch internals, regex compilation, platform behavior differences, and advanced pattern matching strategies for production code.

How fnmatch works internally

Under the hood, fnmatch.fnmatch() converts the wildcard pattern to a regular expression with fnmatch.translate(), compiles it, and calls the compiled pattern's match() method. The compiled matcher is cached via functools.lru_cache, so repeated calls with the same pattern skip both translation and compilation.

The translation process:

import fnmatch

# Pattern: *.py
# Translates to: (?s:.*\.py)\Z
print(fnmatch.translate("*.py"))

# Pattern: data_[0-9]?.csv
# Translates to: (?s:data_[0-9].\.csv)\Z
print(fnmatch.translate("data_[0-9]?.csv"))

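The (?s:...) wrapper enables DOTALL mode (. matches newlines too), and \Z anchors to the absolute end of the string, so unlike $ it does not tolerate a trailing newline. The exact regex text can differ slightly between Python versions, but the matching behavior is intended to be equivalent.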
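To make the effect of the DOTALL wrapper and the end anchor concrete, here is a small sketch that compiles the translated pattern by hand and probes it with a few names. The sample filenames are invented for illustration.

```python
import fnmatch
import re

# Compile the translated pattern ourselves to see the anchoring at work.
regex = re.compile(fnmatch.translate("*.py"))

print(bool(regex.match("main.py")))         # True
print(bool(regex.match("main.pyc")))        # False: the end anchor rejects trailing text
print(bool(regex.match("main.py\n")))       # False: a trailing newline is not ignored
print(bool(regex.match("weird\nname.py")))  # True: DOTALL lets * span the newline
```

The third case is the one that differs from a hand-written regex ending in $, which would accept the trailing newline.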

The caching mechanism

Since the early Python 3 releases, fnmatch has used functools.lru_cache for the compiled regex patterns; the maxsize has varied between releases (256 for many of them). Python 2 used a manual dict cache with a size limit of 100:

# Effective behavior:
@functools.lru_cache(maxsize=256)
def _compile_pattern(pat):
    res = translate(pat)
    return re.compile(res).match

This means the first call with a new pattern pays the regex compilation cost (~10-50μs), but subsequent calls with the same pattern just do a dict lookup (~0.1μs).
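The caching behavior can be observed without touching fnmatch's private helpers. The sketch below builds a look-alike cached compiler (compile_glob is a hypothetical name, not part of the stdlib) and inspects its hit and miss counters.

```python
import fnmatch
import functools
import re

@functools.lru_cache(maxsize=256)
def compile_glob(pattern):
    """Translate and compile a shell-style pattern, caching the matcher."""
    return re.compile(fnmatch.translate(pattern)).match

compile_glob.cache_clear()
for name in ["a.py", "b.txt", "c.py"]:
    compile_glob("*.py")(name)

info = compile_glob.cache_info()
print(info.hits, info.misses)  # 2 1
```

Only the first call pays for translation and compilation; the next two are served from the cache.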

Platform behavior: case sensitivity

The core behavioral difference between fnmatch and fnmatchcase:

import fnmatch

# fnmatch.fnmatch normalizes case on Windows (os.path.normcase)
# fnmatch.fnmatchcase never normalizes case

# On Windows (os.name == 'nt'):
#   fnmatch("File.TXT", "*.txt") → True  (normcase lowercases)
# On Unix:
#   fnmatch("File.TXT", "*.txt") → False (no normcase)

The implementation:

def fnmatch(name, pat):
    name = os.path.normcase(name)
    pat = os.path.normcase(pat)
    return fnmatchcase(name, pat)

On Windows, os.path.normcase lowercases the string and converts / to \. On Unix, it’s a no-op. This means fnmatch.fnmatch gives platform-appropriate behavior, while fnmatchcase gives consistent cross-platform behavior.

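Production recommendation: Use fnmatchcase when you need predictable behavior in tests and cross-platform code. Use fnmatch when you want the platform's conventional behavior. Note that fnmatch follows the operating system, not the actual filesystem: on macOS, normcase is a no-op, so matching stays case-sensitive even though the default filesystem is usually case-insensitive.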
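If you want case-insensitive matching on every platform rather than only on Windows, one option is to lowercase both sides yourself and delegate to fnmatchcase. This is a minimal sketch; fnmatch_ci is a hypothetical helper name, and str.lower() is assumed to be good enough for the names involved (str.casefold() may suit non-ASCII names better).

```python
import fnmatch

def fnmatch_ci(name, pattern):
    """Case-insensitive match that behaves the same on every platform."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())

print(fnmatch.fnmatchcase("File.TXT", "*.txt"))  # False on every platform
print(fnmatch_ci("File.TXT", "*.txt"))           # True on every platform
```

Lowercasing the pattern also lowercases ranges such as [A-Z], which stays consistent because the name is lowercased too.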

Character class edge cases

The [seq] syntax supports ranges and negation, but has subtle behaviors:

import fnmatch

# Ranges work
fnmatch.fnmatch("file5.txt", "file[0-9].txt")     # True
fnmatch.fnmatch("fileA.txt", "file[A-Z].txt")     # True

# Negation with !
fnmatch.fnmatch("file5.txt", "file[!a-z].txt")    # True (5 is not a-z)

# Literal ] must be first in the sequence
fnmatch.fnmatch("x]y", "x[]]y")                    # True

# Literal - must be first or last
fnmatch.fnmatch("a-b", "a[-]b")                    # True

One gotcha: character classes don’t support POSIX classes like [:alpha:] — only literal characters and ranges.
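A related pitfall is a filename that itself contains wildcard characters. fnmatch has no escape function of its own, but glob.escape() from the standard library wraps *, ? and [ in brackets, and the result can be used as an fnmatch pattern. The filename below is made up for illustration.

```python
import fnmatch
import glob

literal_name = "report[1].txt"

# Used directly as a pattern, "[1]" is a character class matching the single character "1".
print(fnmatch.fnmatchcase("report1.txt", literal_name))    # True
print(fnmatch.fnmatchcase("report[1].txt", literal_name))  # False

# glob.escape wraps each of *, ? and [ in brackets so they match literally.
escaped = glob.escape(literal_name)
print(escaped)                                             # report[[]1].txt
print(fnmatch.fnmatchcase("report[1].txt", escaped))       # True
```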

Building a file walker with fnmatch

Combining os.walk with fnmatch.filter gives you a recursive file finder:

import fnmatch
import os

def find_files(root, pattern):
    """Recursively find files matching a shell pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return matches

# Find all Python files
python_files = find_files("src/", "*.py")

For excluding directories during traversal (important for performance with large trees):

EXCLUDE_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

def find_files_filtered(root, pattern, exclude_dirs=EXCLUDE_DIRS):
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Modify dirnames in-place to skip excluded directories
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return matches
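The same idea can be written as a generator that prunes directories by pattern instead of by exact name. This is a sketch only: iter_files and its default exclusion patterns are assumptions for illustration, and the demo builds a throwaway tree in a temporary directory.

```python
import fnmatch
import os
import tempfile

def iter_files(root, pattern, exclude_dir_patterns=(".*", "__pycache__", "node_modules")):
    """Lazily yield paths under root whose filename matches pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatchcase(d, p) for p in exclude_dir_patterns)
        ]
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)

# Demo on a temporary tree: .git and __pycache__ are skipped entirely.
with tempfile.TemporaryDirectory() as root:
    for rel in ["a.py", ".git/b.py", "pkg/c.py", "pkg/__pycache__/d.py", "pkg/notes.txt"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = sorted(os.path.basename(p) for p in iter_files(root, "*.py"))

print(found)  # ['a.py', 'c.py']
```

Yielding paths instead of building a list keeps memory flat on large trees and lets the caller stop early.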

Performance comparison

Benchmarking pattern matching approaches for filtering 10,000 filenames:

Method	Time (10K names)
`fnmatch.filter(names, "*.py")`	~2ms
`[n for n in names if fnmatch.fnmatch(n, "*.py")]`	~8ms
`[n for n in names if n.endswith(".py")]`	~0.5ms
`[n for n in names if re.match(r".*\.py$", n)]`	~5ms
Pre-compiled regex `.match`	~3ms

fnmatch.filter() is ~4× faster than individual fnmatch() calls because it fetches the compiled matcher once and applies it in a tight loop, whereas each fnmatch() call repeats the normcase calls and the cache lookup. But for simple suffix/prefix checks, native string methods are still faster.

Guideline: Use string methods (.endswith(), .startswith()) for trivial patterns. Use fnmatch.filter() for wildcard patterns. Use compiled regex for complex patterns that run in hot loops.
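Timings like these depend heavily on the machine and Python version, so it is worth measuring on your own data. The harness below is a rough sketch using timeit; the synthetic filenames and the repeat counts are arbitrary choices, and no particular output is implied.

```python
import fnmatch
import re
import timeit

# Synthetic workload: 10,000 names, half of them ending in .py.
names = [f"module_{i}.py" if i % 2 else f"data_{i}.csv" for i in range(10_000)]
compiled = re.compile(fnmatch.translate("*.py")).match

candidates = {
    "fnmatch.filter": lambda: fnmatch.filter(names, "*.py"),
    "fnmatch.fnmatch loop": lambda: [n for n in names if fnmatch.fnmatch(n, "*.py")],
    "str.endswith": lambda: [n for n in names if n.endswith(".py")],
    "precompiled regex": lambda: [n for n in names if compiled(n)],
}

# Sanity check first: every approach should select the same names.
results = {label: func() for label, func in candidates.items()}

for label, func in candidates.items():
    seconds = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{label:<22} {seconds * 1000:.2f} ms per pass")
```

Taking the minimum over several repeats reduces noise from other processes on the machine.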

Multi-pattern matching strategies

Using `translate()` to build a combined regex

import fnmatch
import re

def compile_multi_pattern(patterns):
    """Compile several shell patterns into one regex matcher."""
    patterns = list(patterns)
    if not patterns:
        # An empty alternation would match every name, so match nothing instead.
        return lambda name: None
    combined = "|".join(fnmatch.translate(p) for p in patterns)
    return re.compile(combined).match

def multi_match(name, patterns):
    """Check if name matches any of the patterns."""
    return bool(compile_multi_pattern(patterns)(name))

# Pre-compiled version for repeated use
all_files = ["app.py", "index.js", "types.ts", "README.md"]  # sample data
matcher = compile_multi_pattern(["*.py", "*.js", "*.ts"])
source_files = [f for f in all_files if matcher(f)]

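This compiles all patterns into a single regex, which is usually faster than checking each pattern individually. Each translated pattern carries its own end anchor, so the alternation stays correct without extra grouping.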

.gitignore-style matching

Git’s ignore patterns extend fnmatch with directory markers and negation. A simplified implementation:

import fnmatch, os

class GitIgnore:
    def __init__(self, patterns):
        self.include = []
        self.exclude = []
        for p in patterns:
            p = p.strip()
            if not p or p.startswith("#"):
                continue
            if p.startswith("!"):
                self.include.append(p[1:])
            else:
                self.exclude.append(p)

    def is_ignored(self, path):
        name = os.path.basename(path)
        ignored = any(fnmatch.fnmatch(name, p) for p in self.exclude)
        if ignored:
            return not any(fnmatch.fnmatch(name, p) for p in self.include)
        return False

Real .gitignore parsing is more complex (directory patterns, ** matching, order-dependent rules), but fnmatch handles the core pattern matching.
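The order-dependent part can be sketched with a single pass in which the last matching rule wins. This is still a simplification that only looks at the basename and ignores directory patterns and **; is_ignored and the sample rules are hypothetical.

```python
import fnmatch
import os

def is_ignored(path, rules):
    """Return True if the last rule matching path's basename is not a negation."""
    name = os.path.basename(path)
    ignored = False
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatch.fnmatchcase(name, pattern):
            # Later rules override earlier ones.
            ignored = not negated
    return ignored

rules = ["*.log", "!keep.log", "keep.*"]
print(is_ignored("logs/app.log", rules))       # True
print(is_ignored("logs/keep.log", rules))      # True: "keep.*" comes after "!keep.log"
print(is_ignored("logs/keep.log", rules[:2]))  # False
```

Unlike the two-list version above, this lets a later pattern re-ignore a file that an earlier negation had rescued.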

fnmatch in the standard library ecosystem

Several stdlib modules use fnmatch internally:

glob.glob() — uses fnmatch for the non-recursive parts of glob patterns
pathlib.Path.match() — uses fnmatch for pattern matching
shutil.copytree(ignore=...) — the ignore_patterns() helper uses fnmatch
unittest.TestLoader — filters test names using fnmatch patterns

import shutil

# copytree with fnmatch-based ignore
shutil.copytree(
    "src/",
    "dist/",
    ignore=shutil.ignore_patterns("*.pyc", "__pycache__", "*.egg-info"),
)
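The object returned by shutil.ignore_patterns() is an ordinary callable, so its fnmatch-based filtering can be checked without copying anything. The directory and file names here are invented for the demonstration.

```python
import shutil

ignore = shutil.ignore_patterns("*.pyc", "__pycache__")

# copytree calls this with (directory, names_in_directory) and skips what it returns.
skipped = ignore("src/pkg", ["mod.py", "mod.pyc", "__pycache__", "README.md"])
print(sorted(skipped))  # ['__pycache__', 'mod.pyc']
```

Calling the helper directly like this is a convenient way to unit-test an ignore list before running a real copy.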

Security consideration

fnmatch patterns that come from user input are generally safe: the pattern syntax has no backreferences, nested quantifiers, or alternation, so users cannot inject arbitrary regex constructs. They are not free, though. Patterns with many * wildcards translate to repeated .* segments, which could trigger heavy backtracking in Python versions before 3.9; later versions generate a more elaborate regex to avoid this. Extremely long patterns can also be slow to compile. Validate pattern length and wildcard count if accepting untrusted input.
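One way to bound the cost of untrusted patterns is to cap their length and the number of * wildcards before matching. The sketch below does that; safe_fnmatch and both limits are assumptions chosen for illustration, not recommended values.

```python
import fnmatch

MAX_PATTERN_LENGTH = 256  # assumed limit; tune for your application
MAX_WILDCARDS = 16        # assumed limit on the number of * characters

def safe_fnmatch(name, pattern):
    """Match name against an untrusted pattern after basic sanity checks."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    if pattern.count("*") > MAX_WILDCARDS:
        raise ValueError("too many * wildcards")
    return fnmatch.fnmatchcase(name, pattern)

print(safe_fnmatch("report.csv", "*.csv"))  # True
```

Rejecting a pattern with an exception, rather than silently returning False, makes it easier to tell a bad pattern apart from a legitimate non-match.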

One thing to remember

fnmatch is regex-under-the-hood with a human-friendly interface. Use filter() for batch operations, fnmatchcase() for cross-platform consistency, and translate() when you need to combine multiple patterns into a single compiled regex for maximum performance.
