Python linecache — Deep Dive

Internal architecture

linecache is one of Python’s simplest modules: a few hundred lines at most, depending on the Python version. Its core data structure is the module-level cache dictionary:

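# linecache.cache is a dict:
# {filename: (size, mtime, lines, fullname)}

# - size: file size at time of reading (int)
# - mtime: modification time (float), or None for entries not backed by a file on disk
# - lines: list of line strings including newlines
# - fullname: path the lines were actually read from
#
# A lazily registered entry is a 1-tuple holding a callable that returns the source.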
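A quick way to see this structure is to read a file and then look at its cache entry. The sketch below writes a throwaway file first so that it does not depend on any real path; the names demo_path and entry are only illustrative.

```python
import linecache
import os
import tempfile

# Write a small throwaway file so the example does not rely on an existing path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", delete=False, encoding="utf-8", newline="\n"
) as tmp:
    tmp.write("x = 1\ny = 2\n")
    demo_path = tmp.name

first_line = linecache.getline(demo_path, 1)   # cache miss: the file is read and stored
entry = linecache.cache[demo_path]
size, mtime, lines, fullname = entry            # a fully loaded entry is a 4-tuple

os.unlink(demo_path)  # the cached lines stay available until the entry is invalidated
```

Deleting the file afterwards does not affect the cached lines; only checkcache() or clearcache() would drop them.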

The getline flow

When you call linecache.getline(filename, lineno):

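  1. Check if filename has a fully loaded entry in cache
  2. If cached: take the lines list, cache[filename][2], and return lines[lineno - 1]
  3. If not cached: call updatecache(filename, module_globals)
  4. updatecache tries to stat the file, read it, split into lines, and store in cache
  5. If the file doesn’t exist or can’t be read, nothing is stored (any old entry is dropped) and getline returns an empty string, as it does for an out-of-range lineno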
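The steps above can be restated as a short function. This is a simplified sketch, not the stdlib source: sketch_getline is a hypothetical name, and it leans on linecache.updatecache, a helper that exists in the module but is not part of its documented public API.

```python
import linecache

def sketch_getline(filename, lineno, module_globals=None):
    """Simplified restatement of the lookup flow (illustrative only)."""
    entry = linecache.cache.get(filename)
    if entry is not None and len(entry) == 4:
        lines = entry[2]                      # cache hit: reuse the stored lines
    else:
        # Miss, or a lazy entry that has not been loaded yet: updatecache reads
        # the source and returns its lines, or [] if it cannot be read.
        lines = linecache.updatecache(filename, module_globals)
    # Out-of-range line numbers yield '' instead of raising IndexError.
    return lines[lineno - 1] if 1 <= lineno <= len(lines) else ""
```

For an existing file this should return the same string as linecache.getline; for a missing file it returns an empty string.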

Lazy loading of the standard library

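linecache does not require source to live in an ordinary file on disk. When the named file cannot be found, it can fall back to the module’s loader, taken from the module_globals argument (__spec__.loader, or __loader__). This is what lets tracebacks show source for zipimported modules and, on Python versions that support it, for frozen modules (compiled into the Python binary):

import importlib._bootstrap
import linecache

# A pseudo-filename like this has no file behind it, so the module's globals
# must be passed along. Whether source is found depends on the Python version
# and build; getline returns '' when it is not.
line = linecache.getline(
    "<frozen importlib._bootstrap>", 1, vars(importlib._bootstrap)
)

On a cache miss, linecache tries the file on disk first. If that fails, it looks for a loader in module_globals and calls its get_source() method if one is available.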
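The deferred part of this mechanism is exposed as linecache.lazycache(filename, module_globals), which records how to obtain the source without reading anything yet. The sketch below uses the stdlib json module purely as a convenient example and assumes it was imported from a regular .py file.

```python
import json
import linecache

path = json.__file__
linecache.cache.pop(path, None)                     # start from a clean slate for the demo

registered = linecache.lazycache(path, vars(json))  # True if a lazy entry was stored
lazy_entry = linecache.cache.get(path)              # 1-tuple holding a callable; no file I/O yet

first = linecache.getline(path, 1)                  # first real access loads the lines
loaded_entry = linecache.cache[path]                # now the usual 4-tuple
```

The lazy entry costs almost nothing to create, which is why it is practical to register one for every frame of a traceback even if most lines are never displayed.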

Integration with the traceback module

The traceback module is linecache’s primary consumer. Here’s how they interact:

import traceback, linecache

# traceback.format_exception internally does roughly:
# for each frame f in the stack:
#     linecache.lazycache(f.f_code.co_filename, f.f_globals)   # remember the loader
#     line = linecache.getline(f.f_code.co_filename, f.f_lineno)
#     format and append to output

The third argument to getline, module_globals, serves the same purpose as the globals passed to lazycache: it enables loading source for modules with custom importers. If a module was loaded from a database, network, or zip file, its __loader__ can provide source code that linecache caches like any regular file.
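To make the loader path concrete, here is a minimal sketch with a made-up loader. InMemoryLoader, the module name virtual_mod, and the path are all hypothetical; the only thing linecache needs from the loader is a get_source(name) method.

```python
import linecache

class InMemoryLoader:
    """Hypothetical loader that serves source from a dict instead of the filesystem."""

    def __init__(self, sources):
        self._sources = sources

    def get_source(self, fullname):
        # Returning None signals that no source is available for this module.
        return self._sources.get(fullname)

loader = InMemoryLoader({"virtual_mod": "def greet():\n    return 'hi'\n"})

# The globals a module loaded through this loader would carry.
module_globals = {"__name__": "virtual_mod", "__loader__": loader}

# The path does not exist on disk, so linecache falls back to the loader.
virtual_path = "/nonexistent-dir/virtual_mod.py"
line = linecache.getline(virtual_path, 2, module_globals)
entry = linecache.cache[virtual_path]
```

Entries filled in this way are stored with an mtime of None, so checkcache() leaves them alone.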

Custom source providers

You can inject lines into linecache’s cache to make tracebacks work for dynamically generated code:

import linecache

# Register dynamically generated code
source = "def add(a, b):\n    return a + b\n"
lines = source.splitlines(True)
linecache.cache["<generated:add>"] = (
    len(source),    # size
    None,            # mtime (None = checkcache() skips it; clearcache() still removes it)
    lines,           # list of lines
    "<generated:add>" # fullname
)

# Now compile with the matching filename
code = compile(source, "<generated:add>", "exec")
exec(code)

# If add() raises an error, the traceback will show the source
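To check that a traceback really picks up injected lines, the sketch below registers a function that fails on purpose and captures the formatted traceback. The pseudo-filename "<generated:divide>" and the namespace dict are illustrative choices, not anything linecache prescribes.

```python
import linecache
import traceback

source = "def divide(a, b):\n    return a / b\n"
filename = "<generated:divide>"   # hypothetical pseudo-filename for this sketch

# Register the source before compiling with the matching filename.
linecache.cache[filename] = (len(source), None, source.splitlines(True), filename)

namespace = {}
exec(compile(source, filename, "exec"), namespace)

try:
    namespace["divide"](1, 0)
except ZeroDivisionError:
    report = traceback.format_exc()   # includes the generated source line
```

Without the cache entry, the frame for divide would appear in the traceback with its filename and line number but no source text.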

This technique, or a close variant of it, is used by:

  • Template engines (Jinja2, Mako) to show template source in tracebacks
  • exec()-based frameworks that generate code at runtime
  • REPLs and notebooks for interactive code display

Encoding handling

linecache reads files with tokenize.open(), which honors PEP 263 encoding declarations and a UTF-8 BOM, and otherwise assumes UTF-8:

# -*- coding: utf-8 -*-
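tokenize.open() relies on tokenize.detect_encoding() to pick the codec. The sketch below calls it directly on two in-memory byte strings, one with a declaration and one without, to show what gets detected; the variable names are illustrative.

```python
import io
import tokenize

# Source with an explicit PEP 263 declaration, encoded accordingly.
latin1_source = "# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n".encode("latin-1")
# Source with no declaration and no BOM.
plain_source = b"name = 'cafe'\n"

declared, _ = tokenize.detect_encoding(io.BytesIO(latin1_source).readline)
default, _ = tokenize.detect_encoding(io.BytesIO(plain_source).readline)
```

The declared name comes back normalized ("iso-8859-1" for latin-1), and the undeclared source falls back to "utf-8".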

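There is no separate code path for non-Python files: they go through tokenize.open() as well, so a file without a declaration or BOM is decoded as UTF-8, and a file that fails to decode yields no lines at all. If you need a different encoding, you’ll need to manually populate the cache: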

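import linecache
import os

def cache_file_with_encoding(filename, encoding="utf-8"):
    """Read filename with an explicit encoding and store it in linecache's cache."""
    with open(filename, "r", encoding=encoding) as f:
        lines = f.readlines()
    # Record the real size and mtime so checkcache() can still detect changes.
    stat = os.stat(filename)
    linecache.cache[filename] = (
        stat.st_size,
        stat.st_mtime,
        lines,
        filename,
    )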

checkcache vs. clearcache

These two functions serve different purposes:

clearcache() — wipes the entire cache. Next getline call re-reads from disk.

checkcache(filename=None) — for each cached file, stats the file on disk. If the size or mtime has changed, removes that entry from cache. Doesn’t re-read — just invalidates stale entries.

import linecache

# Scenario: hot-reload during development
def on_file_changed(filename):
    linecache.checkcache(filename)
    # Next getline() call will re-read the updated file

checkcache is cheaper than clearcache when you have many files cached and only a few changed — it avoids re-reading unchanged files.
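The difference between a stale hit and a refreshed read is easy to see with a temporary file. This is a minimal sketch; the second write deliberately changes the file size so that the change is detectable even if the mtime resolution is coarse.

```python
import linecache
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w", encoding="utf-8") as f:
    f.write("version one\n")
before = linecache.getline(path, 1)         # cache miss: reads the file

with open(path, "w", encoding="utf-8") as f:
    f.write("version two, now longer\n")    # different size from the first write
stale = linecache.getline(path, 1)          # still served from the cache

linecache.checkcache(path)                  # entry dropped because size/mtime differ
fresh = linecache.getline(path, 1)          # cache miss again: re-read from disk

os.unlink(path)
```

Until checkcache() runs, getline keeps returning the old text, which is exactly the behavior a hot-reload hook needs to account for.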

Building tools with linecache

Source-level coverage reporter

import linecache

def format_coverage(filename, covered_lines, total_lines):
    """Show source with coverage markers."""
    output = []
    for lineno in range(1, total_lines + 1):
        line = linecache.getline(filename, lineno).rstrip()
        marker = "✅" if lineno in covered_lines else "❌"
        output.append(f"{marker} {lineno:4d} | {line}")
    return "\n".join(output)

Lightweight source browser

import linecache

def view_source(filename, start=1, end=None):
    """Display source lines with line numbers."""
    lines = linecache.getlines(filename)
    if not lines:
        return f"Cannot read {filename}"

    end = end or len(lines)
    width = len(str(end))
    output = []
    for i in range(start - 1, min(end, len(lines))):
        output.append(f"{i + 1:{width}d} | {lines[i]}")
    return "".join(output)

Custom traceback formatter

import linecache, sys, traceback

def enhanced_traceback(exc_type, exc_value, exc_tb):
    """Traceback with two lines of context on each side of every frame's line, most recent call first."""
    frames = traceback.extract_tb(exc_tb)
    parts = [f"{exc_type.__name__}: {exc_value}\n\n"]

    for frame in reversed(frames):
        parts.append(f"  {frame.filename}:{frame.lineno} in {frame.name}\n")
        for offset in range(-2, 3):
            ln = frame.lineno + offset
            line = linecache.getline(frame.filename, ln)
            if line:
                marker = "→" if offset == 0 else " "
                parts.append(f"  {marker} {ln:4d} | {line}")
        parts.append("\n")

    return "".join(parts)

# Install as the default exception handler
# sys.excepthook = lambda *args: print(enhanced_traceback(*args))

Performance profile

Operation                                 Typical time
First getline (cache miss, 1KB file)      ~200μs
First getline (cache miss, 100KB file)    ~2ms
Subsequent getline (cache hit)            ~0.1μs
checkcache per file                       ~10μs (one stat call)
clearcache                                ~0.1μs (dict clear)

The cache-hit path is essentially cache[filename][2][lineno - 1]: a dictionary lookup, a tuple index, and a list index, plus a bounds check on lineno. This makes linecache suitable for use in debugger single-stepping, where getline is called for every executed line.
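Absolute timings depend heavily on the machine, filesystem, and Python version, so it is worth measuring locally. The sketch below is one rough way to compare a miss with a hit using timeit; cold_read and warm_read are hypothetical helper names, and the miss figure includes the small cost of clearcache() itself.

```python
import linecache
import timeit

path = linecache.__file__   # any readable source file would do

def cold_read():
    linecache.clearcache()               # force a cache miss on every call
    return linecache.getline(path, 1)

def warm_read():
    return linecache.getline(path, 1)    # served from the cache

warm_read()  # make sure the file is cached before timing hits

miss_seconds = min(timeit.repeat(cold_read, number=20, repeat=3)) / 20
hit_seconds = min(timeit.repeat(warm_read, number=20000, repeat=3)) / 20000

print(f"miss: {miss_seconds * 1e6:.1f} us   hit: {hit_seconds * 1e6:.3f} us")
```

Taking the minimum over several repeats reduces noise from other processes; the ratio between the two numbers is more meaningful than either value alone.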

Thread safety

linecache’s global cache dictionary is not protected by locks. In CPython, the GIL makes individual dictionary operations atomic, but multi-step sequences (check for a key, then read or replace its entry) can still interleave across threads. In practice this rarely causes issues because:

  1. Writes happen only on a cache miss or an invalidation, both of which are rare compared with reads
  2. Re-reading an unchanged file produces the same entry, so a duplicated write is harmless
  3. Traceback generation is typically single-threaded

If you’re using linecache from multiple threads in a free-threaded Python build (3.13+ with --disable-gil), consider wrapping calls in a lock.
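One possible shape for such a wrapper is sketched below. The lock and the function names getline_locked and checkcache_locked are hypothetical, and the protection only covers callers that go through these wrappers; code that calls linecache directly, including the traceback module, bypasses the lock.

```python
import linecache
import threading

_linecache_lock = threading.Lock()   # hypothetical module-level lock for this sketch

def getline_locked(filename, lineno, module_globals=None):
    """Serialize reads (and any cache fill they trigger) behind one lock."""
    with _linecache_lock:
        return linecache.getline(filename, lineno, module_globals)

def checkcache_locked(filename=None):
    """Serialize invalidation with the reads above."""
    with _linecache_lock:
        linecache.checkcache(filename)

# Small demonstration: several threads read the same line through the wrapper.
results = []

def worker():
    results.append(getline_locked(linecache.__file__, 1))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A single coarse lock keeps the sketch simple; since cache hits are very cheap, contention is unlikely to matter unless many threads are reading uncached files at once.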

One thing to remember

linecache is the invisible infrastructure behind Python’s tracebacks — a simple dict-based cache that maps filenames to line lists. Its power for tooling comes from the ability to inject custom entries for generated code, making tracebacks work for template engines, REPLs, and code generators.
