Source-to-Source Transformers — Core Concepts

What Is Source-to-Source Transformation?

A source-to-source transformer (also called a transpiler or codemod tool) reads source code, modifies it according to rules, and writes the result back as source code. Unlike a compiler, which produces bytecode or machine code, the tools discussed here produce output that is still human-readable Python.

The key difference from text-based find-and-replace is structural awareness. These tools parse code into a tree, operate on the tree, and convert it back to text — ensuring that changes respect the language’s grammar and semantics.

Why Not Just Use ast?

Python’s built-in ast module works well for analysis and transformation, but it has a critical limitation: it discards formatting information. Comments, the original whitespace, optional trailing commas, and blank lines are all lost. When you ast.unparse() a modified tree, the output is functionally equivalent but is regenerated in a normalized style, so it can differ from the original on nearly every line.
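The loss is easy to observe by round-tripping a snippet through ast without changing anything. This is a minimal sketch, assuming Python 3.9 or later (where ast.unparse is available); the sample line and variable names are illustrative only.

```python
import ast

# One line with non-standard spacing and a trailing comment.
source = "total = add(1,2)  # keep this comment\n"

# Parse to an abstract syntax tree, then write it straight back out.
tree = ast.parse(source)
roundtrip = ast.unparse(tree)

print(roundtrip)
# total = add(1, 2)
```

The comment is gone and the argument spacing has been normalized, even though no transformation was applied.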

For transformations where the output must look like human-written code — readable diffs, maintained style, preserved comments — you need a Concrete Syntax Tree (CST) that preserves every detail of the original formatting.

libcst: The Modern Standard

libcst is the leading Python CST library. It parses source into a tree that preserves all whitespace, comments, and formatting, then lets you transform nodes while keeping everything else intact.

import libcst as cst

class RenameFunction(cst.CSTTransformer):
    def leave_Call(self, original_node, updated_node):
        if isinstance(updated_node.func, cst.Name):
            if updated_node.func.value == "old_function":
                return updated_node.with_changes(
                    func=cst.Name("new_function")
                )
        return updated_node

source = 'result = old_function(x, y)  # important call\n'
tree = cst.parse_module(source)
modified = tree.visit(RenameFunction())
print(modified.code)
# result = new_function(x, y)  # important call

Notice the comment is preserved — something ast could never do.

Common Transformation Use Cases

Python Version Migration

When upgrading Python versions, certain patterns become obsolete or gain better alternatives. Codemods automate these migrations:

  • dict.has_key(x) → x in dict
  • print "hello" → print("hello")
  • isinstance(x, (str, unicode)) → isinstance(x, str)
  • Old-style % formatting → f-strings
  • typing.Optional[X] → X | None (Python 3.10+)

API Deprecation

When a library deprecates functions, a codemod can update all call sites:

# Transform: requests.get(url, verify=False)
# Into:     requests.get(url, verify=False)  + deprecation warning
# Or:       httpx.get(url, verify=False)     for library migration

Code Style Enforcement

Beyond what formatters like Black handle, codemods can enforce structural conventions:

  • Convert class-based views to function-based views
  • Replace manual __init__ methods with @dataclass
  • Convert try/except patterns to context managers
  • Add type annotations based on runtime analysis

2to3: The Original Python Codemod

Python shipped 2to3 to help migrate code from Python 2 to Python 3. It was one of the first large-scale automated code transformation tools in the Python ecosystem. While it used the older lib2to3 parser, it demonstrated the value of structural codemods.

2to3 handled dozens of transformations: print statements to functions, unicode to str, dict.keys() returning views, except Exception, e to except Exception as e, and many more.

The Transformation Pipeline

Most source-to-source tools follow this pipeline:

  1. Parse — Convert source text into a syntax tree (CST or enriched AST)
  2. Match — Identify nodes that need transformation (pattern matching)
  3. Transform — Replace, modify, or rearrange matched nodes
  4. Serialize — Convert the modified tree back to source text
  5. Validate — Optionally compile the output to verify correctness

The match step is crucial. libcst provides matchers — a declarative pattern-matching system:

import libcst.matchers as m

# Match any call to print with exactly one argument
pattern = m.Call(func=m.Name("print"), args=[m.Arg()])

Limitations and Risks

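Semantic correctness is not guaranteed. A codemod that renames a function by matching its name will happily rename a different function with the same name in a different module or scope. Telling the two apart requires scope or type analysis, which a bare syntax tree does not contain. libcst offers optional metadata providers for scope and qualified-name resolution, but a simple transformer does not use them unless you wire them in.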

Edge cases are inevitable. No matter how well-designed a codemod is, some code patterns will be unusual enough to break the transformation. Always review diffs after running codemods.
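One lightweight way to review changes before writing them to disk is to render a unified diff with the standard library. This is a minimal sketch; review_diff and the example path are hypothetical names.

```python
import difflib


def review_diff(before, after, path="example.py"):
    """Return a unified diff between the original and transformed source."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


before = "result = old_function(x, y)  # important call\n"
after = "result = new_function(x, y)  # important call\n"

diff = review_diff(before, after)
print(diff)
```

An empty string means the codemod left the file untouched, which is a convenient check for skipping unchanged files.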

Formatting drift. Even CST-preserving tools may produce slightly different formatting in modified sections. Running a formatter (Black) after the codemod ensures consistency.

A Common Misconception

Source-to-source transformation is not the same as code generation. Code generation creates new code from scratch (from schemas, configurations, etc.). Source-to-source transformation takes existing code and modifies it. They are complementary techniques — you might generate code initially, then use codemods to keep it updated as conventions evolve.

One thing to remember: Source-to-source transformers parse code into a tree that preserves formatting, apply targeted changes, and write it back as clean Python. They enable automated migrations, large-scale refactoring, and consistent API updates across codebases of any size. Because they understand code structure rather than just matching text, they are far more precise than find-and-replace, although the resulting diffs still need review.

