Source-to-Source Transformers — Core Concepts
What Is Source-to-Source Transformation?
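A source-to-source transformer (often called a codemod tool, or a transpiler when the output is in a different language or dialect) reads source code in one form, modifies it according to rules, and writes it back as source code. Unlike a compiler that produces bytecode or machine code, the output is still human-readable source, in this article Python.
The key difference from text-based find-and-replace is structural awareness. These tools parse code into a tree, operate on the tree, and convert it back to text, so every change lands on a real syntactic element (a call, a name, an argument) rather than on whatever characters happen to match. That keeps the output grammatically valid, although it does not by itself guarantee the program still means the same thing.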
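To make the contrast concrete, here is a minimal sketch that renames a variable twice: once with plain string replacement and once by walking a syntax tree. It uses the standard-library ast module only to stay dependency-free, the sample source and the class name RenameVariable are invented for illustration, and ast.unparse needs Python 3.9 or later.

```python
import ast

source = 'old = 1\nprint("old value:", old)\nbold = old + 1\n'

# Text-based replacement: also rewrites the string literal and the
# unrelated identifier "bold".
text_result = source.replace("old", "new")


class RenameVariable(ast.NodeTransformer):
    """Rename only identifier nodes whose name is exactly `old`."""

    def visit_Name(self, node):
        if node.id == "old":
            node.id = "new"
        return node


tree = RenameVariable().visit(ast.parse(source))
tree_result = ast.unparse(tree)

print(text_result)
print(tree_result)
```

The text version turns bold into bnew and edits the message inside the string, while the tree version touches only the three uses of the variable itself.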
Why Not Just Use ast?
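Python’s built-in ast module works well for analysis and for transformations whose output is not meant to be read, but it has a critical limitation: it discards formatting information. Comments, original whitespace, trailing commas, and blank lines are all lost at parse time. When you modify a tree and pass it to ast.unparse() (available since Python 3.9), the output is functionally equivalent but is regenerated in one normalized style, so it can differ from the original on almost every line.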
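A short round trip shows the loss. This is a sketch with an invented config dictionary; nothing is transformed, the source is only parsed and unparsed.

```python
import ast

source = (
    "config = {\n"
    "    'debug': True,   # enable verbose logging\n"
    "    'retries': 3,\n"
    "}\n"
)

# Parse and immediately unparse, without changing the tree at all.
round_tripped = ast.unparse(ast.parse(source))
print(round_tripped)

# The two versions still describe the same program structure.
same_structure = ast.dump(ast.parse(round_tripped)) == ast.dump(ast.parse(source))
print(same_structure)
```

The comment, the line breaks, and the trailing comma are gone even though the parsed structure is unchanged, which is why a diff produced this way is noisy.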
For transformations where the output must look like human-written code — readable diffs, maintained style, preserved comments — you need a Concrete Syntax Tree (CST) that preserves every detail of the original formatting.
libcst: The Modern Standard
libcst is the leading Python CST library. It parses source into a tree that preserves all whitespace, comments, and formatting, then lets you transform nodes while keeping everything else intact.
import libcst as cst

class RenameFunction(cst.CSTTransformer):
    def leave_Call(self, original_node, updated_node):
        if isinstance(updated_node.func, cst.Name):
            if updated_node.func.value == "old_function":
                return updated_node.with_changes(
                    func=cst.Name("new_function")
                )
        return updated_node

source = 'result = old_function(x, y) # important call\n'
tree = cst.parse_module(source)
modified = tree.visit(RenameFunction())
print(modified.code)
# result = new_function(x, y) # important call
Notice the comment is preserved. An ast round trip cannot do this, because ast drops comments when it parses.
Common Transformation Use Cases
Python Version Migration
When upgrading Python versions, certain patterns become obsolete or gain better alternatives. Codemods automate these migrations:
- dict.has_key(x) → x in dict
- print "hello" → print("hello")
- isinstance(x, (str, unicode)) → isinstance(x, str)
- Old-style % formatting → f-strings
- typing.Optional[X] → X | None (Python 3.10+)
API Deprecation
When a library deprecates functions, a codemod can update all call sites:
# Transform: requests.get(url, verify=False)
# Into: requests.get(url, verify=False) + deprecation warning
# Or: httpx.get(url, verify=False) for library migration
Code Style Enforcement
Beyond what formatters like Black handle, codemods can enforce structural conventions:
- Convert class-based views to function-based views
- Replace manual __init__ methods with @dataclass
- Convert try/except patterns to context managers
- Add type annotations based on runtime analysis
2to3: The Original Python Codemod
Python shipped 2to3 to help migrate code from Python 2 to Python 3. It was one of the first large-scale automated code transformation tools in the Python ecosystem. It was built on the older lib2to3 parser, which was deprecated in Python 3.11 and removed in Python 3.13, but it demonstrated the value of structural codemods.
2to3 handled dozens of transformations: print statements to print() calls, unicode to str, wrapping dict.keys() in list() where a list was expected (the method now returns a view), except Exception, e to except Exception as e, and many more.
The Transformation Pipeline
Most source-to-source tools follow this pipeline:
- Parse — Convert source text into a syntax tree (CST or enriched AST)
- Match — Identify nodes that need transformation (pattern matching)
- Transform — Replace, modify, or rearrange matched nodes
- Serialize — Convert the modified tree back to source text
- Validate — Optionally compile the output to verify correctness
The match step is crucial. libcst provides matchers — a declarative pattern-matching system:
import libcst.matchers as m
# Match any call to print with exactly one argument
pattern = m.Call(func=m.Name("print"), args=[m.Arg()])
Limitations and Risks
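Semantic correctness is not guaranteed. A codemod that matches a function by name alone will happily rename a different function with the same name in a different module. Telling the two apart requires scope or qualified-name analysis, which libcst offers through its metadata providers (such as ScopeProvider and QualifiedNameProvider) but which a plain transformer does not use. Resolving method calls on arbitrary objects goes further and requires type information that CST tools do not provide on their own.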
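The name-collision risk is easy to reproduce. The sketch below uses the standard-library ast module only to stay dependency-free; a libcst transformer that matches on the bare name behaves the same way. The sample source and the class name NaiveRename are invented for illustration.

```python
import ast

source = '''
def helper():
    def old_function(v):      # local helper, unrelated to the deprecated API
        return v * 2
    return old_function(21)
'''


class NaiveRename(ast.NodeTransformer):
    """Rename every call to a bare name `old_function`, ignoring scope."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "old_function":
            node.func.id = "new_function"
        return node


rewritten = ast.unparse(NaiveRename().visit(ast.parse(source)))
print(rewritten)

# The output is valid syntax, so a compile-only check passes...
namespace = {}
exec(compile(rewritten, "<rewritten>", "exec"), namespace)

# ...but the call now points at a name that was never defined.
error = None
try:
    namespace["helper"]()
except NameError as exc:
    error = str(exc)
print("runtime failure:", error)
```

The rewritten code still compiles, which is why a syntax check alone is not enough and the diff needs human review or tests.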
Edge cases are inevitable. No matter how well-designed a codemod is, some code patterns will be unusual enough to break the transformation. Always review diffs after running codemods.
Formatting drift. Even CST-preserving tools may produce slightly different formatting in modified sections. Running a formatter (Black) after the codemod ensures consistency.
A Common Misconception
Source-to-source transformation is not the same as code generation. Code generation creates new code from scratch (from schemas, configurations, etc.). Source-to-source transformation takes existing code and modifies it. They are complementary techniques — you might generate code initially, then use codemods to keep it updated as conventions evolve.
One thing to remember: Source-to-source transformers parse code into a tree that preserves formatting, apply targeted changes, and write it back as clean Python. They enable automated migrations, large-scale refactoring, and consistent API updates across large codebases. Because they operate on code structure rather than raw text, they are far more precise than find-and-replace, though the resulting diffs still need review.