Python ast Module for Code Analysis — Core Concepts
Why the ast module matters
Most Python tools that need to understand code (linters, type checkers, security scanners, code formatters) work with some form of syntax tree. The ast module is the standard library's interface to CPython's parser, giving you the same tree the interpreter builds internally before it compiles and executes code.
Parsing code into a tree
The entry point is ast.parse(), which takes a string of Python source code and returns a tree of node objects:
import ast
tree = ast.parse("x = 2 + 3")
The returned object is an ast.Module node whose body attribute contains a list of statements. In this case, there is one ast.Assign node with a targets list (containing ast.Name(id='x')) and a value (an ast.BinOp node with left=ast.Constant(value=2), op=ast.Add(), right=ast.Constant(value=3)).
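As a quick check of the structure described above, the following sketch walks down the attributes by hand. The variable names (assign, target, binop) are only illustrative.

```python
import ast

tree = ast.parse("x = 2 + 3")

assign = tree.body[0]        # the only statement in the module
target = assign.targets[0]   # ast.Name for the left-hand side
binop = assign.value         # ast.BinOp for "2 + 3"

print(type(tree).__name__)                  # Module
print(type(assign).__name__)                # Assign
print(target.id)                            # x
print(type(binop.op).__name__)              # Add
print(binop.left.value, binop.right.value)  # 2 3
```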
Understanding node types
AST nodes fall into three broad categories:
Statements: things that do something, such as ast.Assign, ast.FunctionDef, ast.If, ast.For, ast.Import, and ast.Return. These appear in the body of modules, functions, and control structures.
Expressions: things that produce a value, such as ast.BinOp, ast.Call, ast.Name, ast.Constant, ast.Attribute, and ast.Subscript. These appear as children of statements and of other expressions.
Operators and contexts: structural markers such as ast.Add, ast.Load, and ast.Store. Operator nodes say which operation a BinOp or similar node performs; context nodes say whether a name, attribute, or subscript is being read (Load), assigned (Store), or deleted (Del).
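One way to see context markers in practice is to list the ctx attached to each ast.Name in a small assignment. This is a minimal sketch; the sample source and the contexts dictionary are made up for illustration.

```python
import ast

tree = ast.parse("total = price + tax")

# Map each name to the class name of the context node attached to it.
contexts = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}

# "total" is assigned to (Store); "price" and "tax" are read (Load).
print(contexts)
```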
Walking the tree
Two built-in approaches exist for traversing AST nodes:
ast.walk() — unordered traversal
ast.walk(tree) yields every node in the tree in no guaranteed order. Use it when you need to find all occurrences of something regardless of position:
# Count all function definitions in a file
function_count = sum(
    1 for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
)
NodeVisitor — structured traversal
For order-sensitive analysis, subclass ast.NodeVisitor and define visit_* methods:
class ImportFinder(ast.NodeVisitor):
    def __init__(self):
        self.imports = []

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.append(alias.name)
        self.generic_visit(node)  # Continue walking children

    def visit_ImportFrom(self, node):
        # node.module is None for relative imports such as "from . import x"
        if node.module is not None:
            self.imports.append(node.module)
        self.generic_visit(node)
The generic_visit call is important — without it, children of the matched node are skipped.
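To see why the generic_visit call matters, here is a minimal sketch of a visitor that records how deeply each function is nested. FunctionNesting and the sample source are hypothetical names chosen for this example.

```python
import ast

source = """
def outer():
    def inner():
        pass

def sibling():
    pass
"""

class FunctionNesting(ast.NodeVisitor):
    """Record each function name with its nesting depth."""

    def __init__(self):
        self.depth = 0
        self.found = []

    def visit_FunctionDef(self, node):
        self.found.append((node.name, self.depth))
        self.depth += 1
        self.generic_visit(node)  # descend so nested functions are seen
        self.depth -= 1

visitor = FunctionNesting()
visitor.visit(ast.parse(source))
print(visitor.found)  # [('outer', 0), ('inner', 1), ('sibling', 0)]
```

If the generic_visit line were removed, the visitor would never descend into outer, and inner would be missing from the result.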
Transforming code with NodeTransformer
ast.NodeTransformer extends NodeVisitor with the ability to replace nodes. Return a new node from a visit_* method to substitute it, or return None to delete it:
class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # Transform children first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, ast.Add)):
            try:
                folded = node.left.value + node.right.value
            except TypeError:
                return node  # e.g. "a" + 1: leave the runtime error in place
            return ast.Constant(value=folded)
        return node
After transforming, call ast.fix_missing_locations(tree) to fill in line numbers for new nodes, then compile(tree, "<string>", "exec") to get executable bytecode.
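The full round trip (parse, transform, fix locations, compile, run) might look like the sketch below. AddFolder is a deliberately narrow, hypothetical transformer that only folds integer additions; it stands in for whatever transformer you actually use.

```python
import ast

class AddFolder(ast.NodeTransformer):
    """Fold integer additions such as 2 + 3 into one constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold inner additions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.left.value) is int
                and type(node.right.value) is int):
            return ast.Constant(value=node.left.value + node.right.value)
        return node

tree = ast.parse("result = 2 + 3 + 4")
tree = AddFolder().visit(tree)
ast.fix_missing_locations(tree)  # new Constant nodes lack lineno/col_offset

code = compile(tree, "<string>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 9
```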
Source location information
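Statement and expression nodes carry lineno and col_offset attributes, plus end_lineno and end_col_offset since Python 3.8. Nodes such as ast.Module, operators, and expression contexts have no position. Line numbers are 1-based; column offsets are 0-based UTF-8 byte offsets. Together they point back to the exact position in the original source code, and tools like linters use them to produce error messages with precise locations.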
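As a small illustration of how a tool might report positions, the sketch below records where each call appears and recovers its source text with ast.get_source_segment (available since Python 3.8). The sample source is invented for the example.

```python
import ast

source = "total = price * quantity\nprint(total)\n"
tree = ast.parse(source)

locations = []
for node in ast.walk(tree):
    if isinstance(node, ast.Call):
        # get_source_segment slices the original text using the
        # node's start and end positions.
        segment = ast.get_source_segment(source, node)
        locations.append((node.lineno, node.col_offset, segment))

print(locations)  # [(2, 0, 'print(total)')]
```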
Common misconception
A common mistake is thinking the AST preserves comments and whitespace. It does not. The abstract syntax tree represents only the structure of the code, not how it was laid out. If you need to preserve formatting (for a code formatter or refactoring tool), you need a concrete syntax tree (CST) from a library such as the third-party libcst. The standard library's lib2to3 used to fill this role, but it was deprecated and then removed in Python 3.13.
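A quick way to confirm this is to parse source that contains a comment and unusual spacing, then regenerate code from the tree. This sketch assumes Python 3.9 or later, where ast.unparse is available.

```python
import ast

source = "x = 1  # the answer\n\n\ny   =   2\n"
tree = ast.parse(source)

regenerated = ast.unparse(tree)  # requires Python 3.9+
print(regenerated)
# x = 1
# y = 2
```

The comment, the extra blank lines, and the wide spacing around the second assignment are all gone, because none of them were ever stored in the tree.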
The ast.dump() helper
For debugging, ast.dump(tree, indent=2) returns a readable string representation of the tree, which you can pass to print(). The indent parameter requires Python 3.9 or later. This is invaluable when you are figuring out what node types and attributes correspond to a particular piece of syntax.
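A typical debugging session might look like the sketch below. The exact text of the dump varies between Python versions, so no expected output is shown.

```python
import ast

tree = ast.parse("print(name.upper())")

dumped = ast.dump(tree, indent=2)  # indent= requires Python 3.9+
print(dumped)

# Pass include_attributes=True to also show line and column positions.
with_positions = ast.dump(tree, include_attributes=True)
```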
Practical applications
- Linting: Walk the tree to check coding standards without executing code
- Security scanning: Find calls to eval(), exec(), or os.system()
- Dependency extraction: Collect all import and from ... import statements
- Code metrics: Count functions, classes, nesting depth, cyclomatic complexity
- Auto-refactoring: Replace deprecated API calls with modern equivalents
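As one illustration of the security-scanning use case, here is a minimal sketch that flags direct calls to eval(), exec(), and os.system(). The names find_risky_calls, RISKY_NAMES, and RISKY_ATTRS are hypothetical. It matches only the literal spellings, so aliased imports such as from os import system would slip through.

```python
import ast

RISKY_NAMES = {"eval", "exec"}    # bare builtins to flag
RISKY_ATTRS = {("os", "system")}  # module.attribute pairs to flag

def find_risky_calls(source):
    """Return sorted (line, name) pairs for risky calls found in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in RISKY_NAMES:
            findings.append((node.lineno, func.id))
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and (func.value.id, func.attr) in RISKY_ATTRS):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)

sample = "import os\nos.system('ls')\nvalue = eval(user_input)\n"
print(find_risky_calls(sample))  # [(2, 'os.system'), (3, 'eval')]
```

The sample is only parsed, never executed, so the undefined user_input and the os.system call are harmless here.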
The one thing to remember: The ast module parses Python into a tree of typed nodes that you can walk with NodeVisitor or transform with NodeTransformer, making it the foundation for any tool that needs to understand or modify Python code without executing it.