Python ast Module for Code Analysis — Core Concepts

Why the ast module matters

Many Python tools that need to understand code, including linters, type checkers, and security scanners, work with abstract syntax trees. The ast module is the standard library’s interface to CPython’s parser, giving you the same tree the interpreter builds internally before compiling code to bytecode.

Parsing code into a tree

The entry point is ast.parse(), which takes a string of Python source code and returns a tree of node objects:

import ast

tree = ast.parse("x = 2 + 3")

The returned object is an ast.Module node whose body attribute contains a list of statements. In this case, there is one ast.Assign node with a targets list (containing ast.Name(id='x')) and a value (an ast.BinOp node with left=ast.Constant(value=2), op=ast.Add(), right=ast.Constant(value=3)).
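The structure described above can be checked by hand. The following is a minimal sketch that only touches the node types and fields named in the paragraph; the local variable names are illustrative:

```python
import ast

tree = ast.parse("x = 2 + 3")

assign = tree.body[0]        # the single statement in the module
target = assign.targets[0]   # left-hand side: Name(id='x')
binop = assign.value         # right-hand side: BinOp for 2 + 3

print(type(tree).__name__)    # Module
print(type(assign).__name__)  # Assign
print(target.id)              # x
print(binop.left.value, type(binop.op).__name__, binop.right.value)  # 2 Add 3
```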

Understanding node types

AST nodes fall into three broad categories:

Statements — things that do something: ast.Assign, ast.FunctionDef, ast.If, ast.For, ast.Import, ast.Return. These appear in the body of modules, functions, and control structures.

Expressions — things that produce a value: ast.BinOp, ast.Call, ast.Name, ast.Constant, ast.Attribute, ast.Subscript. These appear as children of statements.

Operators and contexts — structural markers: ast.Add, ast.Load, ast.Store. These classify how an expression is used.
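To see how contexts classify usage, the sketch below parses a one-line assignment (the variable names total, price, and tax are made up for illustration) and lists the context attached to each ast.Name node:

```python
import ast

tree = ast.parse("total = price + tax")

# Collect (identifier, context class name) for every Name node.
# Sorting makes the result independent of traversal order.
contexts = sorted(
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
)
print(contexts)  # [('price', 'Load'), ('tax', 'Load'), ('total', 'Store')]
```

The name being assigned carries ast.Store, while the names being read carry ast.Load.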

Walking the tree

Two built-in approaches exist for traversing AST nodes:

ast.walk() — unordered traversal

ast.walk(tree) yields every node in the tree in no guaranteed order. Use it when you need to find all occurrences of something regardless of position:

# Count all function definitions in a file (including async def)
function_count = sum(
    1 for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
)

NodeVisitor — structured traversal

For order-sensitive analysis, subclass ast.NodeVisitor and define visit_* methods:

class ImportFinder(ast.NodeVisitor):
    def __init__(self):
        self.imports = []

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.append(alias.name)
        self.generic_visit(node)  # Continue walking children

    def visit_ImportFrom(self, node):
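        # node.module is None for "from . import x"; node.level counts the leading dots
        self.imports.append("." * node.level + (node.module or ""))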
        self.generic_visit(node)

The generic_visit call is important — without it, children of the matched node are skipped.
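The effect of omitting generic_visit is easiest to see with nested definitions. This sketch uses two hypothetical visitors, ShallowFinder and DeepFinder, that differ only in whether they keep descending:

```python
import ast

source = """
def outer():
    def inner():
        pass
"""

class ShallowFinder(ast.NodeVisitor):
    """Hypothetical visitor that forgets to call generic_visit."""
    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)  # the function body is never visited

class DeepFinder(ShallowFinder):
    """Same visitor, but it continues into the function body."""
    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)

tree = ast.parse(source)
shallow, deep = ShallowFinder(), DeepFinder()
shallow.visit(tree)
deep.visit(tree)
print(shallow.names)  # ['outer']
print(deep.names)     # ['outer', 'inner']
```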

Transforming code with NodeTransformer

ast.NodeTransformer extends NodeVisitor with the ability to replace nodes. Return a new node from a visit_* method to substitute it, or return None to delete it:

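class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # Transform children first
        if (isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            # Only fold numbers; "a" + 1 must still fail at runtime, not here
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)  # Keep the original position
        return node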

After transforming, call ast.fix_missing_locations(tree) to fill in line numbers for new nodes, then compile(tree, "<string>", "exec") to get executable bytecode.
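Put together, the transform, fix, compile, and execute steps might look like the sketch below. AddFolder is a deliberately small stand-in transformer written for this example (it folds only int + int), and the namespace dictionary is just one way to capture the result:

```python
import ast

class AddFolder(ast.NodeTransformer):
    """Illustrative transformer: collapses int + int into one constant."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested additions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            return ast.Constant(value=node.left.value + node.right.value)
        return node

tree = ast.parse("x = 2 + 3")
tree = AddFolder().visit(tree)
ast.fix_missing_locations(tree)  # the new Constant has no position yet

code = compile(tree, "<string>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])                                # 5
print(isinstance(tree.body[0].value, ast.Constant))  # True
```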

Source location information

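Statement and expression nodes carry lineno and col_offset attributes, plus end_lineno and end_col_offset since Python 3.8. Structural nodes such as ast.Module, operators, and contexts have no position. Line numbers are 1-based, while column offsets are 0-based and count UTF-8 bytes rather than characters. These attributes point back to the exact position in the original source code, and tools like linters use them to produce error messages with precise locations.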
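As a rough illustration of how a linter might report positions, the sketch below lists each call in a two-line sample (the sample source is invented) and uses ast.get_source_segment, available since Python 3.8, to recover the matching text:

```python
import ast

source = "result = eval(user_input)\nprint(result)\n"
tree = ast.parse(source)

# Sort by position so the report does not depend on traversal order.
calls = sorted(
    (node.lineno, node.col_offset, ast.get_source_segment(source, node))
    for node in ast.walk(tree)
    if isinstance(node, ast.Call)
)
for lineno, col, text in calls:
    print(f"line {lineno}, col {col}: {text}")
# line 1, col 9: eval(user_input)
# line 2, col 0: print(result)
```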

Common misconception

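A common mistake is thinking the AST preserves comments and whitespace. It does not. The abstract syntax tree records only the syntactic structure the compiler needs, so comments, blank lines, and redundant parentheses are discarded. If you need to preserve formatting (for a code formatter or refactoring tool), you need a concrete syntax tree (CST), for example from the third-party libcst library. The standard library's lib2to3 once filled this role, but it was deprecated and then removed in Python 3.13.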
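A quick way to confirm what is lost is to round-trip a line through the parser. This sketch assumes Python 3.9 or later, where ast.unparse is available:

```python
import ast

source = "x  =  ( 1 +  2 )   # add the numbers\n"
tree = ast.parse(source)

regenerated = ast.unparse(tree)
print(regenerated)  # x = 1 + 2
```

The comment, the extra spaces, and the redundant parentheses are all gone, because none of them were ever stored in the tree.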

The ast.dump() helper

For debugging, ast.dump(tree, indent=2) returns a readable, indented string representation of the tree that you can pass to print(). The indent parameter requires Python 3.9 or later. This is invaluable when you are figuring out what node types and attributes correspond to a particular piece of syntax.
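A short sketch of the helper in use, assuming Python 3.9+ for the indent parameter. The exact text of the dump differs between Python versions, so no expected output is shown:

```python
import ast

tree = ast.parse("print(name.upper())")

# ast.dump builds a string; print it to inspect the tree.
dumped = ast.dump(tree, indent=2)
print(dumped)

# include_attributes=True also shows lineno, col_offset, and friends.
with_positions = ast.dump(tree, include_attributes=True)
print("lineno=1" in with_positions)  # True
```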

Practical applications

  • Linting: Walk the tree to check coding standards without executing code
  • Security scanning: Find calls to eval(), exec(), or os.system()
  • Dependency extraction: Collect all import and from ... import statements
  • Code metrics: Count functions, classes, nesting depth, cyclomatic complexity
  • Auto-refactoring: Replace deprecated API calls with modern equivalents
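As one example from the list above, a very small security scan could be sketched as follows. The helper names dotted_name and find_dangerous_calls and the DANGEROUS set are invented for this illustration, and the matching is purely by name, so aliased imports such as "from os import system" would slip through:

```python
import ast

# Call names to flag; taken from the security scanning bullet above.
DANGEROUS = {"eval", "exec", "os.system"}

def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_dangerous_calls(source):
    """Return sorted (lineno, name) pairs for calls whose name is in DANGEROUS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS:
                findings.append((node.lineno, name))
    return sorted(findings)

sample = "import os\nos.system('ls')\nvalue = eval('1 + 1')\nprint(value)\n"
print(find_dangerous_calls(sample))  # [(2, 'os.system'), (3, 'eval')]
```

The sample code is only parsed, never executed, which is what makes this kind of check safe to run on untrusted input.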

The one thing to remember: The ast module parses Python into a tree of typed nodes that you can walk with NodeVisitor or transform with NodeTransformer, making it the foundation for any tool that needs to understand or modify Python code without executing it.

