PLY Parser Generator — Core Concepts

What Is PLY?

PLY (Python Lex-Yacc) is a pure-Python implementation of the classic compiler construction tools lex and yacc. Created by David Beazley, it lets you define tokenizers and parsers using Python functions and docstrings rather than separate grammar files. By default PLY generates LALR(1) parsers, the same class of parser that yacc and GNU Bison produce.

The Two-Phase Architecture

Language processing in PLY happens in two distinct stages:

Lexical Analysis (Lexer): The lexer reads raw text and produces a stream of tokens. Each token has a type (like NUMBER, PLUS, IDENTIFIER) and a value. You define token patterns as regular expressions, written either as plain strings or as the docstrings of Python functions.

Syntactic Analysis (Parser): The parser takes the token stream and checks it against grammar rules you define. When tokens match a rule, the parser executes an associated action — typically building a data structure that represents the meaning of the input.
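To make the two phases concrete, here is a minimal sketch, assuming PLY 3.x is installed (it is published on PyPI as ply). It uses the same NUMBER, PLUS, MINUS, LPAREN and RPAREN tokens as the lexer in the next section, but the parser actions build a tuple tree instead of computing a number. The helper names token_stream and parse_tree and the ('binop', ...) / ('num', ...) node shapes are illustrative choices; debug=False and write_tables=False are PLY 3.x options that stop yacc from writing parser.out and parsetab.py.

```python
import ply.lex as lex
import ply.yacc as yacc

# ---- Phase 1: lexical analysis ----
tokens = ('NUMBER', 'PLUS', 'MINUS', 'LPAREN', 'RPAREN')

t_PLUS = r'\+'
t_MINUS = r'-'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# ---- Phase 2: syntactic analysis ----
# Left associativity for PLUS/MINUS resolves the ambiguous binary rule.
precedence = (('left', 'PLUS', 'MINUS'),)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression'''
    # Build a tree node instead of computing a number
    p[0] = ('binop', p[2], p[1], p[3])

def p_expression_group(p):
    '''expression : LPAREN expression RPAREN'''
    p[0] = p[2]  # parentheses only affect grouping, so no extra node

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = ('num', p[1])

def p_error(p):
    print("Syntax error at", repr(p.value) if p else "end of input")

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

def token_stream(text):
    # Phase 1 alone: the (type, value) pairs the lexer emits
    lexer.input(text)
    return [(tok.type, tok.value) for tok in lexer]

def parse_tree(text):
    # Both phases: the parser pulls tokens from the lexer on demand
    return parser.parse(text, lexer=lexer)

print(token_stream("1 - (2 + 3)"))
print(parse_tree("1 - (2 + 3)"))
```

The parser never sees raw characters, only the token objects the lexer hands it, which is why the same grammar can evaluate numbers (as in the parser section) or build a tree (as here) without touching the lexer.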

Defining a Lexer

A lexer is defined in a module that declares a tokens tuple listing every token type, plus rules named t_ followed by the token name, written either as strings or as functions:

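import ply.lex as lex

# Every token type the lexer can produce
tokens = ('NUMBER', 'PLUS', 'MINUS', 'LPAREN', 'RPAREN')

# Simple tokens: the string itself is the regular expression
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_LPAREN = r'\('
t_RPAREN = r'\)'

# Function tokens: the docstring is the regex; the body can transform the value
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Characters skipped between tokens (a plain string of characters, not a regex)
t_ignore = ' \t'

# Called when no rule matches: report the character and skip past it
def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

lexer = lex.lex()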

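Simple tokens are string assignments whose value is the regular expression. Tokens that need extra processing (like NUMBER, whose matched text is converted to an int) are functions whose docstring holds the regex. Matching order matters when patterns overlap: PLY tries function rules in the order they are defined, then string rules sorted by decreasing regex length, so a longer pattern such as == is tried before =.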
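A quick way to check a lexer is to feed it text with input() and iterate over it; each LexToken carries type, value, lineno and lexpos attributes. The sketch below, assuming PLY 3.x, repeats the lexer above with two hypothetical additions: a t_newline rule that keeps lineno accurate (PLY does not count lines on its own), and a t_error that records skipped characters in a list named skipped instead of printing them. The describe helper is also illustrative.

```python
import ply.lex as lex

tokens = ('NUMBER', 'PLUS', 'MINUS', 'LPAREN', 'RPAREN')

t_PLUS = r'\+'
t_MINUS = r'-'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Hypothetical addition: count newlines so tokens carry correct line numbers.
# Returning nothing discards the match instead of emitting a token.
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

skipped = []  # illustrative: collect rejected characters instead of printing

def t_error(t):
    skipped.append(t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

def describe(text):
    lexer.input(text)
    lexer.lineno = 1  # input() does not reset the line counter
    return [(tok.type, tok.value, tok.lineno, tok.lexpos) for tok in lexer]

for row in describe("12 + 3\n- $4"):
    print(row)
print("skipped:", skipped)
```

The illegal $ is dropped and lexing continues with the next character, so one stray character does not abort the whole token stream.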

Defining a Parser

Grammar rules are written as functions prefixed with p_. The docstring contains the BNF-style production rule:

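import ply.yacc as yacc

# Uses the tokens tuple and lexer defined above (same module).

# Without this, "expression PLUS expression" is ambiguous and PLY reports
# shift/reduce conflicts; declaring PLUS and MINUS left-associative makes
# 10 - 3 - 2 parse as (10 - 3) - 2.
precedence = (
    ('left', 'PLUS', 'MINUS'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression'''
    # p[2] is the operator token's matched text
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_group(p):
    '''expression : LPAREN expression RPAREN'''
    p[0] = p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    # p is the offending token, or None at end of input
    if p:
        print(f"Syntax error at {p.value!r}")
    else:
        print("Syntax error at end of input")

parser = yacc.yacc()
result = parser.parse("3 + 5 - (2 + 1)")
print(result)  # 5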

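The p parameter is a sequence: p[0] holds the rule's result, and p[1], p[2], and so on correspond to the symbols on the right-hand side of the production. For a token, p[i] is its value (the int produced by t_NUMBER, or matched text such as '+'); for a nonterminal, it is whatever p[0] was set to when that symbol was reduced. The first rule defined in the file determines the start symbol.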
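Because LALR parsing is bottom-up, inner rules are reduced before the rules that contain them, and each p[i] for a nonterminal is the value an earlier reduction stored in p[0]. This sketch, assuming PLY 3.x, logs every reduction to a hypothetical trace list; it also shows len(p), which counts p[0] plus the right-hand-side symbols.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'LPAREN', 'RPAREN')
t_PLUS = r'\+'
t_MINUS = r'-'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

precedence = (('left', 'PLUS', 'MINUS'),)
trace = []  # illustrative: one entry per reduction, in the order they happen

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression'''
    # p[1] and p[3] are values stored by earlier reductions; p[2] is '+' or '-'
    trace.append(f"binop {p[1]} {p[2]} {p[3]} (len(p)={len(p)})")
    p[0] = p[1] + p[3] if p[2] == '+' else p[1] - p[3]

def p_expression_group(p):
    '''expression : LPAREN expression RPAREN'''
    trace.append(f"group {p[2]}")
    p[0] = p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    trace.append(f"number {p[1]}")
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", repr(p.value) if p else "end of input")

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

result = parser.parse("8 - (2 + 1)", lexer=lexer)
print("\n".join(trace))
print("result:", result)
```

The parenthesized sum is reduced, then wrapped by the group rule, before the outer subtraction sees it as a single value in p[3].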

Precedence and Associativity

PLY handles operator precedence through a precedence tuple, listed from lowest to highest priority:

precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

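Each entry groups tokens of equal precedence and gives their associativity. These declarations resolve the shift/reduce conflicts that an ambiguous rule such as expression : expression PLUS expression would otherwise cause, so 2 + 3 * 4 evaluates to 14 (multiplication first), not 20, and 10 - 3 - 2 groups as (10 - 3) - 2. UMINUS is not a real token: a unary-minus rule borrows its precedence by ending with %prec UMINUS.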
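Putting the precedence table to work needs TIMES and DIVIDE tokens and a unary-minus rule that uses %prec UMINUS. The following is a minimal sketch under those assumptions, targeting PLY 3.x; the calc helper is hypothetical.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN')

t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# Lowest to highest. UMINUS exists only so the unary rule can use %prec.
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    a, op, b = p[1], p[2], p[3]
    if op == '+':
        p[0] = a + b
    elif op == '-':
        p[0] = a - b
    elif op == '*':
        p[0] = a * b
    else:
        p[0] = a / b  # true division, so results may be floats

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]

def p_expression_group(p):
    '''expression : LPAREN expression RPAREN'''
    p[0] = p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", repr(p.value) if p else "end of input")

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

def calc(text):
    return parser.parse(text, lexer=lexer)

print(calc("2 + 3 * 4"))    # 14: TIMES outranks PLUS
print(calc("10 - 3 - 2"))   # 5: MINUS is left-associative
print(calc("2 * -3 + 1"))   # -5: unary minus applied to 3 first
```

The same MINUS token serves as both the binary and the unary operator; the grammar position decides which rule applies, and %prec gives the unary rule the highest precedence level.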

Common Misconception

Many developers assume PLY is outdated because it mirrors 1970s-era tools. In reality, LALR(1) parsing is still one of the most efficient general parsing strategies: a table-driven loop that processes input in linear time. PLY builds its parsing tables the first time yacc.yacc() runs and, in PLY 3.x, writes them to a parsetab.py file that later runs reuse until the grammar changes. For well-defined grammars with clear precedence rules, PLY is often simpler than newer alternatives.

When to Use PLY

PLY excels when you need a parser that closely follows a formal grammar specification. It is a natural choice if you are working through a compilers textbook or implementing a language with a published BNF grammar. For quick DSLs, or when you would rather write the whole grammar in a single EBNF-style file, tools like Lark (which offers Earley and LALR(1) parsers) may be easier.
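Published grammars are often left-recursive (a list defined in terms of a shorter list), which a naive recursive-descent parser cannot handle directly but an LALR(1) parser accepts as written. A minimal sketch, assuming PLY 3.x, with a hypothetical call/arglist grammar transcribed rule for rule:

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NAME', 'NUMBER', 'LPAREN', 'RPAREN', 'COMMA')

t_NAME = r'[A-Za-z_][A-Za-z0-9_]*'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_COMMA = r','
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# Hypothetical BNF being transcribed:
#   call    ::= NAME "(" arglist ")"
#   arglist ::= arglist "," NUMBER | NUMBER
def p_call(p):
    '''call : NAME LPAREN arglist RPAREN'''
    p[0] = (p[1], p[3])

def p_arglist_more(p):
    '''arglist : arglist COMMA NUMBER'''
    # Left recursion: the list built so far arrives as p[1]
    p[0] = p[1] + [p[3]]

def p_arglist_one(p):
    '''arglist : NUMBER'''
    p[0] = [p[1]]

def p_error(p):
    print("Syntax error at", repr(p.value) if p else "end of input")

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

print(parser.parse("max(3, 1, 4)", lexer=lexer))  # ('max', [3, 1, 4])
```

Since p_call is defined first, call becomes the start symbol, and the left-recursive arglist rule is used exactly as the BNF states it.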

One thing to remember: PLY mirrors the classic lex/yacc workflow — define tokens with regex, define grammar rules with BNF in docstrings, and let PLY generate the efficient parsing tables behind the scenes.
