Recursion Deep Dive — Deep Dive

Master tail recursion, mutual recursion, trampolining, and stack frame internals — with Python implementations and performance analysis.

Call stack internals in CPython

When Python executes a function call, it creates a PyFrameObject containing the local variables, the instruction pointer, and a reference to the enclosing frame. These frames form a linked list — the call stack. You can inspect them:

import sys

def show_frames(n):
    if n == 0:
        frame = sys._getframe()
        depth = 0
        while frame:
            print(f"Frame {depth}: {frame.f_code.co_name} "
                  f"(line {frame.f_lineno})")
            frame = frame.f_back
            depth += 1
        return
    show_frames(n - 1)

show_frames(3)

Each frame costs approximately 400-500 bytes in CPython 3.11+. With the default recursion limit of 1,000, that is about 500 KB of stack space. Python 3.12 introduced lighter “inline frames” for some calls, but recursive calls still create full frame objects.

Tail recursion and why Python ignores it

A tail call is a function call that is the very last thing a function does before returning. A function is tail recursive if its recursive call is a tail call.

# NOT tail recursive — multiplication happens AFTER the recursive call returns
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# Tail recursive — the recursive call IS the return value
def factorial_tail(n, accumulator=1):
    if n <= 1:
        return accumulator
    return factorial_tail(n - 1, accumulator * n)

Languages like Scheme and Scala optimize tail calls by reusing the current stack frame — tail call optimization (TCO). This makes tail recursion use O(1) stack space.

Python deliberately does not implement TCO. Guido van Rossum’s reasoning: it destroys stack traces, making debugging harder. The Pythonic solution is to use a loop when you need deep recursion.

Trampolining: fake TCO in Python

A trampoline is a loop that repeatedly calls the function returned by the previous call. It converts stack depth into loop iterations:

def trampoline(fn, *args):
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def factorial_tramp(n, acc=1):
    if n <= 1:
        return acc
    return lambda: factorial_tramp(n - 1, acc * n)

# Works for any depth — no stack growth
trampoline(factorial_tramp, 100_000)

The trampoline pattern is useful when you must use recursion conceptually but cannot afford the stack depth. Each “recursive call” returns a thunk (zero-argument lambda) instead of actually recursing. The trampoline loop then calls the thunk, keeping the stack flat.

Mutual recursion

When two or more functions call each other:

def is_even(n):
    if n == 0:
        return True
    return is_odd(n - 1)

def is_odd(n):
    if n == 0:
        return False
    return is_even(n - 1)

This is conceptually elegant but uses 2× the stack depth. Practical applications include recursive descent parsers where different grammar rules call each other:

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def parse_expression(self):
        left = self.parse_term()
        while self.pos < len(self.tokens) and self.tokens[self.pos] in ('+', '-'):
            op = self.tokens[self.pos]
            self.pos += 1
            right = self.parse_term()
            left = (op, left, right)
        return left

    def parse_term(self):
        left = self.parse_factor()
        while self.pos < len(self.tokens) and self.tokens[self.pos] in ('*', '/'):
            op = self.tokens[self.pos]
            self.pos += 1
            right = self.parse_factor()
            left = (op, left, right)
        return left

    def parse_factor(self):
        if self.tokens[self.pos] == '(':
            self.pos += 1  # skip '('
            expr = self.parse_expression()  # mutual recursion
            self.pos += 1  # skip ')'
            return expr
        val = int(self.tokens[self.pos])
        self.pos += 1
        return val

Converting recursion to iteration

Single-branch recursion → loop

When a function makes exactly one recursive call, convert directly to a loop:

# Recursive
def gcd(a, b):
    if b == 0:
        return a
    return gcd(b, a % b)

# Iterative
def gcd_iter(a, b):
    while b != 0:
        a, b = b, a % b
    return a

Multi-branch recursion → explicit stack

When a function makes multiple recursive calls (tree recursal), use a stack:

# Recursive tree traversal
def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

# Iterative with explicit stack
def inorder_iter(root):
    result = []
    stack = []
    current = root
    while current or stack:
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        result.append(current.val)
        current = current.right
    return result

Recursion with state → stack of frames

For complex recursion where you need to resume computation after a recursive call returns:

# Recursive quicksort
def quicksort(arr, lo, hi):
    if lo < hi:
        pivot = partition(arr, lo, hi)
        quicksort(arr, lo, pivot - 1)
        quicksort(arr, pivot + 1, hi)

# Iterative quicksort
def quicksort_iter(arr, lo, hi):
    stack = [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if lo < hi:
            pivot = partition(arr, lo, hi)
            stack.append((lo, pivot - 1))
            stack.append((pivot + 1, hi))

Recursion depth analysis

The recursion depth determines stack space usage. Common depths:

Algorithm	Depth	Space
Binary search	O(log n)	~20 frames for n=10⁶
Merge sort	O(log n)	~20 frames for n=10⁶
Tree traversal (balanced)	O(log n)	~20 frames for n=10⁶
Tree traversal (skewed)	O(n)	n frames — danger
Quicksort (balanced)	O(log n)	~20 frames
Quicksort (worst case)	O(n)	n frames — danger
Linear recursion (factorial)	O(n)	n frames

Python’s default limit of 1,000 means any O(n) depth recursion fails for n > 1,000. This is not a flaw — it is a signal to use iteration.

Memoized recursion vs bottom-up DP

Top-down memoized recursion and bottom-up dynamic programming compute the same results. The tradeoffs:

# Top-down (memoized recursion)
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_top_down(n):
    if n <= 1:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up (iterative DP)
def fib_bottom_up(n):
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i-1] + dp[i-2]
    return dp[n]

Aspect	Top-down	Bottom-up
Stack depth	O(n)	O(1)
Memory overhead	Cache + frames	Array only
Subproblems computed	Only needed ones	All in range
Implementation	Often simpler	Sometimes tricky
Python suitability	Limit ~1000 depth	No limit

For problems where not all subproblems are needed (sparse dependency graph), top-down can be faster. For problems where all subproblems are needed (dense grid DP), bottom-up avoids stack overhead.

Advanced: continuation-passing style

CPS transforms every function so that instead of returning, it calls a continuation function with the result:

def factorial_cps(n, cont=lambda x: x):
    if n <= 1:
        return cont(1)
    return factorial_cps(n - 1, lambda result: cont(n * result))

This makes all calls tail calls, enabling trampolining:

def factorial_cps_tramp(n, cont=lambda x: x):
    if n <= 1:
        return cont(1)
    return lambda: factorial_cps_tramp(
        n - 1,
        lambda result, n=n, cont=cont: cont(n * result)
    )

result = trampoline(factorial_cps_tramp, 50_000)

CPS is powerful but produces hard-to-read Python. Reserve it for cases where you need deep recursion and cannot restructure the algorithm iteratively.

Performance benchmarks

Measuring the real cost of recursion in CPython 3.11:

import timeit

def recursive_sum(n):
    if n == 0:
        return 0
    return n + recursive_sum(n - 1)

def iterative_sum(n):
    total = 0
    for i in range(n + 1):
        total += i
    return total

n = 900  # stay under recursion limit

t_rec = timeit.timeit(lambda: recursive_sum(n), number=1000)
t_iter = timeit.timeit(lambda: iterative_sum(n), number=1000)

print(f"Recursive: {t_rec:.3f}s")
print(f"Iterative: {t_iter:.3f}s")
print(f"Overhead: {t_rec/t_iter:.1f}x")

Typical results show recursion is 3-8× slower than equivalent iteration in CPython, due to frame creation, argument passing, and function call overhead. In PyPy, the gap shrinks to 1.2-2× because the JIT compiler optimizes call overhead.

When recursion is the right choice

Despite the overhead, recursion remains the cleanest approach for:

Tree and graph algorithms — DFS is naturally recursive; the iterative version with an explicit stack is harder to read and maintain.
Divide and conquer — merge sort, quicksort, and Karatsuba multiplication express more clearly as recursive algorithms.
Backtracking — generating permutations, solving constraint satisfaction problems, and game-tree search are dramatically simpler with recursion.
Parsing — recursive descent parsers map grammar rules to functions, making the code mirror the language specification.

For everything else — especially linear sequences — use a loop.

One thing to remember: Recursion is a thinking tool, not just a coding technique. Even when you implement iteratively, designing the solution recursively first often reveals the correct structure. Think recursively, then convert to iteration if performance or depth demands it.

pythonrecursionalgorithmsinternals