Dynamic Programming — Deep Dive
Technical framing
Dynamic programming reduces exponential-time problems to polynomial time by exploiting overlapping subproblems. The art lies in state design: choosing the minimal representation that captures all decision-relevant information. A poorly chosen state leads to unnecessary dimensions and prohibitive memory usage. A well-chosen state collapses the problem into something tractable.
This deep dive covers advanced DP patterns that go beyond textbook examples — the techniques that separate fluent problem-solvers from beginners.
State design principles
The state must satisfy the Markov property
The optimal decision at any state must depend only on the current state, not on how you arrived there. If your recurrence needs to “remember” the path, your state is missing a dimension.
Minimize state dimensions
Every dimension multiplies the table size. For a 2D problem with dimensions m × n, adding a third dimension k creates m × n × k states. Ask: “Can I fold this information into fewer dimensions?”
Common state patterns
| Pattern | State representation | Example |
|---|---|---|
| Linear | dp[i] = answer for prefix of length i | Longest increasing subsequence |
| Two-sequence | dp[i][j] = answer for first i of seq1, first j of seq2 | Edit distance |
| Knapsack | dp[i][w] = answer using first i items with capacity w | 0/1 knapsack |
| Interval | dp[i][j] = answer for subarray from i to j | Matrix chain multiplication |
| Bitmask | dp[mask] = answer for subset represented by bitmask | Traveling salesman |
| Tree | dp[node][state] = answer for subtree rooted at node | Tree coloring |
Advanced patterns
Edit distance (Levenshtein)
The foundation of spell checkers, DNA alignment, and fuzzy matching:
def edit_distance(s1: str, s2: str) -> int:
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m + 1):
dp[i][0] = i
for j in range(n + 1):
dp[0][j] = j
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
dp[i][j] = dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(
dp[i - 1][j], # delete
dp[i][j - 1], # insert
dp[i - 1][j - 1] # replace
)
return dp[m][n]
Space-optimized to O(n) since each row only depends on the previous row:
def edit_distance_optimized(s1: str, s2: str) -> int:
m, n = len(s1), len(s2)
prev = list(range(n + 1))
for i in range(1, m + 1):
curr = [i] + [0] * n
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
curr[j] = prev[j - 1]
else:
curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
prev = curr
return prev[n]
Longest increasing subsequence (LIS) with patience sorting
The classic O(n²) DP solution:
def lis_quadratic(arr: list[int]) -> int:
n = len(arr)
dp = [1] * n
for i in range(1, n):
for j in range(i):
if arr[j] < arr[i]:
dp[i] = max(dp[i], dp[j] + 1)
return max(dp)
The O(n log n) solution using binary search (patience sorting):
import bisect
def lis_nlogn(arr: list[int]) -> int:
tails = [] # tails[i] = smallest tail of all increasing subsequences of length i+1
for num in arr:
pos = bisect.bisect_left(tails, num)
if pos == len(tails):
tails.append(num)
else:
tails[pos] = num
return len(tails)
The tails array maintains the invariant that it is always sorted, enabling binary search. Each element is processed in O(log n).
Bitmask DP: Traveling Salesman Problem
For problems involving subsets of N elements (N ≤ 20), represent subsets as bitmasks:
from functools import lru_cache
def tsp(dist: list[list[float]]) -> float:
n = len(dist)
ALL_VISITED = (1 << n) - 1
@lru_cache(maxsize=None)
def solve(mask: int, pos: int) -> float:
if mask == ALL_VISITED:
return dist[pos][0] # return to start
best = float('inf')
for city in range(n):
if mask & (1 << city):
continue
new_mask = mask | (1 << city)
cost = dist[pos][city] + solve(new_mask, city)
best = min(best, cost)
return best
return solve(1, 0) # start at city 0, mark it visited
State space: O(2ⁿ × n). This is exponential but dramatically better than the brute-force O(n!) permutation check.
Interval DP: Matrix chain multiplication
Determine the optimal parenthesization to minimize scalar multiplications:
def matrix_chain_order(dimensions: list[int]) -> int:
n = len(dimensions) - 1 # number of matrices
dp = [[0] * n for _ in range(n)]
for length in range(2, n + 1): # chain length
for i in range(n - length + 1):
j = i + length - 1
dp[i][j] = float('inf')
for k in range(i, j):
cost = (dp[i][k] + dp[k + 1][j] +
dimensions[i] * dimensions[k + 1] * dimensions[j + 1])
dp[i][j] = min(dp[i][j], cost)
return dp[0][n - 1]
The key insight: iterate by chain length (small intervals before large ones) so that all subproblems are solved before they are needed.
DP on trees
For tree-structured problems, process children before parents (post-order):
def max_independent_set(adj: dict, root: int) -> int:
"""Maximum weight independent set on a tree."""
dp_include = {}
dp_exclude = {}
def dfs(node: int, parent: int):
dp_include[node] = 1 # include this node
dp_exclude[node] = 0 # exclude this node
for child in adj.get(node, []):
if child == parent:
continue
dfs(child, node)
dp_include[node] += dp_exclude[child]
dp_exclude[node] += max(dp_include[child], dp_exclude[child])
dfs(root, -1)
return max(dp_include[root], dp_exclude[root])
Optimization techniques
Monotonic queue optimization
When the DP transition looks like dp[i] = min(dp[j] + cost(j, i)) for j in a sliding window, a monotonic deque reduces the inner loop from O(k) to O(1) amortized:
from collections import deque
def max_sum_subarray_with_constraint(arr: list[int], k: int) -> int:
"""Max subarray sum where subarray length is at most k."""
n = len(arr)
prefix = [0] * (n + 1)
for i in range(n):
prefix[i + 1] = prefix[i] + arr[i]
dq = deque([0])
result = float('-inf')
for i in range(1, n + 1):
# Remove elements outside window
while dq and dq[0] < i - k:
dq.popleft()
result = max(result, prefix[i] - prefix[dq[0]])
# Maintain monotonically increasing prefix sums
while dq and prefix[dq[-1]] >= prefix[i]:
dq.pop()
dq.append(i)
return result
Convex hull trick
When the DP transition has the form dp[i] = min(dp[j] + b[j] * a[i]) where a is sorted, the convex hull trick reduces the transition from O(n) to O(1) amortized. This is advanced but crucial for competitive programming problems with 10⁵+ states.
Python-specific considerations
lru_cache vs. manual table
# lru_cache: clean, Pythonic, good for top-down
@lru_cache(maxsize=None)
def dp(i, j):
...
# Manual table: faster for bottom-up, controllable memory
dp = [[0] * (n + 1) for _ in range(m + 1)]
For problems with > 10⁶ states, manual tables are significantly faster because lru_cache has per-call overhead from hashing and dictionary lookups.
Recursion limits
Python’s default recursion limit is 1000. For top-down DP on large inputs:
import sys
sys.setrecursionlimit(10**6) # use with caution
Or convert to iterative bottom-up, which has no depth limit and is generally 2-5x faster in Python.
NumPy acceleration for grid DP
For 2D grid problems, NumPy can parallelize transitions:
import numpy as np
def min_path_sum_numpy(grid):
dp = np.array(grid, dtype=np.int64)
dp[0] = np.cumsum(dp[0])
dp[:, 0] = np.cumsum(dp[:, 0])
for i in range(1, dp.shape[0]):
for j in range(1, dp.shape[1]):
dp[i, j] += min(dp[i-1, j], dp[i, j-1])
return int(dp[-1, -1])
For truly large grids, the row-by-row computation can use vectorized operations on each row.
Debugging DP
- Print the table — visualize the DP table for small inputs. Most bugs become obvious.
- Verify base cases — wrong base cases propagate errors through the entire table.
- Check transition direction — bottom-up must process dependencies before dependents.
- Test against brute force — for small inputs, verify DP results against exhaustive search.
def debug_dp_table(dp, row_labels=None, col_labels=None):
import pandas as pd
df = pd.DataFrame(dp, index=row_labels, columns=col_labels)
print(df.to_string())
One thing to remember: The hardest part of DP is not coding the solution — it is defining the state. Spend 80% of your time on “what does dp[i][j] represent?” and the recurrence and code will follow naturally.
See Also
- Python Backtracking Algorithms How computers solve puzzles by trying every path, backing up when stuck, and never giving up until they find the answer.
- Python Big O Complexity Analysis Why some programs finish in a blink while others take forever — explained with pizza delivery and toy cleanup.
- Python Binary Search Implementation Find anything in a sorted list insanely fast using the same trick you already use with dictionaries and phone books.
- Python Graph Algorithms How computers navigate maps, friendships, and connections using dots and lines — explained without any math.
- Python Greedy Algorithms Why always grabbing the biggest piece sometimes gives you the best answer — and sometimes does not.