Python SLAM Algorithms — Deep Dive
EKF-SLAM Implementation
The Extended Kalman Filter maintains a state vector and covariance matrix that jointly represent the robot pose and all landmark positions.
State Representation
For a 2D robot with N landmarks:
import numpy as np
# State: [x, y, theta, lm1_x, lm1_y, lm2_x, lm2_y, ...]
# Dimension: 3 + 2*N
def init_state(n_landmarks):
state = np.zeros(3 + 2 * n_landmarks)
covariance = np.eye(3 + 2 * n_landmarks) * 1000 # high initial uncertainty
covariance[:3, :3] = np.zeros((3, 3)) # robot starts at known origin
return state, covariance
Motion Update (Prediction)
When the robot moves, we predict the new state using the motion model and propagate uncertainty:
def predict(state, cov, velocity, angular_vel, dt):
theta = state[2]
n = len(state)
# Motion model (velocity model)
if abs(angular_vel) > 1e-6:
r = velocity / angular_vel
dx = -r * np.sin(theta) + r * np.sin(theta + angular_vel * dt)
dy = r * np.cos(theta) - r * np.cos(theta + angular_vel * dt)
dtheta = angular_vel * dt
else:
dx = velocity * np.cos(theta) * dt
dy = velocity * np.sin(theta) * dt
dtheta = 0
state[0] += dx
state[1] += dy
state[2] += dtheta
# Jacobian of motion model w.r.t. state
F = np.eye(n)
if abs(angular_vel) > 1e-6:
r = velocity / angular_vel
F[0, 2] = -r * np.cos(theta) + r * np.cos(theta + angular_vel * dt)
F[1, 2] = -r * np.sin(theta) + r * np.sin(theta + angular_vel * dt)
# Process noise (only affects robot pose)
Q = np.zeros((n, n))
Q[:3, :3] = np.diag([0.1, 0.1, 0.01]) * dt
cov = F @ cov @ F.T + Q
return state, cov
Measurement Update
When the robot observes a landmark (range and bearing measurement):
def update(state, cov, landmark_id, measured_range, measured_bearing):
lm_idx = 3 + 2 * landmark_id
lm_x, lm_y = state[lm_idx], state[lm_idx + 1]
rx, ry, rtheta = state[0], state[1], state[2]
dx = lm_x - rx
dy = lm_y - ry
q = dx**2 + dy**2
sqrt_q = np.sqrt(q)
# Expected measurement
expected_range = sqrt_q
expected_bearing = np.arctan2(dy, dx) - rtheta
# Innovation (difference between measured and expected)
innovation = np.array([
measured_range - expected_range,
normalize_angle(measured_bearing - expected_bearing)
])
# Measurement Jacobian
n = len(state)
H = np.zeros((2, n))
H[0, 0] = -dx / sqrt_q
H[0, 1] = -dy / sqrt_q
H[0, lm_idx] = dx / sqrt_q
H[0, lm_idx + 1] = dy / sqrt_q
H[1, 0] = dy / q
H[1, 1] = -dx / q
H[1, 2] = -1
H[1, lm_idx] = -dy / q
H[1, lm_idx + 1] = dx / q
# Measurement noise
R = np.diag([0.1, 0.05]) # range, bearing variances
# Kalman gain
S = H @ cov @ H.T + R
K = cov @ H.T @ np.linalg.inv(S)
state = state + K @ innovation
cov = (np.eye(n) - K @ H) @ cov
return state, cov
def normalize_angle(angle):
while angle > np.pi: angle -= 2 * np.pi
while angle < -np.pi: angle += 2 * np.pi
return angle
EKF-SLAM Limitations
The covariance matrix is (3 + 2N) × (3 + 2N). With 1000 landmarks, that is a 2003×2003 matrix updated at every observation. The O(N²) update cost makes EKF-SLAM impractical beyond small environments.
Pose Graph SLAM with GTSAM
Graph-based SLAM is the modern standard. Each robot pose is a node, and constraints (from odometry and loop closures) are edges. GTSAM (Georgia Tech Smoothing And Mapping) provides efficient implementations:
import gtsam
def build_pose_graph():
graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()
# Prior on first pose
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), prior_noise))
initial.insert(0, gtsam.Pose2(0, 0, 0))
# Odometry noise model
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))
# Add odometry constraints
poses = [(1.0, 0, 0), (1.0, 0, np.pi/2), (1.0, 0, 0), (1.0, 0, np.pi/2)]
x, y, theta = 0, 0, 0
for i, (dx, dy, dtheta) in enumerate(poses):
theta += dtheta
x += dx * np.cos(theta)
y += dx * np.sin(theta)
graph.add(gtsam.BetweenFactorPose2(
i, i + 1, gtsam.Pose2(dx, dy, dtheta), odom_noise
))
initial.insert(i + 1, gtsam.Pose2(x, y, theta))
# Loop closure: robot returned near start
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
graph.add(gtsam.BetweenFactorPose2(
4, 0, gtsam.Pose2(0, 0, 0), loop_noise
))
# Optimize
params = gtsam.LevenbergMarquardtParams()
optimizer = gtsam.LevenbergMarquardtOptimizer(graph, initial, params)
result = optimizer.optimize()
return result
The optimizer adjusts all poses simultaneously to minimize the total error across all constraints. Loop closures propagate corrections backward through the entire trajectory.
Occupancy Grid Mapping
For navigation, robots need to know which areas are free and which are blocked. Occupancy grids divide the environment into cells, each with a log-odds probability of being occupied:
class OccupancyGrid:
def __init__(self, width, height, resolution=0.05):
self.resolution = resolution
cols = int(width / resolution)
rows = int(height / resolution)
self.grid = np.zeros((rows, cols)) # log-odds
self.origin = np.array([-width/2, -height/2])
def world_to_grid(self, x, y):
col = int((x - self.origin[0]) / self.resolution)
row = int((y - self.origin[1]) / self.resolution)
return row, col
def update_ray(self, robot_x, robot_y, hit_x, hit_y, hit=True):
"""Update cells along a laser ray using Bresenham's algorithm."""
r0, c0 = self.world_to_grid(robot_x, robot_y)
r1, c1 = self.world_to_grid(hit_x, hit_y)
cells = bresenham(r0, c0, r1, c1)
# Free cells along the ray
for r, c in cells[:-1]:
if 0 <= r < self.grid.shape[0] and 0 <= c < self.grid.shape[1]:
self.grid[r, c] -= 0.4 # decrease occupancy belief
# Hit cell at the end
if hit:
r, c = cells[-1]
if 0 <= r < self.grid.shape[0] and 0 <= c < self.grid.shape[1]:
self.grid[r, c] += 0.85 # increase occupancy belief
def probability_map(self):
"""Convert log-odds to probabilities."""
return 1 - 1 / (1 + np.exp(self.grid))
def bresenham(r0, c0, r1, c1):
"""Integer line rasterization."""
cells = []
dr = abs(r1 - r0)
dc = abs(c1 - c0)
sr = 1 if r1 > r0 else -1
sc = 1 if c1 > c0 else -1
err = dc - dr
r, c = r0, c0
while True:
cells.append((r, c))
if r == r1 and c == c1:
break
e2 = 2 * err
if e2 > -dr:
err -= dr
c += sc
if e2 < dc:
err += dc
r += sr
return cells
Loop Closure Detection
Detecting revisited locations is the hardest part of SLAM. Common approaches:
Scan Matching
Compare the current LiDAR scan against stored scans using ICP or correlative scan matching. Works well in structured environments but expensive for large map histories.
Visual Bag of Words (DBoW2)
For camera-based SLAM, extract visual features (ORB, SIFT) from images, quantize them into a visual vocabulary, and compare histograms. ORB-SLAM2/3 uses this approach. Python bindings exist for DBoW2, or you can use OpenCV’s BOWKMeansTrainer:
import cv2
# Build vocabulary from training images
detector = cv2.ORB_create(nfeatures=1000)
bow_trainer = cv2.BOWKMeansTrainer(clusterCount=200)
for image in training_images:
keypoints, descriptors = detector.detectAndCompute(image, None)
if descriptors is not None:
bow_trainer.add(descriptors.astype(np.float32))
vocabulary = bow_trainer.cluster()
Place Recognition with Neural Networks
NetVLAD and its successors learn compact image descriptors. A current image’s descriptor is compared against a database of previous descriptors. Cosine similarity above a threshold triggers a loop closure candidate, which is then verified with geometric checks.
Visual-Inertial SLAM
Combining cameras with IMUs (Inertial Measurement Units) produces more robust SLAM than either sensor alone. The IMU provides high-frequency rotation and acceleration estimates between camera frames, while the camera corrects IMU drift.
Python prototyping of visual-inertial odometry uses:
- OpenCV for feature tracking (KLT optical flow or feature matching)
- NumPy/SciPy for IMU integration and optimization
- GTSAM for factor graph optimization combining visual and inertial factors
Production systems (VINS-Mono, OKVIS, Kimera) are C++ but use Python for evaluation, plotting, and parameter tuning.
Performance and Scaling
| SLAM Approach | Map Size | Update Frequency | Accuracy |
|---|---|---|---|
| EKF-SLAM | < 200 landmarks | 10–30 Hz | Moderate |
| FastSLAM (100 particles) | < 5000 landmarks | 5–20 Hz | Good |
| Graph-SLAM (GTSAM) | Millions of poses | Batch / 1–5 Hz | Excellent |
| ORB-SLAM3 | Building-scale | 30 Hz (C++) | Excellent |
Python implementations are typically 10–50x slower than C++ equivalents. The practical pattern: prototype in Python, validate correctness, then port performance-critical components to C++ or use libraries with C++ backends (GTSAM, g2o).
One thing to remember: Modern SLAM systems build a pose graph from odometry and loop closure detections, then use nonlinear optimization (via libraries like GTSAM) to jointly adjust all poses — producing globally consistent maps even when individual measurements are noisy.
See Also
- Python Behavior Trees Robotics How robots make decisions using a tree-shaped rulebook that keeps them organized, like a flowchart that tells a robot what to do in every situation.
- Python Bluetooth Ble How Python connects to fitness trackers, smart locks, and wireless sensors using the invisible radio signals all around you.
- Python Circuitpython Hardware Why CircuitPython makes wiring up LEDs, sensors, and motors as easy as plugging in a USB drive.
- Python Computer Vision Autonomous How self-driving cars use cameras and Python to see the road, spot pedestrians, read signs, and understand traffic — like giving a car human eyes and a brain.
- Python Home Assistant Automation How Python turns your home into a smart home that reacts to you automatically, like a helpful invisible butler.