Python SLAM Algorithms — Deep Dive

Implement EKF-SLAM and pose graph optimization in Python with NumPy, build occupancy grids from laser scans, and integrate GTSAM for production-grade mapping.

EKF-SLAM Implementation

The Extended Kalman Filter maintains a state vector and covariance matrix that jointly represent the robot pose and all landmark positions.

State Representation

For a 2D robot with N landmarks:

import numpy as np

# State: [x, y, theta, lm1_x, lm1_y, lm2_x, lm2_y, ...]
# Dimension: 3 + 2*N

def init_state(n_landmarks):
    state = np.zeros(3 + 2 * n_landmarks)
    covariance = np.eye(3 + 2 * n_landmarks) * 1000  # high initial uncertainty
    covariance[:3, :3] = np.zeros((3, 3))  # robot starts at known origin
    return state, covariance

Motion Update (Prediction)

When the robot moves, we predict the new state using the motion model and propagate uncertainty:

def predict(state, cov, velocity, angular_vel, dt):
    theta = state[2]
    n = len(state)

    # Motion model (velocity model)
    if abs(angular_vel) > 1e-6:
        r = velocity / angular_vel
        dx = -r * np.sin(theta) + r * np.sin(theta + angular_vel * dt)
        dy = r * np.cos(theta) - r * np.cos(theta + angular_vel * dt)
        dtheta = angular_vel * dt
    else:
        dx = velocity * np.cos(theta) * dt
        dy = velocity * np.sin(theta) * dt
        dtheta = 0

    state[0] += dx
    state[1] += dy
    state[2] += dtheta

    # Jacobian of motion model w.r.t. state
    F = np.eye(n)
    if abs(angular_vel) > 1e-6:
        r = velocity / angular_vel
        F[0, 2] = -r * np.cos(theta) + r * np.cos(theta + angular_vel * dt)
        F[1, 2] = -r * np.sin(theta) + r * np.sin(theta + angular_vel * dt)

    # Process noise (only affects robot pose)
    Q = np.zeros((n, n))
    Q[:3, :3] = np.diag([0.1, 0.1, 0.01]) * dt

    cov = F @ cov @ F.T + Q
    return state, cov

Measurement Update

When the robot observes a landmark (range and bearing measurement):

def update(state, cov, landmark_id, measured_range, measured_bearing):
    lm_idx = 3 + 2 * landmark_id
    lm_x, lm_y = state[lm_idx], state[lm_idx + 1]
    rx, ry, rtheta = state[0], state[1], state[2]

    dx = lm_x - rx
    dy = lm_y - ry
    q = dx**2 + dy**2
    sqrt_q = np.sqrt(q)

    # Expected measurement
    expected_range = sqrt_q
    expected_bearing = np.arctan2(dy, dx) - rtheta

    # Innovation (difference between measured and expected)
    innovation = np.array([
        measured_range - expected_range,
        normalize_angle(measured_bearing - expected_bearing)
    ])

    # Measurement Jacobian
    n = len(state)
    H = np.zeros((2, n))
    H[0, 0] = -dx / sqrt_q
    H[0, 1] = -dy / sqrt_q
    H[0, lm_idx] = dx / sqrt_q
    H[0, lm_idx + 1] = dy / sqrt_q
    H[1, 0] = dy / q
    H[1, 1] = -dx / q
    H[1, 2] = -1
    H[1, lm_idx] = -dy / q
    H[1, lm_idx + 1] = dx / q

    # Measurement noise
    R = np.diag([0.1, 0.05])  # range, bearing variances

    # Kalman gain
    S = H @ cov @ H.T + R
    K = cov @ H.T @ np.linalg.inv(S)

    state = state + K @ innovation
    cov = (np.eye(n) - K @ H) @ cov

    return state, cov

def normalize_angle(angle):
    while angle > np.pi: angle -= 2 * np.pi
    while angle < -np.pi: angle += 2 * np.pi
    return angle

EKF-SLAM Limitations

The covariance matrix is (3 + 2N) × (3 + 2N). With 1000 landmarks, that is a 2003×2003 matrix updated at every observation. The O(N²) update cost makes EKF-SLAM impractical beyond small environments.

Pose Graph SLAM with GTSAM

Graph-based SLAM is the modern standard. Each robot pose is a node, and constraints (from odometry and loop closures) are edges. GTSAM (Georgia Tech Smoothing And Mapping) provides efficient implementations:

import gtsam

def build_pose_graph():
    graph = gtsam.NonlinearFactorGraph()
    initial = gtsam.Values()

    # Prior on first pose
    prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
    graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), prior_noise))
    initial.insert(0, gtsam.Pose2(0, 0, 0))

    # Odometry noise model
    odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

    # Add odometry constraints
    poses = [(1.0, 0, 0), (1.0, 0, np.pi/2), (1.0, 0, 0), (1.0, 0, np.pi/2)]
    x, y, theta = 0, 0, 0

    for i, (dx, dy, dtheta) in enumerate(poses):
        theta += dtheta
        x += dx * np.cos(theta)
        y += dx * np.sin(theta)

        graph.add(gtsam.BetweenFactorPose2(
            i, i + 1, gtsam.Pose2(dx, dy, dtheta), odom_noise
        ))
        initial.insert(i + 1, gtsam.Pose2(x, y, theta))

    # Loop closure: robot returned near start
    loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
    graph.add(gtsam.BetweenFactorPose2(
        4, 0, gtsam.Pose2(0, 0, 0), loop_noise
    ))

    # Optimize
    params = gtsam.LevenbergMarquardtParams()
    optimizer = gtsam.LevenbergMarquardtOptimizer(graph, initial, params)
    result = optimizer.optimize()

    return result

The optimizer adjusts all poses simultaneously to minimize the total error across all constraints. Loop closures propagate corrections backward through the entire trajectory.

Occupancy Grid Mapping

For navigation, robots need to know which areas are free and which are blocked. Occupancy grids divide the environment into cells, each with a log-odds probability of being occupied:

class OccupancyGrid:
    def __init__(self, width, height, resolution=0.05):
        self.resolution = resolution
        cols = int(width / resolution)
        rows = int(height / resolution)
        self.grid = np.zeros((rows, cols))  # log-odds
        self.origin = np.array([-width/2, -height/2])

    def world_to_grid(self, x, y):
        col = int((x - self.origin[0]) / self.resolution)
        row = int((y - self.origin[1]) / self.resolution)
        return row, col

    def update_ray(self, robot_x, robot_y, hit_x, hit_y, hit=True):
        """Update cells along a laser ray using Bresenham's algorithm."""
        r0, c0 = self.world_to_grid(robot_x, robot_y)
        r1, c1 = self.world_to_grid(hit_x, hit_y)

        cells = bresenham(r0, c0, r1, c1)

        # Free cells along the ray
        for r, c in cells[:-1]:
            if 0 <= r < self.grid.shape[0] and 0 <= c < self.grid.shape[1]:
                self.grid[r, c] -= 0.4  # decrease occupancy belief

        # Hit cell at the end
        if hit:
            r, c = cells[-1]
            if 0 <= r < self.grid.shape[0] and 0 <= c < self.grid.shape[1]:
                self.grid[r, c] += 0.85  # increase occupancy belief

    def probability_map(self):
        """Convert log-odds to probabilities."""
        return 1 - 1 / (1 + np.exp(self.grid))


def bresenham(r0, c0, r1, c1):
    """Integer line rasterization."""
    cells = []
    dr = abs(r1 - r0)
    dc = abs(c1 - c0)
    sr = 1 if r1 > r0 else -1
    sc = 1 if c1 > c0 else -1
    err = dc - dr
    r, c = r0, c0

    while True:
        cells.append((r, c))
        if r == r1 and c == c1:
            break
        e2 = 2 * err
        if e2 > -dr:
            err -= dr
            c += sc
        if e2 < dc:
            err += dc
            r += sr

    return cells

Loop Closure Detection

Detecting revisited locations is the hardest part of SLAM. Common approaches:

Scan Matching

Compare the current LiDAR scan against stored scans using ICP or correlative scan matching. Works well in structured environments but expensive for large map histories.

Visual Bag of Words (DBoW2)

For camera-based SLAM, extract visual features (ORB, SIFT) from images, quantize them into a visual vocabulary, and compare histograms. ORB-SLAM2/3 uses this approach. Python bindings exist for DBoW2, or you can use OpenCV’s BOWKMeansTrainer:

import cv2

# Build vocabulary from training images
detector = cv2.ORB_create(nfeatures=1000)
bow_trainer = cv2.BOWKMeansTrainer(clusterCount=200)

for image in training_images:
    keypoints, descriptors = detector.detectAndCompute(image, None)
    if descriptors is not None:
        bow_trainer.add(descriptors.astype(np.float32))

vocabulary = bow_trainer.cluster()

Place Recognition with Neural Networks

NetVLAD and its successors learn compact image descriptors. A current image’s descriptor is compared against a database of previous descriptors. Cosine similarity above a threshold triggers a loop closure candidate, which is then verified with geometric checks.

Visual-Inertial SLAM

Combining cameras with IMUs (Inertial Measurement Units) produces more robust SLAM than either sensor alone. The IMU provides high-frequency rotation and acceleration estimates between camera frames, while the camera corrects IMU drift.

Python prototyping of visual-inertial odometry uses:

OpenCV for feature tracking (KLT optical flow or feature matching)
NumPy/SciPy for IMU integration and optimization
GTSAM for factor graph optimization combining visual and inertial factors

Production systems (VINS-Mono, OKVIS, Kimera) are C++ but use Python for evaluation, plotting, and parameter tuning.

Performance and Scaling

SLAM Approach	Map Size	Update Frequency	Accuracy
EKF-SLAM	< 200 landmarks	10–30 Hz	Moderate
FastSLAM (100 particles)	< 5000 landmarks	5–20 Hz	Good
Graph-SLAM (GTSAM)	Millions of poses	Batch / 1–5 Hz	Excellent
ORB-SLAM3	Building-scale	30 Hz (C++)	Excellent

Python implementations are typically 10–50x slower than C++ equivalents. The practical pattern: prototype in Python, validate correctness, then port performance-critical components to C++ or use libraries with C++ backends (GTSAM, g2o).

One thing to remember: Modern SLAM systems build a pose graph from odometry and loop closure detections, then use nonlinear optimization (via libraries like GTSAM) to jointly adjust all poses — producing globally consistent maps even when individual measurements are noisy.

pythonslamroboticsmappingautonomous-vehicles