Linked Lists — Deep Dive

Technical framing

Linked lists are the workhorse behind many higher-level data structures. Python’s collections.OrderedDict uses a doubly linked list internally. CPython’s memory allocator uses free lists. The lru_cache decorator relies on a doubly linked list for O(1) eviction. Understanding linked list algorithms at a deep level gives you the tools to build and optimize these structures.

This deep dive covers classic linked list algorithms, their complexity proofs, and production Python implementations.

Complete linked list implementation with type hints

from __future__ import annotations
from typing import TypeVar, Generic, Iterator, Optional

T = TypeVar('T')

class Node(Generic[T]):
    __slots__ = ('data', 'prev', 'next')
    
    def __init__(self, data: T, prev: Optional[Node[T]] = None, 
                 next_: Optional[Node[T]] = None):
        self.data = data
        self.prev = prev
        self.next = next_

class DoublyLinkedList(Generic[T]):
    """Doubly linked list with sentinel nodes for clean edge-case handling."""
    
    def __init__(self):
        self._head = Node(None)  # type: ignore
        self._tail = Node(None)  # type: ignore
        self._head.next = self._tail
        self._tail.prev = self._head
        self._size = 0
    
    def __len__(self) -> int:
        return self._size
    
    def __bool__(self) -> bool:
        return self._size > 0
    
    def __iter__(self) -> Iterator[T]:
        current = self._head.next
        while current is not self._tail:
            yield current.data
            current = current.next
    
    def __reversed__(self) -> Iterator[T]:
        current = self._tail.prev
        while current is not self._head:
            yield current.data
            current = current.prev
    
    def _insert_between(self, data: T, predecessor: Node[T], 
                        successor: Node[T]) -> Node[T]:
        new_node = Node(data, prev=predecessor, next_=successor)
        predecessor.next = new_node
        successor.prev = new_node
        self._size += 1
        return new_node
    
    def _remove_node(self, node: Node[T]) -> T:
        predecessor = node.prev
        successor = node.next
        predecessor.next = successor
        successor.prev = predecessor
        self._size -= 1
        data = node.data
        node.prev = node.next = None  # help GC
        return data
    
    def push_front(self, data: T) -> Node[T]:
        return self._insert_between(data, self._head, self._head.next)
    
    def push_back(self, data: T) -> Node[T]:
        return self._insert_between(data, self._tail.prev, self._tail)
    
    def pop_front(self) -> T:
        if not self:
            raise IndexError("pop from empty list")
        return self._remove_node(self._head.next)
    
    def pop_back(self) -> T:
        if not self:
            raise IndexError("pop from empty list")
        return self._remove_node(self._tail.prev)
    
    def move_to_front(self, node: Node[T]) -> None:
        """Move an existing node to the front in O(1). Used in LRU caches."""
        self._remove_node(node)
        self._size += 1  # _remove_node decremented, but we are re-inserting
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

Using __slots__ on the Node class reduces memory usage by ~40% compared to a regular class with __dict__.

LRU cache with linked list + hash map

The classic O(1) LRU cache combines a doubly linked list (for ordering) with a hash map (for O(1) lookups):

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: dict[any, Node] = {}
        self.list = DoublyLinkedList()
    
    def get(self, key):
        if key not in self.cache:
            return None
        node = self.cache[key]
        self.list.move_to_front(node)
        return node.data[1]  # (key, value) stored in node
    
    def put(self, key, value):
        if key in self.cache:
            node = self.cache[key]
            node.data = (key, value)
            self.list.move_to_front(node)
            return
        
        if len(self.list) >= self.capacity:
            # Evict least recently used (tail)
            evicted_data = self.list.pop_back()
            del self.cache[evicted_data[0]]
        
        node = self.list.push_front((key, value))
        self.cache[key] = node

This is exactly the pattern used by functools.lru_cache and collections.OrderedDict internally (though implemented in C for speed).

Classic linked list algorithms

Reversing a linked list (iterative)

def reverse_singly(head: Node) -> Node:
    prev = None
    current = head
    while current:
        next_node = current.next
        current.next = prev
        prev = current
        current = next_node
    return prev  # new head

Three pointers, one pass, O(1) space. This is one of the most commonly asked interview questions.

Cycle detection: Floyd’s algorithm

Detect whether a linked list contains a cycle using two pointers moving at different speeds:

def has_cycle(head: Node) -> bool:
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False

def find_cycle_start(head: Node) -> Optional[Node]:
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            # Reset one pointer to head
            slow = head
            while slow is not fast:
                slow = slow.next
                fast = fast.next
            return slow
    return None

Why this works: When slow and fast meet inside the cycle, the distance from the head to the cycle start equals the distance from the meeting point to the cycle start (moving forward in the cycle). This is provable using modular arithmetic on the cycle length.

Finding the middle node

def find_middle(head: Node) -> Node:
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

When fast reaches the end, slow is at the middle. This is used as a building block for merge sort on linked lists.

Merge sort on linked lists

Linked lists are naturally suited to merge sort because merging two sorted lists requires no extra space (unlike arrays):

def merge_sort_list(head: Node) -> Node:
    if not head or not head.next:
        return head
    
    # Split at middle
    mid = find_middle_for_split(head)
    right_head = mid.next
    mid.next = None
    
    left = merge_sort_list(head)
    right = merge_sort_list(right_head)
    return merge_two_lists(left, right)

def find_middle_for_split(head: Node) -> Node:
    slow = head
    fast = head.next  # offset by one to split evenly
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

def merge_two_lists(l1: Node, l2: Node) -> Node:
    dummy = Node(0)
    current = dummy
    while l1 and l2:
        if l1.data <= l2.data:
            current.next = l1
            l1 = l1.next
        else:
            current.next = l2
            l2 = l2.next
        current = current.next
    current.next = l1 or l2
    return dummy.next

This achieves O(n log n) time with O(log n) space (recursion stack only) — better than array merge sort which needs O(n) auxiliary space.

Memory and performance analysis

Memory overhead per node

import sys

class NodeSlots:
    __slots__ = ('data', 'next')
    def __init__(self, data, next_=None):
        self.data = data
        self.next = next_

class NodeDict:
    def __init__(self, data, next_=None):
        self.data = data
        self.next = next_

# Memory comparison (64-bit Python, CPython 3.11):
# NodeSlots instance: ~56 bytes
# NodeDict instance:  ~152 bytes (includes __dict__)
# Python list element: ~8 bytes (pointer only, data stored separately)

A linked list of 1 million integers uses ~56 MB with __slots__ vs ~8 MB for a Python list. This 7x overhead is the main reason Python lists are preferred for general use.

Cache performance

Modern CPUs are optimized for sequential memory access. Python lists store pointers contiguously, giving decent cache locality. Linked list nodes are scattered across the heap, causing frequent cache misses. For iteration-heavy workloads, this can make linked lists 5-10x slower than arrays even when the algorithmic complexity is identical.

collections.deque: Python’s built-in linked structure

collections.deque is implemented as a doubly linked list of fixed-size blocks (each block holds 64 items):

from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)   # O(1)
d.append(4)        # O(1)
d.popleft()        # O(1)
d.rotate(2)        # O(k) where k is rotation amount

The block-based design combines linked list flexibility (O(1) ends) with array-like cache performance within each block. For most Python applications, deque is the right choice over a custom linked list.

XOR linked list (memory-efficient doubly linked)

A theoretical optimization where each node stores prev XOR next instead of two separate pointers. This halves pointer storage but requires careful pointer arithmetic:

# Conceptual only — not practical in Python due to GC and id() instability
# In C, this saves 8 bytes per node (one pointer)
# In Python, the overhead of id() manipulation exceeds the savings

XOR linked lists are a C/C++ technique. In Python, the interpreter overhead makes this impractical, but it is worth knowing as a computer science concept.

Real-world applications in Python codebases

  1. collections.OrderedDict — doubly linked list of entries for O(1) move-to-end
  2. functools.lru_cache — doubly linked list for eviction ordering
  3. CPython free lists — singly linked lists of pre-allocated objects (ints, floats, tuples) for fast allocation
  4. asyncio task scheduling — linked structures for callback chains
  5. Blockchain — each block points to the previous block’s hash (conceptually a singly linked list)

One thing to remember: In Python, you rarely need a custom linked list — list and deque cover most cases. But linked lists are the backbone of caches, allocators, and ordered dictionaries. Master the algorithms (reverse, detect cycle, merge sort) because they train the pointer-manipulation thinking that transfers to trees, graphs, and systems programming.

pythondata-structureslinked-lists

See Also