Linked Lists — Deep Dive
Technical framing
Linked lists are the workhorse behind many higher-level data structures. Python’s collections.OrderedDict uses a doubly linked list internally. CPython’s memory allocator uses free lists. The lru_cache decorator relies on a doubly linked list for O(1) eviction. Understanding linked list algorithms at a deep level gives you the tools to build and optimize these structures.
This deep dive covers classic linked list algorithms, their complexity proofs, and production Python implementations.
Complete linked list implementation with type hints
from __future__ import annotations
from typing import TypeVar, Generic, Iterator, Optional
T = TypeVar('T')
class Node(Generic[T]):
__slots__ = ('data', 'prev', 'next')
def __init__(self, data: T, prev: Optional[Node[T]] = None,
next_: Optional[Node[T]] = None):
self.data = data
self.prev = prev
self.next = next_
class DoublyLinkedList(Generic[T]):
"""Doubly linked list with sentinel nodes for clean edge-case handling."""
def __init__(self):
self._head = Node(None) # type: ignore
self._tail = Node(None) # type: ignore
self._head.next = self._tail
self._tail.prev = self._head
self._size = 0
def __len__(self) -> int:
return self._size
def __bool__(self) -> bool:
return self._size > 0
def __iter__(self) -> Iterator[T]:
current = self._head.next
while current is not self._tail:
yield current.data
current = current.next
def __reversed__(self) -> Iterator[T]:
current = self._tail.prev
while current is not self._head:
yield current.data
current = current.prev
def _insert_between(self, data: T, predecessor: Node[T],
successor: Node[T]) -> Node[T]:
new_node = Node(data, prev=predecessor, next_=successor)
predecessor.next = new_node
successor.prev = new_node
self._size += 1
return new_node
def _remove_node(self, node: Node[T]) -> T:
predecessor = node.prev
successor = node.next
predecessor.next = successor
successor.prev = predecessor
self._size -= 1
data = node.data
node.prev = node.next = None # help GC
return data
def push_front(self, data: T) -> Node[T]:
return self._insert_between(data, self._head, self._head.next)
def push_back(self, data: T) -> Node[T]:
return self._insert_between(data, self._tail.prev, self._tail)
def pop_front(self) -> T:
if not self:
raise IndexError("pop from empty list")
return self._remove_node(self._head.next)
def pop_back(self) -> T:
if not self:
raise IndexError("pop from empty list")
return self._remove_node(self._tail.prev)
def move_to_front(self, node: Node[T]) -> None:
"""Move an existing node to the front in O(1). Used in LRU caches."""
self._remove_node(node)
self._size += 1 # _remove_node decremented, but we are re-inserting
node.prev = self._head
node.next = self._head.next
self._head.next.prev = node
self._head.next = node
Using __slots__ on the Node class reduces memory usage by ~40% compared to a regular class with __dict__.
LRU cache with linked list + hash map
The classic O(1) LRU cache combines a doubly linked list (for ordering) with a hash map (for O(1) lookups):
class LRUCache:
def __init__(self, capacity: int):
self.capacity = capacity
self.cache: dict[any, Node] = {}
self.list = DoublyLinkedList()
def get(self, key):
if key not in self.cache:
return None
node = self.cache[key]
self.list.move_to_front(node)
return node.data[1] # (key, value) stored in node
def put(self, key, value):
if key in self.cache:
node = self.cache[key]
node.data = (key, value)
self.list.move_to_front(node)
return
if len(self.list) >= self.capacity:
# Evict least recently used (tail)
evicted_data = self.list.pop_back()
del self.cache[evicted_data[0]]
node = self.list.push_front((key, value))
self.cache[key] = node
This is exactly the pattern used by functools.lru_cache and collections.OrderedDict internally (though implemented in C for speed).
Classic linked list algorithms
Reversing a linked list (iterative)
def reverse_singly(head: Node) -> Node:
prev = None
current = head
while current:
next_node = current.next
current.next = prev
prev = current
current = next_node
return prev # new head
Three pointers, one pass, O(1) space. This is one of the most commonly asked interview questions.
Cycle detection: Floyd’s algorithm
Detect whether a linked list contains a cycle using two pointers moving at different speeds:
def has_cycle(head: Node) -> bool:
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow is fast:
return True
return False
def find_cycle_start(head: Node) -> Optional[Node]:
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow is fast:
# Reset one pointer to head
slow = head
while slow is not fast:
slow = slow.next
fast = fast.next
return slow
return None
Why this works: When slow and fast meet inside the cycle, the distance from the head to the cycle start equals the distance from the meeting point to the cycle start (moving forward in the cycle). This is provable using modular arithmetic on the cycle length.
Finding the middle node
def find_middle(head: Node) -> Node:
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
return slow
When fast reaches the end, slow is at the middle. This is used as a building block for merge sort on linked lists.
Merge sort on linked lists
Linked lists are naturally suited to merge sort because merging two sorted lists requires no extra space (unlike arrays):
def merge_sort_list(head: Node) -> Node:
if not head or not head.next:
return head
# Split at middle
mid = find_middle_for_split(head)
right_head = mid.next
mid.next = None
left = merge_sort_list(head)
right = merge_sort_list(right_head)
return merge_two_lists(left, right)
def find_middle_for_split(head: Node) -> Node:
slow = head
fast = head.next # offset by one to split evenly
while fast and fast.next:
slow = slow.next
fast = fast.next.next
return slow
def merge_two_lists(l1: Node, l2: Node) -> Node:
dummy = Node(0)
current = dummy
while l1 and l2:
if l1.data <= l2.data:
current.next = l1
l1 = l1.next
else:
current.next = l2
l2 = l2.next
current = current.next
current.next = l1 or l2
return dummy.next
This achieves O(n log n) time with O(log n) space (recursion stack only) — better than array merge sort which needs O(n) auxiliary space.
Memory and performance analysis
Memory overhead per node
import sys
class NodeSlots:
__slots__ = ('data', 'next')
def __init__(self, data, next_=None):
self.data = data
self.next = next_
class NodeDict:
def __init__(self, data, next_=None):
self.data = data
self.next = next_
# Memory comparison (64-bit Python, CPython 3.11):
# NodeSlots instance: ~56 bytes
# NodeDict instance: ~152 bytes (includes __dict__)
# Python list element: ~8 bytes (pointer only, data stored separately)
A linked list of 1 million integers uses ~56 MB with __slots__ vs ~8 MB for a Python list. This 7x overhead is the main reason Python lists are preferred for general use.
Cache performance
Modern CPUs are optimized for sequential memory access. Python lists store pointers contiguously, giving decent cache locality. Linked list nodes are scattered across the heap, causing frequent cache misses. For iteration-heavy workloads, this can make linked lists 5-10x slower than arrays even when the algorithmic complexity is identical.
collections.deque: Python’s built-in linked structure
collections.deque is implemented as a doubly linked list of fixed-size blocks (each block holds 64 items):
from collections import deque
d = deque([1, 2, 3])
d.appendleft(0) # O(1)
d.append(4) # O(1)
d.popleft() # O(1)
d.rotate(2) # O(k) where k is rotation amount
The block-based design combines linked list flexibility (O(1) ends) with array-like cache performance within each block. For most Python applications, deque is the right choice over a custom linked list.
XOR linked list (memory-efficient doubly linked)
A theoretical optimization where each node stores prev XOR next instead of two separate pointers. This halves pointer storage but requires careful pointer arithmetic:
# Conceptual only — not practical in Python due to GC and id() instability
# In C, this saves 8 bytes per node (one pointer)
# In Python, the overhead of id() manipulation exceeds the savings
XOR linked lists are a C/C++ technique. In Python, the interpreter overhead makes this impractical, but it is worth knowing as a computer science concept.
Real-world applications in Python codebases
collections.OrderedDict— doubly linked list of entries for O(1) move-to-endfunctools.lru_cache— doubly linked list for eviction ordering- CPython free lists — singly linked lists of pre-allocated objects (ints, floats, tuples) for fast allocation
asynciotask scheduling — linked structures for callback chains- Blockchain — each block points to the previous block’s hash (conceptually a singly linked list)
One thing to remember: In Python, you rarely need a custom linked list — list and deque cover most cases. But linked lists are the backbone of caches, allocators, and ordered dictionaries. Master the algorithms (reverse, detect cycle, merge sort) because they train the pointer-manipulation thinking that transfers to trees, graphs, and systems programming.
See Also
- Python Backtracking Algorithms How computers solve puzzles by trying every path, backing up when stuck, and never giving up until they find the answer.
- Python Big O Complexity Analysis Why some programs finish in a blink while others take forever — explained with pizza delivery and toy cleanup.
- Python Binary Search Implementation Find anything in a sorted list insanely fast using the same trick you already use with dictionaries and phone books.
- Python Dynamic Programming The clever trick of remembering answers you already figured out so you never solve the same puzzle twice.
- Python Graph Algorithms How computers navigate maps, friendships, and connections using dots and lines — explained without any math.