Python bisect & Sorted Lists — Deep Dive

Internals of Python's bisect module, advanced patterns for range queries and event scheduling, plus performance analysis against alternatives.

How bisect works internally

The CPython implementation of bisect_left is a textbook binary search:

def bisect_left(a, x, lo=0, hi=None, *, key=None):
    if hi is None:
        hi = len(a)
    if key is None:
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
    else:
        while lo < hi:
            mid = (lo + hi) // 2
            if key(a[mid]) < x:
                lo = mid + 1
            else:
                hi = mid
    return lo

bisect_right differs only in the comparison: a[mid] <= x instead of a[mid] < x. This single change determines whether equal elements land to the left or right.

Since CPython 3.10, the fast path (no key) is implemented in C (_bisectmodule.c), making it significantly faster than a pure Python loop. The key path falls back to Python-level calls per comparison.

The `lo` and `hi` parameters

Both bisect_left and bisect_right accept lo and hi to restrict the search range. This is useful for searching within a window of a larger sorted list:

import bisect

data = [10, 20, 30, 40, 50, 60, 70, 80]
# Search only within indices 2..5
pos = bisect.bisect_left(data, 45, lo=2, hi=6)
# pos → 4

This avoids copying a sublist and keeps the operation allocation-free.

Advanced pattern: range queries

To find all elements in a sorted list within [low, high]:

import bisect

def range_query(sorted_list, low, high):
    left = bisect.bisect_left(sorted_list, low)
    right = bisect.bisect_right(sorted_list, high)
    return sorted_list[left:right]

timestamps = [100, 200, 300, 400, 500, 600]
result = range_query(timestamps, 250, 500)
# → [300, 400, 500]

Both lookups are O(log n), and the slice is O(k) where k is the number of results. This is the foundation of index-based range scans in databases.

Advanced pattern: discrete event simulation

Games and simulations schedule events at specific times. A sorted event list with insort provides a simple priority mechanism:

import bisect
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float
    sequence: int = field(compare=True)
    callback: object = field(compare=False)

class EventLoop:
    def __init__(self):
        self.events = []
        self._seq = 0

    def schedule(self, time, callback):
        self._seq += 1
        bisect.insort(self.events, Event(time, self._seq, callback))

    def run(self, until):
        while self.events and self.events[0].time <= until:
            event = self.events.pop(0)
            event.callback()

The sequence number breaks ties for events at the same time, preserving insertion order. For high-throughput simulations, replacing the list with heapq avoids the O(n) insertion cost of insort — but insort wins when you also need random access to future events (e.g., for cancellation by index).

Performance analysis

bisect_left vs linear scan

For a sorted list of n elements:

Method	Time complexity	1M elements
`bisect_left`	O(log n)	~20 comparisons
`list.index()`	O(n)	up to 1,000,000 comparisons
`x in list`	O(n)	up to 1,000,000 comparisons
`x in set`	O(1) average	1 hash + compare

If you only need membership testing, a set is faster. bisect shines when you need the position, not just existence.

insort vs sort-after-append

Inserting 10,000 items one at a time into a list of 100,000 items (CPython 3.12, benchmarked with timeit):

Strategy	Time
`insort` per item	~120 ms
`append` + `sort()` after each	~8,500 ms
`append` all + single `sort()`	~25 ms

If you’re batch-inserting, sort once at the end. If you’re inserting one at a time and need the list sorted between insertions, insort is the right choice.

The list-shift bottleneck

insort finds the position in O(log n) but the actual insertion into a Python list is O(n) because elements after the insertion point must be shifted. For workloads dominated by insertions, this makes insort slower than a balanced tree structure.

The sortedcontainers.SortedList provides O(log n) insertion using a tiered list-of-lists structure. Benchmarks typically show SortedList.add() is 3-10x faster than insort for lists above 100,000 elements, while maintaining similar iteration speed.

Implementing custom binary searches with bisect

The bisect module can solve problems beyond simple sorted-list operations. Any monotonic function can be binary-searched by creating a virtual “sorted list”:

import bisect

class MonotonicFunction:
    """Wraps a monotonic function so bisect can search its domain."""
    def __init__(self, func, size):
        self.func = func
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return self.func(i)

# Find the first integer where x**2 >= 1000
squares = MonotonicFunction(lambda x: x * x, 1000)
result = bisect.bisect_left(squares, 1000)
# result → 32 (32**2 = 1024 >= 1000)

This trick works because bisect only requires __len__ and __getitem__ — it never checks if the object is an actual list.

Combining bisect with the `key` parameter

Before Python 3.10, the standard workaround for keyed bisect was maintaining a parallel keys list or using a wrapper. Post-3.10, the key parameter simplifies sorted containers of complex objects:

import bisect
from datetime import datetime

log_entries = [
    {"ts": datetime(2026, 1, 1, 10, 0), "msg": "start"},
    {"ts": datetime(2026, 1, 1, 12, 0), "msg": "checkpoint"},
    {"ts": datetime(2026, 1, 1, 14, 0), "msg": "end"},
]

target = datetime(2026, 1, 1, 11, 30)
idx = bisect.bisect_left(log_entries, target, key=lambda e: e["ts"])
# idx → 1, the entry just after 11:30 is "checkpoint" at 12:00

Important caveat: the key function is called on each element during the search, adding function-call overhead. For hot loops, pre-extracting keys into a separate list and bisecting that list can be 2-5x faster.

Thread safety and immutability

bisect_left and bisect_right are read-only and safe to call concurrently on a list that isn’t being modified. insort modifies the list and is not thread-safe — use a lock or process insertions in a single thread.

For concurrent workloads, a common architecture is to batch insertions and periodically merge them into the main sorted list, similar to an LSM-tree approach.

Pitfalls and edge cases

Unsorted input produces silent wrong results. bisect has no way to detect this. If your list might be unsorted, validate it or use a data structure that enforces order.
Float precision issues. bisect_left([0.1 + 0.2], 0.3) returns 1, not 0, because 0.1 + 0.2 != 0.3 in floating point. Use decimal.Decimal for exact comparisons.
Empty list edge case. Both bisect_left([], x) and bisect_right([], x) return 0, which is correct but means you always need a bounds check before accessing the returned index.
Mixing types. bisect_left([1, 2, 3], "a") raises TypeError in Python 3. Python 2 allowed cross-type comparisons, which led to subtle bugs.

The one thing to remember: bisect is a precision tool — O(log n) search that works on any sorted sequence and even on virtual sequences via __getitem__ — but it trusts you to provide sorted input and check bounds on the result.

pythonstandard-libraryalgorithms