Binary Search Implementation — Deep Dive
Technical framing
Binary search is deceptively simple in concept but notoriously tricky to implement correctly. A study by Jon Bentley found that 90% of professional programmers could not write a bug-free binary search. The most common errors: off-by-one mistakes in boundary conditions, integer overflow in midpoint calculation, and incorrect handling of duplicates.
Python sidesteps the integer overflow issue (arbitrary precision integers), but the boundary bugs remain. Understanding the invariants behind binary search is essential for writing correct code.
The bisect module internals
CPython’s bisect module is implemented in C (Modules/_bisectmodule.c). The core logic for bisect_left:
while (lo < hi) {
mid = ((unsigned long)lo + hi) / 2;
if (compare(item, PyList_GET_ITEM(a, mid)) > 0)
lo = mid + 1;
else
hi = mid;
}
Key design decisions:
lo < hinotlo <= hi— the half-open interval[lo, hi)convention simplifies the logic and avoids off-by-one errors.hi = midnothi = mid - 1— consistent with the half-open interval.- Single comparison direction —
bisect_leftonly asks “is target greater than mid?” This avoids the three-way comparison that causes most implementation bugs.
Since Python 3.10, bisect functions accept a key parameter:
import bisect
data = [{"name": "alice", "score": 70}, {"name": "bob", "score": 85}, {"name": "carol", "score": 92}]
scores = [d["score"] for d in data]
idx = bisect.bisect_left(scores, 85)
# idx = 1, pointing to bob
With key (Python 3.10+):
# Avoids building a separate list
idx = bisect.bisect_left(data, 85, key=lambda d: d["score"])
Correct implementation patterns
Finding the first element satisfying a condition
The most general form of binary search finds the boundary where a predicate flips from False to True:
def first_true(lo: int, hi: int, predicate) -> int:
"""Find smallest i in [lo, hi) where predicate(i) is True.
Assumes predicate is monotonic: once True, stays True.
Returns hi if no element satisfies the predicate.
"""
while lo < hi:
mid = (lo + hi) // 2
if predicate(mid):
hi = mid
else:
lo = mid + 1
return lo
This pattern eliminates most off-by-one bugs because the invariant is clear: everything in [original_lo, lo) is False, everything in [hi, original_hi) is True.
Lower bound and upper bound
def lower_bound(arr: list, target) -> int:
"""Index of first element >= target."""
return first_true(0, len(arr), lambda i: arr[i] >= target)
def upper_bound(arr: list, target) -> int:
"""Index of first element > target."""
return first_true(0, len(arr), lambda i: arr[i] > target)
def count_occurrences(arr: list, target) -> int:
return upper_bound(arr, target) - lower_bound(arr, target)
Search in a rotated sorted array
A sorted array rotated at an unknown pivot (e.g., [4, 5, 6, 7, 0, 1, 2]):
def search_rotated(arr: list, target) -> int:
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
# Left half is sorted
if arr[lo] <= arr[mid]:
if arr[lo] <= target < arr[mid]:
hi = mid - 1
else:
lo = mid + 1
# Right half is sorted
else:
if arr[mid] < target <= arr[hi]:
lo = mid + 1
else:
hi = mid - 1
return -1
The key insight: at least one half of the array is always properly sorted. Check if the target falls in the sorted half; if not, search the other half.
Binary search on the answer
Many optimization problems can be solved by binary searching over the answer space. The pattern: “is X achievable?” becomes a monotonic predicate.
Example: minimum capacity to ship packages in D days
def ship_within_days(weights: list[int], days: int) -> int:
def can_ship(capacity: int) -> bool:
current_load = 0
required_days = 1
for w in weights:
if current_load + w > capacity:
required_days += 1
current_load = 0
current_load += w
return required_days <= days
lo = max(weights) # must fit the heaviest package
hi = sum(weights) # ship everything in one day
while lo < hi:
mid = (lo + hi) // 2
if can_ship(mid):
hi = mid
else:
lo = mid + 1
return lo
Example: find square root to arbitrary precision
def sqrt_binary_search(n: float, precision: float = 1e-10) -> float:
if n < 0:
raise ValueError("Cannot compute square root of negative number")
if n < 1:
lo, hi = n, 1.0
else:
lo, hi = 1.0, n
while hi - lo > precision:
mid = (lo + hi) / 2
if mid * mid < n:
lo = mid
else:
hi = mid
return (lo + hi) / 2
When binary searching over real numbers, use an absolute or relative tolerance to terminate, or fix the number of iterations (50 iterations gives precision of ~2⁻⁵⁰).
Performance analysis
Comparison count
Binary search performs at most ⌈log₂(n)⌉ + 1 comparisons. For bisect_left, the exact count is ⌈log₂(n + 1)⌉ because it uses only one comparison per iteration.
Cache behavior
Binary search has poor cache locality for large arrays — it jumps to widely separated memory locations. For arrays that fit in L1/L2 cache (<~256KB), this is irrelevant. For larger arrays, techniques like Eytzinger layout (store the array in BFS order of a complete binary tree) can improve cache performance by 2-3x, though this is rarely needed in Python due to interpreter overhead.
When linear search wins
For very small arrays (< ~64 elements), linear search is faster than binary search because:
- No division operation for midpoint
- Sequential memory access patterns
- Branch prediction is more effective on sequential scans
Python’s bisect module does not fall back to linear search for small arrays, but the difference is negligible at that scale.
Maintaining sorted collections
For data that changes frequently, repeatedly sorting after each insertion is O(n log n) per insert. Alternatives:
bisect.insort— O(n) per insert due to list shifting, but with C-speed overhead. Good for moderate sizes.sortedcontainers.SortedList— a third-party library using a B-tree-like structure. O(log n) search and O(√n) amortized insert. The best option for dynamic sorted data in Python.heapq— maintains a min-heap. O(log n) insert and O(1) min access, but no efficient arbitrary search.
from sortedcontainers import SortedList
sl = SortedList([1, 3, 5, 7, 9])
sl.add(4) # O(log n)
sl.remove(3) # O(log n)
idx = sl.bisect_left(5) # O(log n)
Common pitfalls
- Forgetting that the array must be sorted — binary search on unsorted data gives wrong results silently.
- Using
<=vs<in the while condition —while lo <= hifor closed intervals,while lo < hifor half-open. Mixing them causes infinite loops or missed elements. - Not handling empty arrays — always check
len(arr) == 0or ensure your bounds handle it. - Floating-point binary search without a fixed iteration limit — rounding errors can cause infinite loops. Use a fixed number of iterations (e.g., 100) as a safety net.
One thing to remember: The safest binary search implementation uses half-open intervals [lo, hi) with a monotonic predicate. This pattern generalizes to almost every binary search variant and eliminates the off-by-one errors that plague closed-interval implementations.
See Also
- Python Backtracking Algorithms How computers solve puzzles by trying every path, backing up when stuck, and never giving up until they find the answer.
- Python Big O Complexity Analysis Why some programs finish in a blink while others take forever — explained with pizza delivery and toy cleanup.
- Python Dynamic Programming The clever trick of remembering answers you already figured out so you never solve the same puzzle twice.
- Python Graph Algorithms How computers navigate maps, friendships, and connections using dots and lines — explained without any math.
- Python Greedy Algorithms Why always grabbing the biggest piece sometimes gives you the best answer — and sometimes does not.