Python Data Structures — Core Concepts

Python's built-in data structures — list, dict, tuple, set — and how to choose between them, use comprehensions, and avoid the common pitfalls.

The Four Core Data Structures

Python ships with four built-in data structures that cover the vast majority of use cases. Understanding their characteristics — especially their performance profiles — makes the difference between code that works and code that scales.

Lists

Lists are ordered, mutable sequences of any objects:

items = [1, "hello", 3.14, [1, 2]]   # Mixed types are fine

Indexing and Slicing

lst = [10, 20, 30, 40, 50]
lst[0]     # 10 (first)
lst[-1]    # 50 (last)
lst[1:3]   # [20, 30] (index 1 up to but not including 3)
lst[::2]   # [10, 30, 50] (every 2nd element)
lst[::-1]  # [50, 40, 30, 20, 10] (reversed)

Common Operations

lst.append(60)          # Add to end — O(1)
lst.insert(0, 5)        # Insert at position — O(n)
lst.pop()               # Remove and return last — O(1)
lst.pop(0)              # Remove and return first — O(n)
lst.remove(30)          # Remove first occurrence of value — O(n)
30 in lst               # Membership test — O(n)
len(lst)                # Length — O(1)

Performance note: pop(0) and insert(0, x) are O(n) — they shift every element. For fast additions/removals at both ends, use collections.deque instead.

List Comprehensions

The Pythonic way to create lists from other sequences:

squares = [x**2 for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

evens = [x for x in range(20) if x % 2 == 0]
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

flat = [item for sublist in [[1,2], [3,4], [5,6]] for item in sublist]
# [1, 2, 3, 4, 5, 6]

Comprehensions are faster than equivalent for loops (the iteration is implemented in C) and more readable for simple transformations.

Dictionaries

Dicts map keys to values. Since Python 3.7, they preserve insertion order:

user = {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
}

Access Patterns

user["name"]                 # "Alice" — raises KeyError if missing
user.get("phone")            # None — safe, no error if missing
user.get("phone", "N/A")     # "N/A" — default value
user["city"] = "London"      # Add or update
del user["email"]            # Remove key
"name" in user               # True — membership test O(1)

Iterating

for key in user:              # Keys only
    print(key)

for value in user.values():   # Values only
    print(value)

for key, value in user.items():  # Both
    print(f"{key}: {value}")

Dict Comprehensions

words = ["hello", "world", "python"]
lengths = {word: len(word) for word in words}
# {'hello': 5, 'world': 5, 'python': 6}

Useful Patterns

from collections import defaultdict, Counter

# defaultdict: no KeyError for missing keys
word_count = defaultdict(int)
for word in text.split():
    word_count[word] += 1

# Counter: frequency counting
from collections import Counter
counts = Counter(["a", "b", "a", "c", "a", "b"])
print(counts.most_common(2))   # [('a', 3), ('b', 2)]

Tuples

Immutable sequences. Like lists, but frozen:

point = (3, 4)
rgb = (255, 128, 0)

Why Tuples Over Lists?

Semantic signal — a tuple says “these values belong together and won’t change”
Hashable — tuples can be dict keys or set members; lists can’t
Slightly faster — tuple creation and access is marginally faster than lists
Pattern matching — tuple unpacking is very Pythonic

# Unpacking
x, y = (3, 4)
first, *rest = (1, 2, 3, 4, 5)   # first=1, rest=[2,3,4,5]

# As dict keys (lists can't do this)
distances = {(0, 0): 0, (1, 0): 1, (0, 1): 1}

# Named tuples: tuples with named fields
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p.y)   # 3 4

Sets

Unordered collections of unique items:

tags = {"python", "programming", "coding"}
tags.add("python")   # No effect — already present

Key Uses

# Deduplication
data = [1, 2, 2, 3, 3, 3, 4]
unique = list(set(data))   # [1, 2, 3, 4] — order not guaranteed

# Membership testing (O(1) vs O(n) for lists)
valid_codes = {"USD", "EUR", "GBP", "JPY"}
if currency in valid_codes:   # Very fast for large sets
    process(currency)

# Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b    # Union: {1, 2, 3, 4, 5, 6}
a & b    # Intersection: {3, 4}
a - b    # Difference: {1, 2}
a ^ b    # Symmetric difference: {1, 2, 5, 6}

For an immutable set (hashable, usable as dict key), use frozenset.

Choosing the Right Structure

The key question is usually about lookup speed:

Need to check membership? Set or dict keys — O(1). List — O(n).
Need ordered sequence? List or tuple.
Need key-value mapping? Dict.
Need to prevent duplicates? Set.
Data should be immutable? Tuple or frozenset.

For large datasets, the difference between O(1) and O(n) is massive. Checking if a string is in a set of 1 million items is equally fast as checking a set of 10 items. Checking a list scales linearly — 1 million items is 100,000x slower than 10 items.

Common Misconception: Dicts Are Unordered

Pre-Python 3.7, dicts were genuinely unordered. Since Python 3.7, they preserve insertion order as an implementation guarantee. But sets are still unordered — don’t iterate a set expecting a consistent order.

One Thing to Remember

The biggest performance decision in Python often comes down to sets vs. lists for membership testing — x in my_set is O(1), x in my_list is O(n). When checking membership repeatedly over large collections, always use a set or dict.

pythondata-structureslistsdictionariessetstuplescomprehensions