Python Data Structures — Core Concepts
The Four Core Data Structures
Python ships with four built-in data structures that cover the vast majority of use cases. Understanding their characteristics — especially their performance profiles — makes the difference between code that works and code that scales.
Lists
Lists are ordered, mutable sequences of any objects:
items = [1, "hello", 3.14, [1, 2]] # Mixed types are fine
Indexing and Slicing
lst = [10, 20, 30, 40, 50]
lst[0] # 10 (first)
lst[-1] # 50 (last)
lst[1:3] # [20, 30] (index 1 up to but not including 3)
lst[::2] # [10, 30, 50] (every 2nd element)
lst[::-1] # [50, 40, 30, 20, 10] (reversed)
Common Operations
lst.append(60) # Add to end — O(1)
lst.insert(0, 5) # Insert at position — O(n)
lst.pop() # Remove and return last — O(1)
lst.pop(0) # Remove and return first — O(n)
lst.remove(30) # Remove first occurrence of value — O(n)
30 in lst # Membership test — O(n)
len(lst) # Length — O(1)
Performance note: pop(0) and insert(0, x) are O(n) — they shift every element. For fast additions/removals at both ends, use collections.deque instead.
List Comprehensions
The Pythonic way to create lists from other sequences:
squares = [x**2 for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
evens = [x for x in range(20) if x % 2 == 0]
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
flat = [item for sublist in [[1,2], [3,4], [5,6]] for item in sublist]
# [1, 2, 3, 4, 5, 6]
Comprehensions are faster than equivalent for loops (the iteration is implemented in C) and more readable for simple transformations.
Dictionaries
Dicts map keys to values. Since Python 3.7, they preserve insertion order:
user = {
"name": "Alice",
"age": 30,
"email": "alice@example.com"
}
Access Patterns
user["name"] # "Alice" — raises KeyError if missing
user.get("phone") # None — safe, no error if missing
user.get("phone", "N/A") # "N/A" — default value
user["city"] = "London" # Add or update
del user["email"] # Remove key
"name" in user # True — membership test O(1)
Iterating
for key in user: # Keys only
print(key)
for value in user.values(): # Values only
print(value)
for key, value in user.items(): # Both
print(f"{key}: {value}")
Dict Comprehensions
words = ["hello", "world", "python"]
lengths = {word: len(word) for word in words}
# {'hello': 5, 'world': 5, 'python': 6}
Useful Patterns
from collections import defaultdict, Counter
# defaultdict: no KeyError for missing keys
word_count = defaultdict(int)
for word in text.split():
word_count[word] += 1
# Counter: frequency counting
from collections import Counter
counts = Counter(["a", "b", "a", "c", "a", "b"])
print(counts.most_common(2)) # [('a', 3), ('b', 2)]
Tuples
Immutable sequences. Like lists, but frozen:
point = (3, 4)
rgb = (255, 128, 0)
Why Tuples Over Lists?
- Semantic signal — a tuple says “these values belong together and won’t change”
- Hashable — tuples can be dict keys or set members; lists can’t
- Slightly faster — tuple creation and access is marginally faster than lists
- Pattern matching — tuple unpacking is very Pythonic
# Unpacking
x, y = (3, 4)
first, *rest = (1, 2, 3, 4, 5) # first=1, rest=[2,3,4,5]
# As dict keys (lists can't do this)
distances = {(0, 0): 0, (1, 0): 1, (0, 1): 1}
# Named tuples: tuples with named fields
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p.y) # 3 4
Sets
Unordered collections of unique items:
tags = {"python", "programming", "coding"}
tags.add("python") # No effect — already present
Key Uses
# Deduplication
data = [1, 2, 2, 3, 3, 3, 4]
unique = list(set(data)) # [1, 2, 3, 4] — order not guaranteed
# Membership testing (O(1) vs O(n) for lists)
valid_codes = {"USD", "EUR", "GBP", "JPY"}
if currency in valid_codes: # Very fast for large sets
process(currency)
# Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # Union: {1, 2, 3, 4, 5, 6}
a & b # Intersection: {3, 4}
a - b # Difference: {1, 2}
a ^ b # Symmetric difference: {1, 2, 5, 6}
For an immutable set (hashable, usable as dict key), use frozenset.
Choosing the Right Structure
The key question is usually about lookup speed:
- Need to check membership? Set or dict keys — O(1). List — O(n).
- Need ordered sequence? List or tuple.
- Need key-value mapping? Dict.
- Need to prevent duplicates? Set.
- Data should be immutable? Tuple or frozenset.
For large datasets, the difference between O(1) and O(n) is massive. Checking if a string is in a set of 1 million items is equally fast as checking a set of 10 items. Checking a list scales linearly — 1 million items is 100,000x slower than 10 items.
Common Misconception: Dicts Are Unordered
Pre-Python 3.7, dicts were genuinely unordered. Since Python 3.7, they preserve insertion order as an implementation guarantee. But sets are still unordered — don’t iterate a set expecting a consistent order.
One Thing to Remember
The biggest performance decision in Python often comes down to sets vs. lists for membership testing —
x in my_setis O(1),x in my_listis O(n). When checking membership repeatedly over large collections, always use a set or dict.
See Also
- Python Async Await Async/await helps one Python program juggle many waiting jobs at once, like a chef who keeps multiple pots moving without standing still.
- Python Basics Python is the programming language that reads like plain English — here's why millions of beginners (and experts) choose it first.
- Python Booleans Make Booleans click with one clear analogy you can reuse whenever Python feels confusing.
- Python Break Continue Make Break Continue click with one clear analogy you can reuse whenever Python feels confusing.
- Python Closures See how Python functions can remember private information, even after the outer function has already finished.