Python copy Module — Deep Dive
The copy protocol
Python’s copy module uses a well-defined protocol to determine how objects are copied. Understanding this protocol lets you control copying behavior for custom classes.
How copy.copy() selects its strategy
The shallow copy function tries these approaches in order:
- Use a type-specific fast path for built-in types (immutable types such as int, str, and tuple are returned as-is; list, dict, and set use their own copy methods)
- Call cls.__copy__(self) if defined
- Call __reduce_ex__(4) (the pickle protocol)
- Call __reduce__() as a fallback
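The lookup chain can be observed from the outside. The sketch below is only an illustration: `WithHook`, `WithoutHook`, and the `calls` list are hypothetical names introduced here, and the behavior shown is that of current CPython.

```python
import copy

calls = []  # records which pickle protocol copy.copy() asks for


class WithHook:
    """Defines __copy__, so copy.copy() uses it instead of the reduce machinery."""

    def __init__(self, items):
        self.items = items
        self.via = "constructor"

    def __copy__(self):
        new = type(self)(self.items)
        new.via = "__copy__"
        return new


class WithoutHook:
    """No __copy__, so copy.copy() falls back to __reduce_ex__(4)."""

    def __init__(self, items):
        self.items = items

    def __reduce_ex__(self, protocol):
        calls.append(protocol)
        return super().__reduce_ex__(protocol)


hooked = copy.copy(WithHook([1, 2]))
plain_source = WithoutHook([1, 2])
plain = copy.copy(plain_source)

# Built-in fast paths: an immutable tuple is returned as-is, a list is duplicated.
point = (1, 2)
numbers = [1, 2, 3]
point_copy = copy.copy(point)
numbers_copy = copy.copy(numbers)
```

In both custom classes the copy is shallow: the new instance refers to the same `items` list as the original.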
How copy.deepcopy() works
Deep copy follows a similar but more complex chain:
- Check the memo dict for an already-copied object (cycle detection and shared references)
- Use a type-specific fast path for built-in types
- Call cls.__deepcopy__(self, memo) if defined
- Fall through to __reduce_ex__ / __reduce__, deep copying the components recursively
The memo dictionary is critical — it maps id(original) to the copied object, preventing infinite recursion on circular structures and preserving shared-reference topology.
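Both effects of the memo dictionary can be checked with plain built-in containers. This is a minimal sketch; the variable names are illustrative only.

```python
import copy

# A list that contains itself: a circular structure.
cycle = [1, 2]
cycle.append(cycle)
cycle_copy = copy.deepcopy(cycle)

# Two keys pointing at the same inner list: a shared reference.
inner = ["shared"]
outer = {"a": inner, "b": inner}
outer_copy = copy.deepcopy(outer)

# The copy of the cycle points back at itself, not at the original.
print(cycle_copy[2] is cycle_copy)
# The copied dict still has ONE inner list behind both keys.
print(outer_copy["a"] is outer_copy["b"])
```

Without the memo, the first call would recurse forever and the second would produce two independent inner lists.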
Customizing copy behavior
__copy__ for shallow copy
```python
class Connection:
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._socket = None  # expensive resource

    def __copy__(self):
        """Shallow copy without the socket; the caller must reconnect."""
        new = Connection(self.host, self.port)
        # deliberately not copying _socket
        return new
```
__deepcopy__ for deep copy
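```python
import copy


class GameState:
    def __init__(self, board, players, history):
        self.board = board
        self.players = players
        self.history = history
        self._cache = {}  # transient, don't copy

    def __deepcopy__(self, memo):
        # Create the new object without running __init__ and register it in
        # memo BEFORE recursing, so cycles back to self resolve to the copy.
        new = type(self).__new__(type(self))
        memo[id(self)] = new
        new.board = copy.deepcopy(self.board, memo)
        new.players = copy.deepcopy(self.players, memo)
        new.history = copy.deepcopy(self.history, memo)
        new._cache = {}  # intentionally left empty
        return new
```
The memo parameter must be passed to every recursive deepcopy call, and the new object must be registered in memo before those calls run. If the object is registered only after its fields have been copied, a cycle that leads back to it (for example, a player holding a reference to the game state) calls __deepcopy__ again and recurses until a RecursionError is raised.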
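A parent/child structure is a common place where a custom __deepcopy__ meets a cycle. The sketch below uses a hypothetical `Node` class (not part of the examples above) and registers the new object in memo before copying any field.

```python
import copy


class Node:
    """Hypothetical tree node whose children point back at their parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self._cache = {"layout": "expensive to compute"}  # transient

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new  # register first, then recurse
        new.name = self.name
        new.parent = copy.deepcopy(self.parent, memo)
        new.children = copy.deepcopy(self.children, memo)
        new._cache = {}  # transient data is not carried over
        return new


root = Node("root")
leaf = root.add(Node("leaf"))
root_copy = copy.deepcopy(root)
leaf_copy = root_copy.children[0]
```

When the copy of `leaf` asks for a copy of its parent, the memo lookup returns the already-registered `root_copy`, so the cycle closes on the new objects.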
The memo dictionary in detail
```python
import copy

data = {"key": [1, 2, 3]}
memo = {}
result = copy.deepcopy(data, memo)

# memo now maps id(data) -> result and id(data["key"]) -> result["key"].
# deepcopy also stores an internal keep-alive list of originals under id(memo).
print(len(memo))  # at least 2 entries
```
You can pre-populate memo to control copying:
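```python
import copy

shared_config = {"db_host": "localhost", "db_port": 5432}
# Illustrative application state that holds a reference to shared_config
app_state = {"config": shared_config, "sessions": ["alice", "bob"]}

# Pre-populate memo so deepcopy reuses this exact object instead of copying it
memo = {id(shared_config): shared_config}
state_copy = copy.deepcopy(app_state, memo)

# state_copy references the SAME shared_config object, not a copy
assert state_copy["config"] is shared_config
assert state_copy["sessions"] is not app_state["sessions"]
```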
This technique is useful when certain objects should remain singletons even across deep copies — database connections, configuration objects, or caches.
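If this pattern recurs, the memo setup can be wrapped in a small helper. The function `deepcopy_sharing` and the `Pool` class below are hypothetical names for this sketch, not part of the copy module.

```python
import copy


def deepcopy_sharing(obj, *shared):
    """Deep copy obj, reusing every object passed in `shared` as-is."""
    memo = {id(item): item for item in shared}
    return copy.deepcopy(obj, memo)


class Pool:
    """Stand-in for a resource that should stay a singleton."""


pool = Pool()
service = {"pool": pool, "queue": [1, 2, 3]}

clone = deepcopy_sharing(service, pool)
plain_clone = copy.deepcopy(service)
```

Here `clone["pool"]` is the original pool, while the plain deepcopy creates a second `Pool` instance.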
Edge cases and pitfalls
Copying objects with __slots__
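Classes that use __slots__ work with copy and deepcopy out of the box. If you write your own __deepcopy__, remember that slotted instances have no __dict__, so each slot has to be copied by name. The default behavior looks like this:
```python
import copy


class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p2 = copy.copy(p)      # works via __reduce_ex__
p3 = copy.deepcopy(p)  # also works
```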
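When a slotted class does need a custom __deepcopy__, one possible shape is to walk the slot names explicitly. `Segment` and its `_length` slot are hypothetical and exist only for this sketch.

```python
import copy


class Segment:
    """Hypothetical slotted class with one transient slot."""

    __slots__ = ("start", "end", "_length")

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self._length = None  # cached value, recomputed on demand

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new
        # No __dict__ to iterate over, so copy the persistent slots by name.
        for name in ("start", "end"):
            setattr(new, name, copy.deepcopy(getattr(self, name), memo))
        new._length = None  # drop the cached value
        return new


seg = Segment([0, 0], [3, 4])
seg._length = 5.0
seg_copy = copy.deepcopy(seg)
```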
Copying file handles and sockets
These represent OS-level resources that can’t be meaningfully duplicated. deepcopy will raise TypeError for file objects. Always handle these in __deepcopy__:
```python
import copy


class Logger:
    def __init__(self, path):
        self.path = path
        self._file = open(path, "a")

    def __deepcopy__(self, memo):
        # Open a new file handle instead of copying the old one
        new = type(self).__new__(type(self))
        memo[id(self)] = new
        new.path = self.path
        new._file = open(self.path, "a")
        return new

    def __del__(self):
        # _file is missing if open() failed in __init__
        file = getattr(self, "_file", None)
        if file is not None:
            file.close()
```
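The TypeError for raw file objects is easy to reproduce. This sketch uses a temporary file so that it cleans up after itself; the variable names are illustrative.

```python
import copy
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

error = None
try:
    with open(path, "a") as handle:
        try:
            copy.deepcopy(handle)
        except TypeError as exc:
            # File objects refuse to be reduced, so deepcopy cannot rebuild them.
            error = exc
finally:
    os.remove(path)

print(type(error).__name__, error)
```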
Copying lambda and function references
Functions and lambdas are not copied — both copy and deepcopy return the same function object. This is almost always the right behavior, since functions are typically stateless.
```python
import copy

fn = lambda x: x + 1
fn_copy = copy.deepcopy(fn)
assert fn is fn_copy  # passes: same object
```
Copying weakref
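Weak references are treated as atomic: both copy and deepcopy return the same weakref.ref object, which still points at the original referent, even when that referent is deep-copied as part of the same graph. If the copy needs a weak reference to the copied referent, create it yourself in __deepcopy__.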
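In current CPython, weakref.ref objects pass through deepcopy unchanged, which can be checked directly. `Target` and `Holder` are hypothetical classes for this sketch.

```python
import copy
import weakref


class Target:
    """Hypothetical referent; plain class instances support weak references."""


class Holder:
    def __init__(self, target):
        self.target = target             # strong reference
        self.ref = weakref.ref(target)   # weak reference to the same object


holder = Holder(Target())
holder_copy = copy.deepcopy(holder)

# The strong reference was deep-copied, the weak reference was not.
print(holder_copy.target is holder.target)
print(holder_copy.ref is holder.ref)
```

After the copy, `holder_copy.ref()` still resolves to the original target rather than to `holder_copy.target`.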
Performance analysis
Deep copy performance depends on object graph size and complexity:
| Scenario | Shallow copy (approx.) | Deep copy (approx.) |
|---|---|---|
| Flat list of 1,000 ints | ~20 μs | ~200 μs |
| Nested dict, 3 levels, 1,000 items | ~100 μs | ~2 ms |
| Complex object graph, 10,000 nodes | N/A | ~50 ms |
| Object with 100 shared references | N/A | ~5 ms (memo overhead) |
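Figures like these vary with hardware and Python version, so it is worth measuring on your own data. The following is a rough timeit sketch; the data shapes only approximate the first two scenarios, and `per_call` is a hypothetical helper.

```python
import copy
import timeit

flat = list(range(1_000))
nested = {f"k{i}": {"values": [i, i + 1], "meta": {"id": i}} for i in range(1_000)}


def per_call(func, arg, number=20):
    """Average seconds per call of func(arg) over `number` runs."""
    return timeit.timeit(lambda: func(arg), number=number) / number


results = {}
for name, data in (("flat", flat), ("nested", nested)):
    for kind, func in (("shallow", copy.copy), ("deep", copy.deepcopy)):
        results[(name, kind)] = per_call(func, data)

for (name, kind), seconds in results.items():
    print(f"{name:>6} {kind:>7}: {seconds * 1e6:10.1f} μs per call")
```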
Optimization strategies
1. Avoid deepcopy in hot paths. If you need to copy data structures millions of times (e.g., game tree search), consider immutable data structures or manual copying of only what changes.
2. Use __deepcopy__ to skip transient fields. Caches, computed properties, and connection pools don’t need copying.
3. Structural sharing (copy-on-write). Instead of deep copying, share structure and only copy on mutation:
```python
class ImmutableList:
    """List that returns new instances on 'modification'."""

    def __init__(self, items):
        self._items = tuple(items)

    def append(self, item):
        return ImmutableList(self._items + (item,))

    def __getitem__(self, idx):
        return self._items[idx]
```
4. Consider pickle round-trip for complex objects where custom __deepcopy__ is too tedious:
```python
import pickle


def fast_deepcopy(obj):
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
```
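This is sometimes faster than deepcopy for simple objects because pickle's C implementation is highly optimized. It only works for picklable objects, though, and it bypasses both __deepcopy__ and memo pre-population, so any customization described above is ignored.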
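The difference in coverage shows up as soon as the data contains something pickle cannot serialize, such as a lambda. This sketch repeats the round-trip helper so that it runs on its own; the sample dictionaries are made up.

```python
import copy
import pickle


def fast_deepcopy(obj):
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


payload = {"rows": [[1, 2], [3, 4]], "meta": {"source": "demo"}}
clone = fast_deepcopy(payload)

# deepcopy treats functions as atomic, so a lambda inside the data is fine.
with_callback = {"rows": [[1, 2]], "on_change": lambda row: row}
deep = copy.deepcopy(with_callback)

# The pickle round trip rejects the same data.
pickle_error = None
try:
    fast_deepcopy(with_callback)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    pickle_error = exc
```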
copy and dataclasses
Dataclasses work with both copy and deepcopy out of the box:
```python
import copy
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    members: list = field(default_factory=list)


team = Team("Alpha", ["Alice", "Bob"])
team_copy = copy.deepcopy(team)
team_copy.members.append("Charlie")
print(team.members)  # ['Alice', 'Bob'] (original untouched)
```
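Frozen dataclasses copy the same way. copy.copy() does not return the same object: it builds a new, equal instance whose fields reference the same values. Because a frozen dataclass can still hold mutable fields such as lists, deepcopy remains useful whenever those fields must not be shared.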
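The behavior of frozen dataclasses can be verified with a small example. `Roster` is a hypothetical class for this sketch.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Roster:
    name: str
    members: list = field(default_factory=list)


roster = Roster("Alpha", ["Alice"])
shallow = copy.copy(roster)
deep = copy.deepcopy(roster)

# Frozen blocks attribute assignment, not mutation of the list itself.
shallow.members.append("Bob")
print(roster.members)  # the shallow copy shares the list with the original
print(deep.members)    # the deep copy has its own list
```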
Real-world patterns
Undo system
```python
import copy


class Editor:
    def __init__(self):
        self.state = {"text": "", "cursor": 0}
        self._undo_stack = []

    def snapshot(self):
        self._undo_stack.append(copy.deepcopy(self.state))

    def undo(self):
        if self._undo_stack:
            self.state = self._undo_stack.pop()
```
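A short usage run shows why the snapshot has to be deep once the state contains nested data. The `selections` field is a hypothetical addition for this sketch; the class is otherwise the same as above.

```python
import copy


class Editor:
    def __init__(self):
        # "selections" is a nested, mutable field added for illustration
        self.state = {"text": "", "cursor": 0, "selections": []}
        self._undo_stack = []

    def snapshot(self):
        self._undo_stack.append(copy.deepcopy(self.state))

    def undo(self):
        if self._undo_stack:
            self.state = self._undo_stack.pop()


editor = Editor()
editor.snapshot()  # save the empty document
editor.state["text"] = "hello"
editor.state["cursor"] = 5
editor.state["selections"].append((0, 5))  # in-place mutation of nested data
editor.undo()
print(editor.state)
```

With a shallow snapshot, the in-place append would also have changed the saved `selections` list, and undo would restore a state that was no longer empty.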
Test fixture isolation
```python
import copy

import pytest

BASE_FIXTURE = {
    "users": [{"id": 1, "name": "Test User"}],
    "settings": {"theme": "dark", "lang": "en"},
}


@pytest.fixture
def test_data():
    return copy.deepcopy(BASE_FIXTURE)
```
Configuration layering
```python
import copy

defaults = {"timeout": 30, "retries": 3, "headers": {"User-Agent": "MyApp/1.0"}}


def make_config(**overrides):
    config = copy.deepcopy(defaults)
    config.update(overrides)
    return config
```
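One caveat with dict.update() is that it replaces nested dictionaries wholesale, so overriding `headers` would drop the default User-Agent. If nested overrides should be layered instead, a recursive merge is one option. The `merge` helper below is a hypothetical sketch, not a standard library function.

```python
import copy

defaults = {"timeout": 30, "retries": 3, "headers": {"User-Agent": "MyApp/1.0"}}


def merge(base, overrides):
    """Recursively merge overrides into base, in place, and return base."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        else:
            base[key] = value
    return base


def make_config(**overrides):
    # Deep copy first, so merging never touches the shared defaults.
    config = copy.deepcopy(defaults)
    return merge(config, overrides)


config = make_config(timeout=10, headers={"Accept": "application/json"})
print(config["headers"])
```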
One thing to remember
The copy module is fundamentally about controlling reference sharing. Master the __copy__/__deepcopy__ protocol for custom classes, use memo pre-population to preserve singletons, and prefer immutable structures or structural sharing when deep copy performance becomes a bottleneck.