Python copy Module — Deep Dive

The copy protocol

Python’s copy module uses a well-defined protocol to determine how objects are copied. Understanding this protocol lets you control copying behavior for custom classes.

How copy.copy() selects its strategy

The shallow copy function tries these approaches in order:

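  1. For built-in types (int, str, tuple, list, dict, set, and so on), use a type-specific fast path
  2. Call cls.__copy__(self) if defined
  3. Look for a reducer registered in copyreg.dispatch_table
  4. Call obj.__reduce_ex__(4) (the pickle protocol)
  5. Call obj.__reduce__() as a fallback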
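
The lookup can be observed directly. In this minimal sketch, Tagged and Plain are hypothetical classes made up for illustration: one defines a __copy__ hook, the other relies on the pickle-protocol fallback.

```python
import copy

class Tagged:
    """Hypothetical class that defines its own __copy__ hook."""
    def __init__(self, items):
        self.items = items
        self.copied_via_hook = False

    def __copy__(self):
        new = type(self)(self.items)   # share the same items list (shallow)
        new.copied_via_hook = True
        return new

class Plain:
    """Hypothetical class with no hook; copy falls back to __reduce_ex__."""
    def __init__(self, items):
        self.items = items

t = copy.copy(Tagged([1, 2]))
p_src = Plain([1, 2])
p = copy.copy(p_src)

print(t.copied_via_hook)        # True: __copy__ was used
print(p is not p_src)           # True: a new instance was built
print(p.items is p_src.items)   # True: attributes are shared, not copied
```

Either way the result is shallow: the new instance refers to the same items list as the original.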

How copy.deepcopy() works

Deep copy follows a similar but more complex chain:

  1. Check the memo dict for an already-copied object (cycle detection)
  2. For built-in types, use a type-specific deep copier
  3. Call cls.__deepcopy__(self, memo) if defined
  4. Fall through to copyreg / __reduce_ex__ / __reduce__ with recursive deep copying of components

The memo dictionary is critical — it maps id(original) to the copied object, preventing infinite recursion on circular structures and preserving shared-reference topology.
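
Both effects of the memo can be seen with built-in containers alone. A small sketch, where the names graph and shared are illustrative:

```python
import copy

shared = [1, 2, 3]
graph = {"a": shared, "b": shared}   # two references to one list
graph["self"] = graph                # a cycle back to the container

clone = copy.deepcopy(graph)

print(clone["a"] is clone["b"])      # True: sharing is preserved in the copy
print(clone["a"] is shared)          # False: the copy has its own list
print(clone["self"] is clone)        # True: the cycle points at the copy
```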

Customizing copy behavior

__copy__ for shallow copy

class Connection:
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._socket = None  # expensive resource

    def __copy__(self):
        """Shallow copy without the socket — caller must reconnect."""
        new = Connection(self.host, self.port)
        # deliberately not copying _socket
        return new

__deepcopy__ for deep copy

import copy

class GameState:
    def __init__(self, board, players, history):
        self.board = board
        self.players = players
        self.history = history
        self._cache = {}  # transient, don't copy

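    def __deepcopy__(self, memo):
        # Bypass __init__ so the copy can be registered before recursing
        new = GameState.__new__(GameState)
        memo[id(self)] = new
        new.board = copy.deepcopy(self.board, memo)
        new.players = copy.deepcopy(self.players, memo)
        new.history = copy.deepcopy(self.history, memo)
        new._cache = {}  # _cache is intentionally left empty
        return new

The memo parameter must be passed to every recursive deepcopy call, and the new object should be registered in memo before those calls are made. If board, players, or history refer back to the GameState, registering too late means the back-reference is copied a second time instead of resolving to the new object, which breaks circular reference handling.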
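
A minimal sketch of a cycle-safe __deepcopy__, assuming a hypothetical Node class whose children point back at their parent. The copy is registered in memo before any recursive call, so the back-reference resolves to the copy rather than triggering a second copy of the parent.

```python
import copy

class Node:
    """Hypothetical tree node whose children point back at their parent."""
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self._cache = {"depth": 0}   # transient; reset rather than copied

    def add(self, child):
        child.parent = self
        self.children.append(child)

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new          # register BEFORE recursing
        new.name = self.name
        new.parent = copy.deepcopy(self.parent, memo)
        new.children = copy.deepcopy(self.children, memo)
        new._cache = {}
        return new

root = Node("root")
root.add(Node("leaf"))
clone = copy.deepcopy(root)

print(clone is not root)                          # True
print(clone.children[0].parent is clone)          # True: cycle preserved
print(clone.children[0] is not root.children[0])  # True: child was copied
```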

The memo dictionary in detail

import copy

data = {"key": [1, 2, 3]}
memo = {}
result = copy.deepcopy(data, memo)

# memo now maps id(data) -> result and id(data["key"]) -> result["key"];
# deepcopy also stores a keep-alive list of the originals under id(memo)
print(len(memo))  # at least 2 entries

You can pre-populate memo to control copying:

import copy

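shared_config = {"db_host": "localhost", "db_port": 5432}
# app_state is an illustrative structure that refers to the shared object
app_state = {"config": shared_config, "sessions": [{"user": "alice"}]}

# Pre-populate memo so deepcopy reuses this exact object instead of copying it
memo = {id(shared_config): shared_config}
state_copy = copy.deepcopy(app_state, memo)
# state_copy["config"] is the SAME shared_config object, not a copy
assert state_copy["config"] is shared_config
assert state_copy["sessions"] is not app_state["sessions"]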

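This technique is useful when certain objects should remain singletons even across deep copies, such as database connections, configuration objects, or caches. It relies on memo being keyed by id(original); the official documentation asks that memo be treated as an opaque object, so treat pre-population as dependent on CPython's current behavior.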

Edge cases and pitfalls

Copying objects with __slots__

Classes using __slots__ work with copy and deepcopy out of the box. If you write a custom __deepcopy__ for such a class, copy each slot explicitly, because there is no __dict__ to iterate over:

import copy

class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p2 = copy.copy(p)     # works via __reduce_ex__
p3 = copy.deepcopy(p)  # also works

Copying file handles and sockets

These represent OS-level resources that can’t be meaningfully duplicated. deepcopy will raise TypeError for file objects. Always handle these in __deepcopy__:

import copy

class Logger:
    def __init__(self, path):
        self.path = path
        self._file = open(path, "a")

    def __deepcopy__(self, memo):
        # Open a new file handle instead of copying the old one
        new = Logger.__new__(Logger)
        new.path = self.path
        new._file = open(self.path, "a")
        memo[id(self)] = new
        return new

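    def __del__(self):
        # _file may be missing if __init__ failed before open() returned
        f = getattr(self, "_file", None)
        if f is not None:
            f.close()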

Copying lambda and function references

Functions and lambdas are not copied — both copy and deepcopy return the same function object. This is almost always the right behavior, since functions are typically stateless.

import copy

fn = lambda x: x + 1
fn_copy = copy.deepcopy(fn)
assert fn is fn_copy  # True — same object

Copying weakref

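Weak references are treated as atomic: both copy and deepcopy return the same weakref.ref object, which still points at the original referent, not at any deep-copied counterpart. If a copy needs a weak reference to its own copied referent, rebuild the weakref in __deepcopy__.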
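
CPython's copy module treats weakref.ref objects as atomic, which can be checked directly. In this sketch, Target and Holder are hypothetical classes used only for the demonstration:

```python
import copy
import weakref

class Target:
    """Hypothetical referent; plain instances support weak references."""

class Holder:
    """Hypothetical owner holding both a strong and a weak reference."""
    def __init__(self, target):
        self.target = target             # strong reference
        self.ref = weakref.ref(target)   # weak reference to the same object

holder = Holder(Target())
clone = copy.deepcopy(holder)

print(clone.ref is holder.ref)        # True: the weakref object is reused
print(clone.ref() is holder.target)   # True: it still points at the original
print(clone.ref() is clone.target)    # False: not at the copied referent
```

If the copy should track its own referent, a custom __deepcopy__ can assign weakref.ref(new.target) after copying the strong reference.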

Performance analysis

Deep copy performance depends on object graph size and complexity:

Scenario                              Shallow copy   Deep copy
Flat list of 1,000 ints               ~20μs          ~200μs
Nested dict, 3 levels, 1,000 items    ~100μs         ~2ms
Complex object graph, 10,000 nodes    N/A            ~50ms
Object with 100 shared references     N/A            ~5ms (memo overhead)
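
Figures like these vary with hardware and Python version, so it is worth measuring locally. A rough sketch using timeit; the data shapes and the bench helper are illustrative assumptions, not the setup behind the numbers above.

```python
import copy
import timeit

flat = list(range(1_000))
nested = {i: {"a": {"b": [i]}} for i in range(1_000)}

def bench(label, fn, number=50):
    """Print and return the mean time per call in microseconds."""
    per_call = timeit.timeit(fn, number=number) / number * 1e6
    print(f"{label:<28}{per_call:12.1f} us")
    return per_call

results = {
    "flat shallow": bench("flat list, copy.copy", lambda: copy.copy(flat)),
    "flat deep": bench("flat list, copy.deepcopy", lambda: copy.deepcopy(flat)),
    "nested shallow": bench("nested dict, copy.copy", lambda: copy.copy(nested)),
    "nested deep": bench("nested dict, copy.deepcopy", lambda: copy.deepcopy(nested)),
}
```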

Optimization strategies

1. Avoid deepcopy in hot paths. If you need to copy data structures millions of times (e.g., game tree search), consider immutable data structures or manual copying of only what changes.

2. Use __deepcopy__ to skip transient fields. Caches, computed properties, and connection pools don’t need copying.

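3. Structural sharing (copy-on-write). Instead of deep copying, share structure and only copy on mutation. In the sketch below, each append builds a new tuple, but the element objects themselves are shared between versions: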

class ImmutableList:
    """List that returns new instances on 'modification'."""
    def __init__(self, items):
        self._items = tuple(items)

    def append(self, item):
        return ImmutableList(self._items + (item,))

    def __getitem__(self, idx):
        return self._items[idx]

4. Consider pickle round-trip for complex objects where custom __deepcopy__ is too tedious:

import pickle

def fast_deepcopy(obj):
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

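This is sometimes faster than deepcopy for simple objects because pickle’s C implementation is highly optimized, so measure before switching. It won’t work for objects that aren’t picklable (lambdas, open files, locks), and it ignores any __deepcopy__ hooks, relying on the pickle hooks (__reduce_ex__, __getstate__) instead.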
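
The difference in coverage is easy to demonstrate. In this sketch, pickle_copy is a hypothetical helper equivalent to the round-trip above; it copies plain data correctly but fails on a lambda, which deepcopy simply passes through.

```python
import copy
import pickle

def pickle_copy(obj):
    """Deep-copy by serializing and deserializing (hypothetical helper)."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

data = {"rows": [[1, 2], [3, 4]], "name": "grid"}
clone = pickle_copy(data)
print(clone == data)                         # True: same contents
print(clone["rows"][0] is data["rows"][0])   # False: nested lists are new

# Functions are atomic for deepcopy, but a lambda cannot be pickled.
with_callback = {"on_change": lambda value: value}
kept = copy.deepcopy(with_callback)["on_change"] is with_callback["on_change"]
print(kept)                                  # True: same function object

failed = False
try:
    pickle_copy(with_callback)
except (pickle.PicklingError, AttributeError) as exc:
    failed = True
    print("pickle round-trip failed:", type(exc).__name__)
```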

copy and dataclasses

Dataclasses work with both copy and deepcopy out of the box:

import copy
from dataclasses import dataclass, field

@dataclass
class Team:
    name: str
    members: list = field(default_factory=list)

team = Team("Alpha", ["Alice", "Bob"])
team_copy = copy.deepcopy(team)
team_copy.members.append("Charlie")
print(team.members)  # ["Alice", "Bob"] — original untouched

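For frozen dataclasses, copying is often unnecessary, but copy still works: copy.copy builds a new instance rather than returning the same object, because only built-in immutable types such as int, str, and tuple are returned as-is. Note that frozen only blocks attribute assignment. A frozen instance can still hold mutable fields, which a shallow copy shares and a deep copy duplicates.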
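
The behavior for frozen dataclasses can be checked with a short experiment. Settings is a hypothetical frozen dataclass that deliberately holds a mutable list:

```python
import copy
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    """Hypothetical frozen dataclass holding a mutable list."""
    name: str
    tags: list = field(default_factory=list)

original = Settings("prod", ["a"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow is original)            # False: a new instance is created
print(shallow == original)            # True: field values compare equal
print(shallow.tags is original.tags)  # True: the list is shared
print(deep.tags is original.tags)     # False: deepcopy duplicated the list
```

Because the shallow copy shares tags, appending to shallow.tags would also change original.tags even though both instances are frozen.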

Real-world patterns

Undo system

import copy

class Editor:
    def __init__(self):
        self.state = {"text": "", "cursor": 0}
        self._undo_stack = []

    def snapshot(self):
        self._undo_stack.append(copy.deepcopy(self.state))

    def undo(self):
        if self._undo_stack:
            self.state = self._undo_stack.pop()

Test fixture isolation

import copy

import pytest

BASE_FIXTURE = {
    "users": [{"id": 1, "name": "Test User"}],
    "settings": {"theme": "dark", "lang": "en"},
}

@pytest.fixture
def test_data():
    return copy.deepcopy(BASE_FIXTURE)

Configuration layering

import copy

defaults = {"timeout": 30, "retries": 3, "headers": {"User-Agent": "MyApp/1.0"}}

def make_config(**overrides):
    config = copy.deepcopy(defaults)
    config.update(overrides)
    return config
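
A short usage sketch of this pattern, repeating the definitions so it runs on its own. It shows that mutating a config leaves defaults untouched, and also that dict.update is shallow, so overriding a nested key replaces the whole nested dict rather than merging into it.

```python
import copy

defaults = {"timeout": 30, "retries": 3, "headers": {"User-Agent": "MyApp/1.0"}}

def make_config(**overrides):
    config = copy.deepcopy(defaults)
    config.update(overrides)
    return config

cfg = make_config(timeout=5)
cfg["headers"]["X-Debug"] = "1"              # mutate the copy's nested dict

print(defaults["headers"])                   # {'User-Agent': 'MyApp/1.0'}
print(cfg["timeout"], defaults["timeout"])   # 5 30

# Overriding "headers" swaps in the new dict wholesale; nothing is merged.
cfg2 = make_config(headers={"Accept": "application/json"})
print("User-Agent" in cfg2["headers"])       # False
```

If per-key merging of nested settings is wanted, it has to be written explicitly on top of this.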

One thing to remember

The copy module is fundamentally about controlling reference sharing. Master the __copy__/__deepcopy__ protocol for custom classes, use memo pre-population to preserve singletons, and prefer immutable structures or structural sharing when deep copy performance becomes a bottleneck.

