Python CSV Processing — Core Concepts
While the Python csv module reference covers the standard library API, this article focuses on practical processing workflows: the patterns you actually use when working with CSV data in production.
Reading Patterns
Dictionary Reader (Most Common)
DictReader maps each row to a dictionary using the header row as keys:
import csv
with open("employees.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']} works in {row['department']}")
This is preferred over plain csv.reader because column access by name is clearer and resistant to column reordering.
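To make the reordering claim concrete, here is a small sketch that reads the same record from two in-memory files whose columns appear in a different order. The sample data and the helper names names_by_index and names_by_key are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: same record, columns in a different order.
original = "name,department\nAlice,Engineering\n"
reordered = "department,name\nEngineering,Alice\n"


def names_by_index(text):
    """Read the 'name' column by position, assuming it is column 0."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header row
    return [row[0] for row in reader]


def names_by_key(text):
    """Read the 'name' column by header name."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [row["name"] for row in reader]


print(names_by_index(original), names_by_index(reordered))
print(names_by_key(original), names_by_key(reordered))
```

The positional version silently returns the department once the columns move, while the keyed version keeps returning the name.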
Handling Messy Headers
Real CSV files often have inconsistent headers:
with open("messy.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    # Normalize headers: strip whitespace, lowercase
    reader.fieldnames = [h.strip().lower().replace(" ", "_")
                         for h in reader.fieldnames]
    for row in reader:
        print(row["first_name"])
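The same normalization can be wrapped in a reusable helper. The sketch below is one possible shape, not a canonical one: normalize_header and read_normalized are hypothetical names, the input is an in-memory sample, and the guard covers the empty-file case, where fieldnames is None.

```python
import csv
import io


def normalize_header(name):
    """Strip whitespace, lowercase, and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def read_normalized(f):
    """Yield rows from file-like object f with normalized header keys."""
    reader = csv.DictReader(f)
    # fieldnames is None for an empty file, so guard before normalizing.
    if reader.fieldnames is None:
        return
    reader.fieldnames = [normalize_header(h) for h in reader.fieldnames]
    yield from reader


# Hypothetical messy input: padded, mixed-case headers.
sample = io.StringIO(" First Name ,Last Name\nAda,Lovelace\n", newline="")
rows = list(read_normalized(sample))
print(rows[0]["first_name"], rows[0]["last_name"])
```

Reading fieldnames consumes the header line, so the data rows that follow are keyed by the normalized names.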
Writing Patterns
Dictionary Writer
fieldnames = ["name", "age", "city"]
data = [
    {"name": "Alice", "age": 30, "city": "London"},
    {"name": "Bob", "age": 25, "city": "Tokyo"},
]
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
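The newline="" parameter matters most on Windows: the csv writer already ends each row with \r\n, and without newline="" text mode translates the \n a second time, producing \r\r\n and what looks like a blank line between rows. The csv documentation recommends newline="" when opening files for reading as well, so that newlines embedded in quoted fields pass through unchanged.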
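A quick way to see which line terminator the writer itself emits is to write into an in-memory buffer with newline translation turned off. This is only an illustrative sketch; the buffer stands in for a file opened with newline="".

```python
import csv
import io

# io.StringIO(newline="") disables newline translation, like open(..., newline="").
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerow({"name": "Alice", "age": 30})

output = buffer.getvalue()
print(repr(output))
```

Each row already ends in \r\n, which is why a second translation by the file layer shows up as doubled line endings.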
Filtering and Transforming
Filter Rows
with open("sales.csv", encoding="utf-8") as fin, \
     open("big_sales.csv", "w", encoding="utf-8", newline="") as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if float(row["amount"]) > 1000:
            writer.writerow(row)
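The filter above raises ValueError as soon as one amount is blank or non-numeric. If skipping such rows is acceptable for your data, a sketch along these lines counts them instead of crashing. The helper names parse_amount and filter_big_sales, the threshold parameter, and the sample rows are all assumptions for illustration.

```python
import csv
import io


def parse_amount(text):
    """Return text as a float, or None if it is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def filter_big_sales(fin, fout, threshold=1000):
    """Copy rows whose 'amount' exceeds threshold; return skipped row count."""
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    skipped = 0
    for row in reader:
        amount = parse_amount(row["amount"])
        if amount is None:
            skipped += 1  # unparseable amount: count it instead of crashing
        elif amount > threshold:
            writer.writerow(row)
    return skipped


# Hypothetical sample data with one blank and one non-numeric amount.
source = io.StringIO("id,amount\n1,1500\n2,\n3,n/a\n4,250\n", newline="")
target = io.StringIO(newline="")
skipped = filter_big_sales(source, target)
print(skipped, repr(target.getvalue()))
```

Whether to skip, log, or abort on bad rows is a policy decision; returning the skipped count at least makes the data loss visible to the caller.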
Add Computed Columns
with open("products.csv", encoding="utf-8") as fin:
    reader = csv.DictReader(fin)
    out_fields = reader.fieldnames + ["total"]
    with open("products_total.csv", "w", encoding="utf-8", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=out_fields)
        writer.writeheader()
        for row in reader:
            row["total"] = float(row["price"]) * int(row["quantity"])
            writer.writerow(row)
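For a version you can run without any files on disk, the sketch below applies the same steps to an in-memory sample. The product data is invented; only the column names price, quantity, and total come from the example above.

```python
import csv
import io

# Hypothetical sample data standing in for products.csv.
source = io.StringIO("name,price,quantity\nWidget,2.50,4\n", newline="")
target = io.StringIO(newline="")

reader = csv.DictReader(source)
# The new column must be listed in fieldnames before any row is written.
writer = csv.DictWriter(target, fieldnames=reader.fieldnames + ["total"])
writer.writeheader()
for row in reader:
    row["total"] = float(row["price"]) * int(row["quantity"])
    writer.writerow(row)

result = target.getvalue()
print(result)
```

Extending fieldnames matters because DictWriter, with its default extrasaction="raise", raises ValueError when a row contains a key that is not in fieldnames.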
Real-World Data Quirks
Different Delimiters
Not all “CSV” files use commas. TSV (tab-separated) and semicolon-separated files are common:
# Tab-separated
reader = csv.DictReader(f, delimiter="\t")
# Semicolon (common in European locales where comma is decimal separator)
reader = csv.DictReader(f, delimiter=";")
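Semicolon-separated files often carry decimal commas in numeric fields, so the values need converting after parsing. The following is a minimal sketch on invented data; it assumes no thousands separators are present, which real exports may violate.

```python
import csv
import io

# Hypothetical semicolon-separated export with decimal commas.
sample = io.StringIO("product;price\nWidget;3,50\nGadget;12,00\n", newline="")

reader = csv.DictReader(sample, delimiter=";")
# Convert "3,50" to 3.5 by swapping the decimal comma for a point.
prices = {row["product"]: float(row["price"].replace(",", ".")) for row in reader}
print(prices)
```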
Detecting the Dialect
with open("mystery.csv", encoding="utf-8") as f:
    sample = f.read(8192)
    dialect = csv.Sniffer().sniff(sample)
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)
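Sniffing is heuristic and raises csv.Error when it cannot settle on a delimiter, so it may be worth wrapping it with a fallback. The sketch below is one way to do that; open_reader is a hypothetical helper, and both the candidate delimiter list and the choice of csv.excel as the fallback are assumptions you would adjust for your data.

```python
import csv
import io


def open_reader(f, sample_size=8192, delimiters=",;\t|"):
    """Return a DictReader for f, sniffing the dialect from a leading sample."""
    sample = f.read(sample_size)
    f.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Sniffing is heuristic and can fail; fall back to plain commas.
        dialect = csv.excel
    return csv.DictReader(f, dialect=dialect)


# Hypothetical semicolon-separated sample.
data = io.StringIO("id;city;score\n1;Lyon;7\n2;Kyoto;9\n", newline="")
rows = list(open_reader(data))
print(rows)
```

Restricting the candidate delimiters keeps the sniffer from picking an unlikely character that happens to appear a consistent number of times per line.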
Encoding Issues
Many CSV files from older systems use Latin-1 or Windows-1252 instead of UTF-8:
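# Try UTF-8 first, fall back to Windows-1252
try:
    f = open("data.csv", encoding="utf-8")
    f.read()   # decode the whole file once to validate the encoding
    f.seek(0)
except UnicodeDecodeError:
    f.close()  # release the handle that failed to decode
    f = open("data.csv", encoding="cp1252")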
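The try-then-fall-back idea generalizes to a list of candidate encodings. The sketch below reads the raw bytes once and decodes them in memory, which assumes the file is small enough to load whole. read_rows is a hypothetical helper, and the temporary file exists only so the example runs on its own.

```python
import csv
import io
import tempfile
from pathlib import Path


def read_rows(path, encodings=("utf-8", "cp1252")):
    """Decode the file with the first encoding that works and parse it."""
    raw = Path(path).read_bytes()  # loads the whole file; fine for modest sizes
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
        return encoding, list(csv.DictReader(io.StringIO(text, newline="")))
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo: write a small Windows-1252 file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.csv"
    legacy.write_bytes("name\nJosé\n".encode("cp1252"))
    used, rows = read_rows(legacy)
print(used, rows)
```

A successful decode does not prove the guess was right: cp1252 accepts almost any byte sequence, so a wrong fallback shows up as garbled characters instead of an exception.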
BOM (Byte Order Mark)
Excel saves UTF-8 CSV files with a BOM (\ufeff) that pollutes the first header name. Use utf-8-sig encoding:
with open("excel_export.csv", encoding="utf-8-sig") as f:
    reader = csv.DictReader(f)
    # First fieldname is clean, no BOM character
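To see the difference directly, the sketch below decodes the same BOM-prefixed bytes with both codecs and compares the parsed headers. The byte string is a made-up sample, and header_names is a hypothetical helper.

```python
import csv
import io

# Bytes as a BOM-prefixed UTF-8 file would contain them (hypothetical sample).
raw = b"\xef\xbb\xbfname,age\nAlice,30\n"


def header_names(encoding):
    """Decode raw with the given encoding and return the parsed header."""
    text = raw.decode(encoding)
    return csv.DictReader(io.StringIO(text, newline="")).fieldnames


with_bom = header_names("utf-8")
without_bom = header_names("utf-8-sig")
print(with_bom, without_bom)
```

With plain utf-8 the first key is "\ufeffname", so row["name"] raises KeyError even though the header looks correct when printed. The utf-8-sig codec also reads files that have no BOM, which makes it a reasonable default for input of unknown origin.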
Memory-Efficient Processing
CSV files can be huge. A csv reader yields one row at a time, so memory usage stays roughly constant regardless of file size, provided you process rows as they arrive instead of collecting them in a list:
total = 0
count = 0
with open("giant.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total += float(row["value"])
        count += 1
# Guard against an empty file to avoid dividing by zero
if count:
    print(f"Average: {total / count}")
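The same streaming approach extends to grouped aggregates, where memory grows with the number of distinct groups and not with the number of rows. This is a sketch under assumptions: totals_by_key is a hypothetical helper, and the region and value columns are invented sample data.

```python
import csv
import io
from collections import defaultdict


def totals_by_key(f, key, value):
    """Stream rows from f and sum the value column per distinct key."""
    totals = defaultdict(float)
    for row in csv.DictReader(f):
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample; a real file object from open() works the same way.
sample = io.StringIO("region,value\nnorth,10\nsouth,5\nnorth,2.5\n", newline="")
totals = totals_by_key(sample, "region", "value")
print(totals)
```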
For analytical work on large CSV files, consider pandas or polars — they offer columnar operations that are dramatically faster than row-by-row Python loops.
Common Misconception
“Just split on commas.” Splitting on commas breaks when fields contain commas inside quotes, when quotes need escaping, or when fields span multiple lines. The csv module handles all of these cases correctly. Never parse CSV by hand.
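The failure modes are easy to reproduce. The sketch below parses one invented record that contains a quoted comma, an escaped quote, and a newline inside a quoted field, first by splitting by hand and then with the csv module.

```python
import csv
import io

# One header row and one record with a quoted comma, an escaped quote,
# and a newline inside a quoted field (hypothetical sample).
text = 'name,note\n"Smith, John","said ""hi""\nthen left"\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in text.splitlines()]

# csv module: understands quoting, escaping, and embedded newlines.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive), naive[1])
print(len(parsed), parsed[1])
```

The hand-rolled split reports three rows, breaks the name in two, and leaves the quote characters in the data. The csv module returns the header plus a single two-field record.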
One Thing to Remember
Use DictReader and DictWriter for readable code, always specify encoding="utf-8" and newline="", and never split on commas manually — the csv module exists to handle the edge cases you’ll miss.