Python CSV Processing — Core Concepts

Master practical CSV processing workflows in Python — reading, writing, filtering, transforming, and handling real-world data quirks.

While the Python csv module covers the standard library API, this article focuses on practical processing workflows — the patterns you actually use when working with CSV data in production.

Reading Patterns

Dictionary Reader (Most Common)

DictReader maps each row to a dictionary using the header row as keys:

import csv

with open("employees.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']} works in {row['department']}")

This is usually preferable to plain csv.reader because accessing columns by name is clearer and keeps working when the columns are reordered.
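
To make the reordering point concrete, here is a small sketch that reads two in-memory exports of the same data. The sample strings and the helper names names_by_index and names_by_key are made up for illustration.

```python
import csv
import io

# Two hypothetical exports of the same record; only the column order differs.
export_a = "name,department\nAlice,Engineering\n"
export_b = "department,name\nEngineering,Alice\n"

def names_by_index(text):
    """Read the 'name' column by position (assumes it is always column 0)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [row[0] for row in reader]

def names_by_key(text):
    """Read the 'name' column by header name."""
    return [row["name"] for row in csv.DictReader(io.StringIO(text))]

print(names_by_index(export_b))  # positional access picks up the wrong column
print(names_by_key(export_b))    # name-based access is unaffected
```

Positional access silently returns the department for the second export, while name-based access returns the same result for both.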

Handling Messy Headers

Real CSV files often have inconsistent headers:

with open("messy.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    # Normalize headers: strip whitespace, lowercase, spaces to underscores.
    # Reading reader.fieldnames consumes the header row on first access.
    reader.fieldnames = [h.strip().lower().replace(" ", "_")
                         for h in reader.fieldnames]
    for row in reader:
        print(row["first_name"])
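
The same normalization can be tried without a file on disk. The sketch below uses an in-memory sample and adds a check for required columns; the sample data, the normalize helper, and the required set are assumptions for illustration, not part of any library.

```python
import csv
import io

# Hypothetical messy export: stray spaces and inconsistent casing in headers.
raw = " First Name ,Last Name,  EMAIL\nAda,Lovelace,ada@example.com\n"

def normalize(header):
    """Strip whitespace, lowercase, and replace spaces with underscores."""
    return header.strip().lower().replace(" ", "_")

reader = csv.DictReader(io.StringIO(raw))
reader.fieldnames = [normalize(h) for h in reader.fieldnames]

# Fail early if a column the rest of the code depends on is absent.
required = {"first_name", "email"}
missing = required - set(reader.fieldnames)
if missing:
    raise KeyError(f"missing columns: {sorted(missing)}")

rows = list(reader)
print(rows[0]["first_name"])
```

Checking for required columns right after normalization tends to give a clearer error than a KeyError deep inside the processing loop.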

Writing Patterns

Dictionary Writer

fieldnames = ["name", "age", "city"]
data = [
    {"name": "Alice", "age": 30, "city": "London"},
    {"name": "Bob", "age": 25, "city": "Tokyo"},
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

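Pass newline="" whenever you open a file for the csv module. When writing on Windows, omitting it turns each row ending into \r\r\n, which shows up as blank lines between rows. When reading, omitting it can alter line breaks embedded inside quoted fields.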
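
A quick way to see what the writer does with awkward values is to write to an in-memory buffer and read the result back. This is a sketch with made-up rows; io.StringIO(newline="") stands in for a file opened with newline="".

```python
import csv
import io

# Hypothetical rows: one value contains a comma, another a line break.
rows = [
    {"name": "Alice", "note": "likes tea, not coffee"},
    {"name": "Bob", "note": "line one\nline two"},
]

buf = io.StringIO(newline="")  # no newline translation, like newline="" on open()
writer = csv.DictWriter(buf, fieldnames=["name", "note"])
writer.writeheader()
writer.writerows(rows)

text = buf.getvalue()
print(repr(text))  # fields with commas or line breaks come out quoted

# Reading the text back recovers the original values, line break included.
round_tripped = list(csv.DictReader(io.StringIO(text, newline="")))
```

The writer terminates rows with \r\n by default and quotes only the fields that need it, so the embedded line break survives the round trip.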

Filtering and Transforming

Filter Rows

with open("sales.csv", encoding="utf-8") as fin, \
     open("big_sales.csv", "w", encoding="utf-8", newline="") as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if float(row["amount"]) > 1000:
            writer.writerow(row)
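
Real exports often contain blank or non-numeric amounts, and float() raises on those. One possible way to handle that is sketched below; the function name filter_big_sales, the threshold parameter, and the choice to skip and count bad rows are assumptions for illustration.

```python
import csv
import io

def filter_big_sales(fin, fout, threshold=1000.0):
    """Copy rows whose 'amount' exceeds threshold; return how many rows were skipped."""
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    skipped = 0
    for row in reader:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            # TypeError: short row (value is None); ValueError: blank or non-numeric text.
            skipped += 1
            continue
        if amount > threshold:
            writer.writerow(row)
    return skipped

# Usage with in-memory sample data instead of sales.csv
src = io.StringIO("id,amount\n1,1500\n2,\n3,n/a\n4,250.50\n5,2000.00\n")
dst = io.StringIO(newline="")
skipped = filter_big_sales(src, dst)
print(skipped, dst.getvalue())
```

Whether to skip, log, or abort on a bad row depends on the pipeline; counting skipped rows at least makes the data loss visible.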

Add Computed Columns

with open("products.csv", encoding="utf-8") as fin:
    reader = csv.DictReader(fin)
    out_fields = reader.fieldnames + ["total"]
    
    with open("products_total.csv", "w", encoding="utf-8", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=out_fields)
        writer.writeheader()
        for row in reader:
            row["total"] = float(row["price"]) * int(row["quantity"])
            writer.writerow(row)
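
If the computed column holds money, binary floats can introduce rounding noise into the output file. A hedged alternative, assuming prices arrive as plain decimal strings, is to compute with decimal.Decimal from the standard library. The sample data below is made up.

```python
import csv
import io
from decimal import Decimal

# Hypothetical product data; prices are decimal strings such as "0.10".
source = io.StringIO("sku,price,quantity\nA1,0.10,3\nB2,19.99,2\n")
out = io.StringIO(newline="")

reader = csv.DictReader(source)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["total"])
writer.writeheader()
for row in reader:
    # Decimal keeps the exact decimal value; the writer stores str(total).
    row["total"] = Decimal(row["price"]) * int(row["quantity"])
    writer.writerow(row)

print(out.getvalue())
```

With floats, 0.10 * 3 would be written as 0.30000000000000004; with Decimal the total is written as 0.30.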

Real-World Data Quirks

Different Delimiters

Not all “CSV” files use commas. TSV (tab-separated) and semicolon-separated files are common:

# Tab-separated
reader = csv.DictReader(f, delimiter="\t")

# Semicolon (common in European locales where comma is decimal separator)
reader = csv.DictReader(f, delimiter=";")
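
Semicolon-separated files usually bring decimal commas with them, and float() does not accept those. The sketch below shows one way to convert them; parse_decimal_comma is a hypothetical helper that assumes "." is a thousands separator and "," the decimal separator, which will not hold for every file.

```python
import csv
import io

# Hypothetical semicolon-separated export with decimal commas.
data = "product;price\nCoffee;3,50\nTea;1.250,00\n"

def parse_decimal_comma(text):
    """Convert '1.250,00' style numbers to float (assumes '.' groups thousands)."""
    return float(text.replace(".", "").replace(",", "."))

reader = csv.DictReader(io.StringIO(data), delimiter=";")
prices = {row["product"]: parse_decimal_comma(row["price"]) for row in reader}
print(prices)
```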

Detecting the Dialect

with open("mystery.csv", encoding="utf-8") as f:
    sample = f.read(8192)
    dialect = csv.Sniffer().sniff(sample)
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)
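
csv.Sniffer is a heuristic and raises csv.Error when it cannot settle on a delimiter, for example on a single-column file. A minimal sketch of a wrapper with a fallback follows; the function name open_reader, the candidate delimiter list, and the choice of csv.excel as the fallback are assumptions.

```python
import csv
import io

def open_reader(f, sample_size=8192, candidates=",;\t|"):
    """Return a DictReader for f, sniffing the dialect and falling back to csv.excel."""
    sample = f.read(sample_size)
    f.seek(0)
    try:
        # Restricting the candidates reduces the chance of a wrong guess.
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        dialect = csv.excel  # plain comma-separated defaults
    return csv.DictReader(f, dialect=dialect)

# Usage with in-memory samples
semicolon_rows = list(open_reader(io.StringIO("name;city\nAlice;London\nBob;Tokyo\n")))
single_column_rows = list(open_reader(io.StringIO("word\nalpha\nbeta\n")))
print(semicolon_rows)
print(single_column_rows)
```

For files whose format is known in advance, passing the delimiter explicitly is more predictable than sniffing.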

Encoding Issues

Many CSV files from older systems use Latin-1 or Windows-1252 instead of UTF-8:

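# Try UTF-8 first; fall back to Windows-1252 if decoding fails
try:
    f = open("data.csv", encoding="utf-8")
    f.read()   # force a full decode; bad bytes may appear late in the file
    f.seek(0)
except UnicodeDecodeError:
    f.close()  # release the handle from the failed UTF-8 attempt
    f = open("data.csv", encoding="cp1252")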
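
The same idea can be packaged as a small helper that works on raw bytes and reports which encoding succeeded. This is a sketch: decode_csv_bytes and its default encoding order are assumptions, and a successful decode does not prove the guess was right, only that the bytes were valid in that encoding.

```python
import csv
import io

def decode_csv_bytes(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Unreachable with the defaults (latin-1 accepts every byte),
    # but possible if the caller passes a custom list.
    raise ValueError("none of the encodings could decode the data")

# Usage: bytes as a legacy Windows system might have written them
legacy = "name,city\nZoë,Zürich\n".encode("cp1252")
text, used = decode_csv_bytes(legacy)
rows = list(csv.DictReader(io.StringIO(text)))
print(used, rows)
```

Putting latin-1 last guarantees a result, because cp1252 itself leaves a few byte values undefined and can still fail.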

BOM (Byte Order Mark)

Excel's UTF-8 CSV export writes a BOM (\ufeff) at the start of the file. Decoded as plain utf-8, that character ends up attached to the first header name. Use the utf-8-sig encoding, which strips it:

with open("excel_export.csv", encoding="utf-8-sig") as f:
    reader = csv.DictReader(f)
    # First fieldname is clean, no BOM character
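
The effect is easy to reproduce in memory. The sketch below builds bytes that start with a BOM and reads the header under both encodings; header_of is a hypothetical helper and the sample data is made up.

```python
import csv
import io

# Encoding with utf-8-sig prepends the BOM, like an Excel UTF-8 export.
raw = "name,age\nAlice,30\n".encode("utf-8-sig")

def header_of(raw_bytes, encoding):
    """Decode raw_bytes with the given encoding and return the CSV header."""
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    return csv.DictReader(text).fieldnames

print(header_of(raw, "utf-8"))      # first name carries the BOM character
print(header_of(raw, "utf-8-sig"))  # BOM stripped
```

utf-8-sig also reads files that have no BOM, so it is a safe default when the source might be Excel.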

Memory-Efficient Processing

CSV files can be huge. The csv module reads one row at a time, so memory use depends on the largest row rather than the file size, as long as you do not collect the rows into a list:

total = 0
count = 0
with open("giant.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total += float(row["value"])
        count += 1
if count:  # avoid ZeroDivisionError when the file has no data rows
    print(f"Average: {total / count}")
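
The same streaming approach extends to grouped aggregates: only one running total per group is kept, never the rows themselves. A minimal sketch, assuming hypothetical column names region and value:

```python
import csv
import io
from collections import defaultdict

def average_by_group(f, key, value):
    """Stream rows from f and return {group: average of the value column}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(f):
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Usage with in-memory sample data instead of giant.csv
sample = io.StringIO("region,value\nnorth,10\nsouth,4\nnorth,20\n")
averages = average_by_group(sample, key="region", value="value")
print(averages)
```

Memory grows with the number of distinct groups, not with the number of rows.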

For analytical work on large CSV files, consider pandas or polars — they offer columnar operations that are dramatically faster than row-by-row Python loops.

Common Misconception

“Just split on commas.” Splitting on commas breaks when fields contain commas inside quotes, when quotes need escaping, or when fields span multiple lines. The csv module handles all of these cases correctly. Never parse CSV by hand.
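
A single line is enough to show the difference. The sample line below is made up; it contains a comma inside quotes and a doubled quote as an escape.

```python
import csv
import io

# One record: three fields, the second contains a comma, the third escaped quotes.
line = 'Alice,"Smith, Jr.","She said ""hi"""\n'

naive = line.rstrip("\n").split(",")
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)    # split yields four fragments with stray quotes
print(len(parsed), parsed)  # csv.reader yields the three intended fields
```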

One Thing to Remember

Use DictReader and DictWriter for readable code, always pass an explicit encoding (utf-8 unless you know the file uses something else) and newline="", and never split on commas manually: the csv module exists to handle the edge cases you’ll miss.
