Python MIDI Processing — Deep Dive

MIDI binary format

A Standard MIDI File (SMF) is a binary format with two chunk types:

Header chunk (MThd): 14 bytes total — magic bytes MThd, chunk length (6), format type (0/1/2), number of tracks, and timing division. The division field encodes either ticks-per-quarter-note (most common) or SMPTE time code.
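
As a minimal sketch of the layout just described, the header can be unpacked with the standard struct module. The function name parse_header and the dictionary keys are illustrative choices, not part of any library, and the sketch assumes the bytes start exactly at the MThd chunk.

```python
import struct

def parse_header(data: bytes) -> dict:
    """Parse the 14-byte MThd chunk at the start of an SMF byte string."""
    if len(data) < 14:
        raise ValueError("need at least 14 bytes for the header chunk")
    # Big-endian: 4-byte magic, 4-byte chunk length, then three 16-bit fields
    magic, length, fmt, ntracks, division = struct.unpack(">4sIHHH", data[:14])
    if magic != b"MThd" or length < 6:
        raise ValueError("not a Standard MIDI File header")
    header = {"format": fmt, "tracks": ntracks}
    if division & 0x8000:
        # SMPTE timing: high byte is a negative frame rate (two's complement),
        # low byte is the number of ticks per frame
        header["smpte_fps"] = 256 - (division >> 8)
        header["ticks_per_frame"] = division & 0xFF
    else:
        header["ticks_per_beat"] = division
    return header

example = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(parse_header(example))  # {'format': 1, 'tracks': 2, 'ticks_per_beat': 480}
```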

Track chunks (MTrk): Variable length. Each contains a stream of events preceded by variable-length delta times. Variable-length quantities use 7 bits per byte with the high bit as a continuation flag — values 0–127 fit in one byte, larger values use up to four bytes.

# Reading variable-length quantity
def read_vlq(data, offset):
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            break
    return value, offset
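
The inverse operation may help when writing delta times back out. This is a sketch under the assumption that values are limited to 28 bits (four bytes), as described above; write_vlq is a hypothetical helper name.

```python
def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("VLQ values must fit in 28 bits")
    out = [value & 0x7F]                   # last byte: continuation bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(out))            # most significant group first

print(write_vlq(127).hex())  # 7f
print(write_vlq(128).hex())  # 8100
```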

Running status

MIDI uses running status to compress consecutive channel messages that share a status byte: after the first message, the status byte can be omitted. Parsers must remember the last status byte and reuse it whenever the next byte is a data byte (< 0x80). In a Standard MIDI File, meta and SysEx events cancel running status. Mido handles this transparently.
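
To illustrate the bookkeeping a parser needs, here is a rough sketch that decodes a raw stream of channel voice messages with running status. It deliberately ignores delta times, meta events, and SysEx; DATA_BYTES and parse_channel_messages are illustrative names, not mido APIs.

```python
# Number of data bytes per channel voice message, keyed by the status high nibble
DATA_BYTES = {0x80: 2, 0x90: 2, 0xA0: 2, 0xB0: 2, 0xC0: 1, 0xD0: 1, 0xE0: 2}

def parse_channel_messages(data: bytes) -> list:
    messages = []
    status = None  # last status byte seen (the "running" status)
    i = 0
    while i < len(data):
        if data[i] & 0x80:  # high bit set: a new status byte
            status = data[i]
            i += 1
        if status is None:
            raise ValueError("data byte before any status byte")
        n = DATA_BYTES.get(status & 0xF0)
        if n is None:
            raise ValueError("only channel voice messages are handled here")
        if i + n > len(data):
            raise ValueError("truncated message")
        # Re-attach the remembered status to the data bytes
        messages.append((status, *data[i:i + n]))
        i += n
    return messages

# One explicit note_on followed by two messages that omit the status byte
stream = bytes([0x90, 60, 100, 64, 100, 60, 0])
print(parse_channel_messages(stream))
```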

Real-time MIDI I/O

Mido supports real-time MIDI communication through backend ports (rtmidi, portmidi, or pygame):

import mido

# List available ports
print(mido.get_input_names())
print(mido.get_output_names())

# Read from a MIDI controller
with mido.open_input("USB MIDI Controller") as inport:
    for msg in inport:
        print(msg)
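        if msg.type == "note_on" and msg.velocity > 0:
            # A note_on with velocity 0 is conventionally a note-off, so skip it.
            # process_note is a placeholder for your own handler.
            process_note(msg.note, msg.velocity)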

Sending MIDI to a synthesizer

import time

import mido

with mido.open_output("Virtual MIDI Synth") as outport:
    outport.send(mido.Message("note_on", note=60, velocity=100))
    time.sleep(0.5)  # hold the note for half a second
    outport.send(mido.Message("note_off", note=60, velocity=0))

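For virtual MIDI ports (loopback between applications), pass virtual=True together with a port name, for example mido.open_input("My Virtual Port", virtual=True). This works with the rtmidi backend on Linux and macOS. Windows requires a third-party virtual MIDI driver.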

Callback-based input

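import time

import mido

def handle_message(msg):
    if msg.type == "control_change" and msg.control == 1:
        # CC 1 is the modulation wheel; scale 0-127 to 0.0-1.0.
        # update_parameter is a placeholder for your own handler.
        update_parameter(msg.value / 127.0)

# The backend invokes the callback on its own thread; keep the main thread alive.
with mido.open_input("Controller", callback=handle_message):
    while True:
        time.sleep(0.1)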

Piano roll representation

A piano roll is a 2-D matrix: 128 rows (MIDI pitches 0–127) × T time steps. Each cell contains velocity (or binary on/off). This representation is standard for ML music models.

import pretty_midi
import numpy as np

pm = pretty_midi.PrettyMIDI("song.mid")
# 100 Hz resolution (10 ms per frame)
piano_roll = pm.get_piano_roll(fs=100)  # shape: (128, T)

# Binarize for simpler models
binary_roll = (piano_roll > 0).astype(np.float32)

# Segment into fixed-length windows for training
window_size = 100  # 1 second at 100 Hz
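# Non-overlapping windows; the "+ 1" keeps the last full window,
# and any trailing partial window is dropped
windows = [binary_roll[:, i:i+window_size]
           for i in range(0, binary_roll.shape[1] - window_size + 1, window_size)]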

Reconstructing MIDI from piano roll

def piano_roll_to_midi(roll: np.ndarray, fs: int = 100, program: int = 0) -> pretty_midi.PrettyMIDI:
    pm = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=program)
    
    for pitch in range(128):
        note_on = None
        for t in range(roll.shape[1]):
            if roll[pitch, t] > 0 and note_on is None:
                note_on = t / fs
            elif roll[pitch, t] == 0 and note_on is not None:
                instrument.notes.append(
                    pretty_midi.Note(velocity=100, pitch=pitch,
                                     start=note_on, end=t / fs))
                note_on = None
        if note_on is not None:
            instrument.notes.append(
                pretty_midi.Note(velocity=100, pitch=pitch,
                                 start=note_on, end=roll.shape[1] / fs))
    
    pm.instruments.append(instrument)
    return pm

Tempo and timing deep dive

MIDI timing involves two layers:

  1. Ticks: Abstract units. Resolution defined in the header (e.g., 480 ticks per quarter note).
  2. Tempo: A meta-message specifying microseconds per quarter note (default 500 000 = 120 BPM).

Converting ticks to seconds: seconds = ticks × (tempo_μs / division) / 1_000_000
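
A quick worked example of this formula, assuming a single constant tempo. The helper name ticks_to_seconds is illustrative; mido ships a comparable mido.tick2second function.

```python
def ticks_to_seconds(ticks: int, ticks_per_beat: int, tempo_us: int = 500_000) -> float:
    """Convert ticks to seconds at a constant tempo (microseconds per quarter note)."""
    return ticks * tempo_us / (ticks_per_beat * 1_000_000)

# One quarter note at the default tempo (120 BPM) lasts half a second
print(ticks_to_seconds(480, 480))             # 0.5
# Two quarter notes at 60 BPM (1 000 000 microseconds per quarter note)
print(ticks_to_seconds(960, 480, 1_000_000))  # 2.0
```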

Tempo can change mid-file. To build an accurate tick-to-time mapping, accumulate tempo changes chronologically:

def build_tick_to_time_map(midi_file):
    tempo = 500000  # default
    ticks_per_beat = midi_file.ticks_per_beat
    current_tick = 0
    current_time = 0.0
    mapping = [(0, 0.0)]
    
    for msg in mido.merge_tracks(midi_file.tracks):
        current_tick += msg.time
        dt = msg.time * (tempo / ticks_per_beat) / 1_000_000
        current_time += dt
        if msg.type == "set_tempo":
            tempo = msg.tempo
            mapping.append((current_tick, current_time))
    
    return mapping
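
To convert an arbitrary tick, the tempo in force for each segment is also needed. The sketch below assumes a hypothetical list of (tick, tempo) pairs, sorted by tick and starting at tick 0, rather than the (tick, seconds) pairs returned above; tick_to_seconds is an illustrative name.

```python
def tick_to_seconds(tick: int, tempo_changes: list, ticks_per_beat: int) -> float:
    """tempo_changes: sorted list of (tick, tempo_us) pairs, first entry at tick 0."""
    seconds = 0.0
    for idx, (start, tempo) in enumerate(tempo_changes):
        # A segment ends at the next tempo change, or at the target tick
        if idx + 1 < len(tempo_changes):
            end = min(tempo_changes[idx + 1][0], tick)
        else:
            end = tick
        if end <= start:
            break  # the target tick lies before this segment
        seconds += (end - start) * tempo / (ticks_per_beat * 1_000_000)
    return seconds

# 120 BPM for the first two beats, then 60 BPM
changes = [(0, 500_000), (960, 1_000_000)]
print(tick_to_seconds(480, changes, 480))   # 0.5
print(tick_to_seconds(1440, changes, 480))  # 2.0
```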

Algorithmic composition

Markov chain melody generator

from collections import defaultdict
import random

def train_markov(notes, order=2):
    transitions = defaultdict(list)
    for i in range(len(notes) - order):
        state = tuple(n.pitch for n in notes[i:i+order])
        next_note = notes[i+order].pitch
        transitions[state].append(next_note)
    return transitions

def generate_melody(transitions, seed, length=32):
    state = seed
    melody = list(state)
    for _ in range(length):
        if state not in transitions:
            state = random.choice(list(transitions.keys()))
        next_pitch = random.choice(transitions[state])
        melody.append(next_pitch)
        state = (*state[1:], next_pitch)
    return melody

Quantization

Snapping notes to a rhythmic grid:

def quantize(notes, grid_size=0.25):
    """Snap note start and end times to the nearest grid division.

    grid_size is in seconds; each note keeps at least half a grid step of duration.
    """
    for note in notes:
        note.start = round(note.start / grid_size) * grid_size
        note.end = max(note.start + grid_size / 2,
                       round(note.end / grid_size) * grid_size)
    return notes
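
Because the grid above is expressed in seconds, a musical grid (for example sixteenth notes) has to be derived from the tempo first. A small sketch, assuming a constant tempo; grid_seconds and snap are hypothetical helper names.

```python
def grid_seconds(bpm: float, divisions_per_beat: int = 4) -> float:
    """Length of one grid step in seconds (4 divisions per beat = sixteenth notes)."""
    return 60.0 / bpm / divisions_per_beat

def snap(t: float, grid: float) -> float:
    """Snap a time in seconds to the nearest grid line."""
    return round(t / grid) * grid

grid = grid_seconds(120)  # sixteenth notes at 120 BPM
print(grid)               # 0.125
print(snap(0.31, grid))   # 0.25
```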

MIDI to audio rendering

MIDI files need a synthesizer to produce audio. FluidSynth with a SoundFont (.sf2) is the standard approach:

import pretty_midi
import soundfile as sf

pm = pretty_midi.PrettyMIDI("composition.mid")
# Requires FluidSynth and the pyfluidsynth package;
# use the sf2_path argument to select a specific SoundFont
audio = pm.fluidsynth(fs=44100)  # returns a numpy array of audio samples

sf.write("rendered.wav", audio, 44100)

For higher quality, use commercial SoundFonts or route MIDI to a DAW via virtual ports.

Performance and scale

Task | Approach | Notes
Parsing 10K+ MIDI files | mido with multiprocessing | mido is pure Python; parallelize for throughput
Piano roll extraction | pretty_midi batch | Pre-allocate arrays, use fixed fs
Real-time input | mido rtmidi backend | Lowest latency, callback-based
Large dataset ML | Convert to piano rolls, save as .npy | Avoid re-parsing MIDI during training

Tradeoffs

Tool | Strengths | Weaknesses
mido | Low-level control, real-time I/O, pure Python | Manual time/tick conversion
pretty_midi | High-level API, ML-friendly, FluidSynth integration | No real-time I/O
music21 | Deep music theory, notation export | Heavy dependency, slower
miditoolkit | Symbolic music research conventions | Smaller community

One thing to remember: MIDI processing in Python spans the full pipeline from binary parsing and real-time hardware I/O to piano-roll ML features and algorithmic composition — choose mido for control, pretty_midi for analysis, and combine both for production workflows.

