Python Pydub Audio Processing — Core Concepts
What Pydub does
Pydub provides a high-level interface for audio file manipulation in Python. It handles loading, slicing, concatenating, volume adjustment, format conversion, and basic effects. Under the hood it relies on FFmpeg (or libav) for codec support and stores audio data as raw PCM samples in memory.
Install with pip install pydub. WAV files work out of the box; to read or write any other format, FFmpeg (or libav) must be installed and on your system PATH.
AudioSegment — the core object
Everything revolves around AudioSegment, which holds the raw audio data plus metadata (sample rate, bit depth, channels):
from pydub import AudioSegment
audio = AudioSegment.from_file("podcast.mp3")
print(len(audio)) # duration in milliseconds
print(audio.frame_rate) # e.g. 44100
print(audio.channels) # 1 (mono) or 2 (stereo)
print(audio.sample_width) # bytes per sample (2 = 16-bit)
AudioSegments are immutable — every operation returns a new object, leaving the original unchanged.
Slicing and concatenation
Slicing uses millisecond indices, just like Python string slicing:
first_ten_seconds = audio[:10000]
last_five_seconds = audio[-5000:]
middle = audio[10000:20000]
Concatenate with the + operator:
combined = first_ten_seconds + last_five_seconds
Insert silence with AudioSegment.silent(duration=2000) for a two-second gap.
Volume control
Adjust volume in decibels:
louder = audio + 6 # +6 dB: roughly double the amplitude
quieter = audio - 10 # -10 dB: about half as loud by the usual rule of thumb
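To make the decibel figures concrete, here is a small worked conversion using the standard amplitude relation ratio = 10^(dB/20). The helper names are hypothetical and written out only for illustration; pydub.utils provides comparable helpers (db_to_float and ratio_to_db).

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a gain in decibels."""
    return 20 * math.log10(ratio)


for db in (6, -10, -12):
    print(db, round(db_to_amplitude_ratio(db), 3))
# 6 1.995
# -10 0.316
# -12 0.251
```

So adding 6 dB scales sample values by almost exactly two, while subtracting 12 dB scales them to about a quarter.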
Fade in and out:
faded = audio.fade_in(2000).fade_out(3000)
Normalize so that the loudest peak sits just below full scale (0.1 dB of headroom by default):
from pydub.effects import normalize
normalized = normalize(audio)
Overlaying audio
The overlay method mixes two audio segments together, playing them simultaneously:
music = AudioSegment.from_file("background.mp3")
voice = AudioSegment.from_file("narration.wav")
# lower the music volume, then overlay
music_quiet = music - 12
mixed = music_quiet.overlay(voice, position=0)
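The position parameter (in ms) controls when the overlay starts. The result always has the length of the segment that overlay is called on (music_quiet here); any part of the overlaid segment that runs past its end is cut off. Set loop=True to repeat the overlaid segment until the base segment ends.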
Format conversion and export
Export to any FFmpeg-supported format:
audio.export("output.wav", format="wav")
audio.export("output.ogg", format="ogg", codec="libvorbis")
audio.export("output.mp3", format="mp3", bitrate="192k")
Convert between sample rates and channel counts:
mono = audio.set_channels(1)
resampled = audio.set_frame_rate(22050)
Splitting and silence detection
Pydub includes utilities for splitting audio on silence:
from pydub.silence import split_on_silence
chunks = split_on_silence(
    audio,
    min_silence_len=700,  # ms of silence to trigger split
    silence_thresh=-40,   # dBFS threshold
    keep_silence=200,     # ms of silence to keep at edges
)
This is useful for chopping recordings into sentences or removing dead air from podcasts.
Raw sample access
Access raw bytes with audio.raw_data or convert to a NumPy array for DSP work:
import numpy as np
samples = np.array(audio.get_array_of_samples()) # stereo samples are interleaved: L, R, L, R, ...
After processing, cast the samples back to the original integer dtype and wrap them in a new AudioSegment:
processed = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels,
)
Practical considerations
Pydub loads the entire audio file into memory as raw PCM, so the footprint depends on duration, not on the size of the compressed file. Five minutes of 16-bit stereo audio at 44.1 kHz occupies about 53 MB. For very long files, consider processing in chunks. All operations run on the CPU, and because segments are immutable, each one allocates a new copy of the data; for performance-critical DSP, extract samples to NumPy and use vectorized operations.
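The memory figure follows from simple arithmetic: duration times frame rate times channels times bytes per sample. The helper below is a hypothetical convenience function, not part of pydub.

```python
def pcm_size_bytes(duration_s, frame_rate=44100, channels=2, sample_width=2):
    """Estimate the in-memory size of uncompressed PCM audio in bytes."""
    return int(duration_s * frame_rate * channels * sample_width)


five_minutes = pcm_size_bytes(5 * 60)
one_hour = pcm_size_bytes(60 * 60)

print(five_minutes, round(five_minutes / 1e6, 1))  # 52920000 52.9
print(one_hour, round(one_hour / 1e6, 1))          # 635040000 635.0
```

An hour-long stereo recording therefore needs well over half a gigabyte before any processing copies are made.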
The one thing to remember: Pydub wraps audio manipulation in Pythonic operators — + to join, [] to slice, +/- for volume — making common audio tasks feel as natural as working with strings.