Python Sounddevice Recording — Deep Dive
PortAudio under the hood
Sounddevice is a CFFI wrapper around PortAudio, which abstracts platform-specific audio APIs: WASAPI/WDM-KS on Windows, CoreAudio on macOS, ALSA/PulseAudio/JACK on Linux. Each host API has different latency characteristics, device enumeration behavior, and buffer management strategies.
import sounddevice as sd
# List host APIs
for api in sd.query_hostapis():
    print(api['name'], api['default_input_device'], api['default_output_device'])
# Detailed device info
for i, dev in enumerate(sd.query_devices()):
    print(f"{i}: {dev['name']} in={dev['max_input_channels']} out={dev['max_output_channels']}")
Callback threading model
When you open a stream with a callback, PortAudio invokes your callback at regular intervals from a dedicated audio thread. On most host APIs this thread runs at elevated priority, and it must not block: no file I/O, no network calls, no waiting on locks, and as little memory allocation as possible, because allocation and garbage collection can stall the thread and cause dropouts.
import numpy as np
import queue
audio_queue = queue.Queue()
def recording_callback(indata, frames, time, status):
    if status:
        print(status)
    # indata is reused after the callback returns, so copy it before
    # pushing to the thread-safe queue; keep work in the callback minimal
    audio_queue.put(indata.copy())
with sd.InputStream(samplerate=48000, channels=1, callback=recording_callback,
                    blocksize=1024, dtype='float32'):
    # Main thread processes audio from queue
    while True:
        chunk = audio_queue.get()
        # Process chunk here (analysis, save to file, etc.)
The blocksize parameter sets the number of frames passed to each callback invocation. Smaller values reduce latency but increase callback frequency and CPU overhead. The default, blocksize=0, lets PortAudio choose a host-optimal size, which may vary from one callback to the next.
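The tradeoff is easy to quantify. The following is a minimal sketch of the arithmetic only; the helper names block_duration_ms and callbacks_per_second are made up for illustration and are not part of sounddevice.

```python
def block_duration_ms(blocksize: int, samplerate: float) -> float:
    """Duration of one block of audio, in milliseconds."""
    return 1000.0 * blocksize / samplerate


def callbacks_per_second(blocksize: int, samplerate: float) -> float:
    """How often the callback fires for a fixed block size."""
    return samplerate / blocksize


# Compare a few common block sizes at 48 kHz
for blocksize in (64, 256, 1024):
    print(f"{blocksize:5d} frames: "
          f"{block_duration_ms(blocksize, 48000):6.2f} ms per block, "
          f"{callbacks_per_second(blocksize, 48000):7.1f} callbacks/s")
```

The block duration is only a lower bound on latency for one direction; driver and hardware buffering add to it.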
The GIL consideration
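PortAudio invokes the callback from its own C thread, but because the callback is Python code, CFFI acquires the GIL before running it. The audio thread therefore competes with every other Python thread for the GIL, and a long-running pure-Python section elsewhere in the program can delay it. NumPy operations on the indata array are fine because they run in compiled code and finish quickly. Avoid pure Python loops over audio samples in callbacks; use vectorized NumPy operations instead.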
Low-latency configuration
stream = sd.Stream(
    samplerate=48000,
    blocksize=64,  # ~1.3 ms at 48 kHz
    latency='low',
    channels=2,
    dtype='float32',
    callback=process_callback,  # your own callback(indata, outdata, frames, time, status)
)
Achieving sub-10ms round-trip latency requires:
- Small block size (64–256 frames)
- Exclusive-mode device access (WASAPI exclusive on Windows, JACK on Linux)
- No Python processing in the callback — use the callback only to shuttle data to/from a processing thread
- Pinned CPU cores for the audio thread (OS-level, not Python-controllable)
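Read the latency PortAudio reports with stream.latency, in seconds. For a full-duplex sd.Stream it is a tuple (input_latency, output_latency); for sd.InputStream or sd.OutputStream it is a single value. This is the driver's estimate, not a measurement, so a physical loopback test is still needed to confirm true round-trip latency.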
Full-duplex streaming
Full-duplex (simultaneous record + playback) uses sd.Stream with both input and output channels:
def passthrough_with_gain(indata, outdata, frames, time, status):
    gain = 0.5
    outdata[:] = indata * gain
with sd.Stream(samplerate=44100, channels=1, callback=passthrough_with_gain):
    sd.sleep(10000)  # run for 10 seconds
For effects processing, maintain state between callbacks using a closure or a class:
class DelayEffect:
    def __init__(self, delay_samples: int, mix: float = 0.5):
        self.buffer = np.zeros(delay_samples, dtype='float32')  # circular delay line
        self.pos = 0
        self.mix = mix
    def __call__(self, indata, outdata, frames, time, status):
        mono = indata[:, 0]
        # Per-sample loop: easy to follow, but slow in pure Python
        for i in range(frames):
            delayed = self.buffer[self.pos]
            self.buffer[self.pos] = mono[i]
            self.pos = (self.pos + 1) % len(self.buffer)
            outdata[i, 0] = mono[i] * (1 - self.mix) + delayed * self.mix
delay = DelayEffect(delay_samples=22050, mix=0.4)  # 0.5 s delay at 44.1 kHz
with sd.Stream(samplerate=44100, channels=1, callback=delay):
    sd.sleep(30000)
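The per-sample loop in DelayEffect runs pure Python for every frame, which is the pattern the GIL section advises against. One way to vectorize it is sketched below, under the assumption that a small allocation per block is acceptable; it replaces the circular buffer with a history array holding the last delay_samples inputs. The class name VectorizedDelay and its process method are illustrative, not from the original code.

```python
import numpy as np


class VectorizedDelay:
    """Same dry/wet delay as DelayEffect, computed one block at a time."""

    def __init__(self, delay_samples: int, mix: float = 0.5):
        # The last `delay_samples` input samples, oldest first
        self.history = np.zeros(delay_samples, dtype='float32')
        self.mix = mix

    def process(self, mono: np.ndarray) -> np.ndarray:
        n = len(mono)
        # History followed by the new block; concatenate copies the data,
        # so it stays valid after sounddevice reuses the input buffer
        extended = np.concatenate([self.history, mono])
        delayed = extended[:n]        # input from delay_samples frames ago
        self.history = extended[n:]   # keep the newest delay_samples frames
        return mono * (1 - self.mix) + delayed * self.mix

    def __call__(self, indata, outdata, frames, time, status):
        # Callback signature expected by sd.Stream
        outdata[:, 0] = self.process(indata[:, 0])


# Offline check: feed two blocks and compare against a plain shift
fx = VectorizedDelay(delay_samples=3, mix=0.5)
x = np.arange(1, 9, dtype='float32')
out = np.concatenate([fx.process(x[:5]), fx.process(x[5:])])
```

An instance can be passed as callback= to sd.Stream in the same way as DelayEffect. Unlike index arithmetic on a circular buffer, this form stays correct when a block is longer than the delay.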
Blocking API: read/write vs callback
The non-callback interface uses stream.read() and stream.write():
with sd.Stream(samplerate=44100, channels=1, blocksize=1024) as stream:
    while True:
        data, overflowed = stream.read(1024)  # blocks until 1024 frames are available
        processed = apply_effect(data)  # apply_effect: placeholder for your own processing
        stream.write(processed)
This is simpler but adds the latency of the main-thread processing loop. For non-real-time tasks (recording to file, batch processing), it works well. For live monitoring, callbacks are preferred.
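For a bounded recording with the blocking interface, it helps to keep track of the overflowed flag that read() returns. The sketch below assumes only that the stream object has a read(frames) method returning (data, overflowed), as sd.InputStream does; record_blocks and the stand-in _FakeStream are hypothetical names used so the example runs without audio hardware.

```python
def record_blocks(stream, n_blocks: int, blocksize: int):
    """Read n_blocks blocks from a blocking stream.

    Returns (blocks, overflow_count), where overflow_count is the number
    of reads that reported lost input data.
    """
    blocks = []
    overflows = 0
    for _ in range(n_blocks):
        data, overflowed = stream.read(blocksize)
        if overflowed:
            overflows += 1
        blocks.append(data)
    return blocks, overflows


class _FakeStream:
    """Stand-in for an opened input stream; reports overflow on the 2nd read."""

    def __init__(self):
        self.calls = 0

    def read(self, frames):
        self.calls += 1
        return [0.0] * frames, self.calls == 2


blocks, overflows = record_blocks(_FakeStream(), n_blocks=3, blocksize=4)
```

With a real stream, the returned blocks are NumPy arrays and can be joined with np.concatenate(blocks).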
Error handling and robustness
def robust_callback(indata, outdata, frames, time, status):
    if status.input_overflow:
        print("Input overflow: audio data was lost")
    if status.output_underflow:
        print("Output underflow: silence was inserted")
    outdata[:] = indata
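Printing from the callback is itself I/O on the audio thread. A lighter alternative, sketched here as one possible approach, is to tally the flags and report them from the main thread. The flag names match the attributes of sounddevice's CallbackFlags; StatusTally is a hypothetical helper, and SimpleNamespace stands in for the real status object so the sketch runs without a device.

```python
from collections import Counter
from types import SimpleNamespace

# Attribute names exposed by sounddevice.CallbackFlags
FLAG_NAMES = ("input_underflow", "input_overflow",
              "output_underflow", "output_overflow", "priming_output")


class StatusTally:
    """Counts status flags so the callback never has to print."""

    def __init__(self):
        self.counts = Counter()

    def record(self, status) -> None:
        for name in FLAG_NAMES:
            if getattr(status, name, False):
                self.counts[name] += 1


tally = StatusTally()
# Stand-ins for the status argument of two callback invocations
tally.record(SimpleNamespace(input_overflow=True, output_underflow=False))
tally.record(SimpleNamespace(input_overflow=True, output_underflow=True))
```

In a real callback, call tally.record(status) and have the main thread read tally.counts periodically, for example once per second.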
Common failure modes:
| Error | Cause | Fix |
|---|---|---|
| PortAudioError: Invalid sample rate | Device doesn’t support requested rate | Query sd.query_devices(device)['default_samplerate'] |
| Input overflow | Callback too slow, buffers overrun | Increase blocksize, reduce processing |
| Output underflow | Data not supplied fast enough | Pre-buffer audio, increase blocksize |
| Device not found | Unplugged USB mic, wrong index | Re-query devices, use name matching |
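Name matching for the last row can be sketched as follows. The function takes the list of device dicts that sd.query_devices() returns and relies only on the 'name', 'max_input_channels', and 'max_output_channels' keys; find_device and the sample list are made up for illustration.

```python
def find_device(devices, name_part: str, kind: str = "input"):
    """Return the index of the first matching device, or None.

    A device matches if its name contains name_part (case-insensitive)
    and it has at least one channel of the requested kind.
    """
    key = "max_input_channels" if kind == "input" else "max_output_channels"
    needle = name_part.lower()
    for index, dev in enumerate(devices):
        if needle in dev["name"].lower() and dev[key] > 0:
            return index
    return None


# Hypothetical device list in the shape sd.query_devices() returns
sample_devices = [
    {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "USB Microphone", "max_input_channels": 1, "max_output_channels": 0},
]
mic_index = find_device(sample_devices, "usb")
```

Re-running the lookup against a fresh sd.query_devices() call after a failure handles indices that shift when a USB device is replugged. sounddevice also accepts a name substring directly as the device argument, but it raises an error instead of returning None when nothing matches.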
Multi-device and multi-channel
Select different devices for input and output:
sd.default.device = (2, 5) # device index 2 for input, 5 for output
For multi-channel recording (e.g., 8-channel audio interface):
recording = sd.rec(int(5 * 48000), samplerate=48000, channels=8, device=3)
sd.wait()
# recording.shape == (240000, 8)
Channel mapping lets you route specific hardware channels to specific array columns, useful for surround-sound or multi-mic setups.
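In sounddevice, sd.rec() takes a mapping argument: a list of 1-based hardware channel numbers to record, one per output column. The sketch below does not touch hardware; it shows the equivalent column selection on an array that has already been recorded, with select_channels as a hypothetical helper.

```python
import numpy as np


def select_channels(recording: np.ndarray, mapping) -> np.ndarray:
    """Pick columns from a (frames, channels) array by 1-based channel number."""
    columns = [channel - 1 for channel in mapping]
    return recording[:, columns]


# Fake 8-channel recording: 3 frames, values 0..23 so columns are easy to spot
recording = np.arange(24, dtype='float32').reshape(3, 8)
picked = select_channels(recording, [1, 8])  # first and last hardware channel
```

Passing mapping=[1, 8] to sd.rec() would avoid capturing the six unused channels in the first place.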
Production patterns
Recording to WAV with soundfile
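import soundfile as sf
with sf.SoundFile('output.wav', mode='w', samplerate=48000,
                  channels=1, subtype='PCM_24') as f:
    # Writing inside the callback is file I/O on the audio thread: acceptable
    # for a short demo, but use a queue and a writer loop for long recordings
    with sd.InputStream(samplerate=48000, channels=1,
                        callback=lambda indata, frames, time, status: f.write(indata.copy())):
        sd.sleep(60000)  # record 60 seconds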
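The lambda above calls f.write() on the audio thread. A queue-based variant keeps file I/O on the main thread: the callback only enqueues copies of indata, and a loop drains the queue into the file. The following is a sketch of that drain loop, assuming the sink has a write(block) method as sf.SoundFile does; drain_to_file and _ListSink are hypothetical names, and the queue is filled by hand so the example runs without a device.

```python
import queue


def drain_to_file(q, sink, total_frames: int, timeout: float = 1.0) -> int:
    """Write queued blocks to sink until at least total_frames are written."""
    written = 0
    while written < total_frames:
        # Raises queue.Empty if no audio arrives within `timeout` seconds
        block = q.get(timeout=timeout)
        sink.write(block)
        written += len(block)
    return written


class _ListSink:
    """Stand-in for sf.SoundFile that just collects blocks."""

    def __init__(self):
        self.blocks = []

    def write(self, block):
        self.blocks.append(block)


q = queue.Queue()
for _ in range(3):
    q.put([0.0] * 4)  # three fake blocks of 4 frames each
sink = _ListSink()
written = drain_to_file(q, sink, total_frames=10)
```

In a real recording, the callback would be `lambda indata, frames, time, status: q.put(indata.copy())`, the sink would be the open sf.SoundFile, and total_frames would be the duration in seconds times the sample rate.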
Level meter
def level_meter(indata, frames, time, status):
    rms = np.sqrt(np.mean(indata ** 2))
    db = 20 * np.log10(max(rms, 1e-10))  # dBFS; the floor avoids log10(0)
    bars = int(min(50, max(0, (db + 60) / 60 * 50)))  # map -60..0 dB to 0..50 bars
    print(f"\r{'█' * bars}{' ' * (50 - bars)} {db:+.1f} dB", end='')
Voice Activity Detection trigger
THRESHOLD = 0.02
SILENCE_LIMIT = 1.0  # seconds
recording = False
def vad_callback(indata, frames, time, status):
    global recording
    energy = np.sqrt(np.mean(indata ** 2))
    if energy > THRESHOLD and not recording:
        recording = True
        print("Speech detected: recording started")
    # Extend with silence timer for auto-stop
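One way to add the silence timer is a small state machine that accumulates the duration of consecutive quiet blocks and stops once it reaches the limit. This is a sketch under simple assumptions (a fixed energy threshold, no smoothing); the EnergyVad class and its "start"/"stop" return values are illustrative.

```python
from typing import Optional


class EnergyVad:
    """Energy-threshold trigger with auto-stop after sustained silence."""

    def __init__(self, threshold: float = 0.02, silence_limit: float = 1.0):
        self.threshold = threshold
        self.silence_limit = silence_limit  # seconds of quiet before stopping
        self.recording = False
        self.silence = 0.0                  # seconds of quiet seen so far

    def update(self, energy: float, block_seconds: float) -> Optional[str]:
        """Feed one block; returns "start", "stop", or None."""
        if energy > self.threshold:
            self.silence = 0.0  # any loud block resets the timer
            if not self.recording:
                self.recording = True
                return "start"
        elif self.recording:
            self.silence += block_seconds
            if self.silence >= self.silence_limit:
                self.recording = False
                self.silence = 0.0
                return "stop"
        return None


# Simulated RMS energies for six blocks of 0.5 s each
vad = EnergyVad(threshold=0.02, silence_limit=1.0)
events = [vad.update(e, 0.5) for e in (0.0, 0.05, 0.03, 0.0, 0.0, 0.0)]
```

In a callback, energy would be np.sqrt(np.mean(indata ** 2)) and block_seconds would be frames divided by the stream's sample rate.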
Tradeoffs vs alternatives
| Library | Strengths | Weaknesses |
|---|---|---|
| sounddevice | NumPy-native, clean API, PortAudio backend | No file I/O built-in |
| PyAudio | Mature, widely documented | Bytes-based (not NumPy), harder install |
| python-rtaudio | Lower latency on some platforms | Smaller community |
| pyalsaaudio | Direct ALSA access on Linux | Linux-only |
Sounddevice is the best default choice for Python audio I/O when you need NumPy integration and cross-platform support.
One thing to remember: Sounddevice’s callback-based streaming model gives you direct, low-latency access to audio hardware from Python — but keep callbacks fast, non-blocking, and use queues to hand data off to your processing logic.