Python Sounddevice Recording — Deep Dive

Master sounddevice's PortAudio internals, callback threading model, low-latency configuration, full-duplex streaming, and production audio pipeline patterns.

PortAudio under the hood

Sounddevice is a CFFI wrapper around PortAudio, which abstracts platform-specific audio APIs: WASAPI/WDM-KS on Windows, CoreAudio on macOS, ALSA/PulseAudio/JACK on Linux. Each host API has different latency characteristics, device enumeration behavior, and buffer management strategies.

import sounddevice as sd

# List host APIs
for api in sd.query_hostapis():
    print(api['name'], api['default_input_device'], api['default_output_device'])

# Detailed device info
for i, dev in enumerate(sd.query_devices()):
    print(f"{i}: {dev['name']} in={dev['max_input_channels']} out={dev['max_output_channels']}")
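
Since each device entry records which host API exposes it, grouping the device list by host API can make it easier to pick the right index. The following is a minimal sketch, assuming entries shaped like the dictionaries returned by sd.query_devices() and sd.query_hostapis(); the helper name devices_by_hostapi and the sample data are hypothetical, and plain lists are used so the idea can be tried without audio hardware.

```python
def devices_by_hostapi(devices, hostapis):
    """Group device entries by the name of the host API that exposes them.

    `devices` and `hostapis` are expected to look like the results of
    sd.query_devices() and sd.query_hostapis(): sequences of dicts.
    """
    grouped = {api["name"]: [] for api in hostapis}
    for index, dev in enumerate(devices):
        api_name = hostapis[dev["hostapi"]]["name"]
        grouped[api_name].append((index, dev["name"]))
    return grouped


# Hand-written sample data (made up for illustration) standing in for a real query.
sample_hostapis = [{"name": "MME"}, {"name": "Windows WASAPI"}]
sample_devices = [
    {"name": "USB Mic", "hostapi": 0, "max_input_channels": 1, "max_output_channels": 0},
    {"name": "USB Mic", "hostapi": 1, "max_input_channels": 1, "max_output_channels": 0},
    {"name": "Speakers", "hostapi": 1, "max_input_channels": 0, "max_output_channels": 2},
]

for api_name, entries in devices_by_hostapi(sample_devices, sample_hostapis).items():
    print(api_name, entries)
```

With real hardware, the results of sd.query_devices() and sd.query_hostapis() can be passed in directly.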

Callback threading model

When you open a stream with a callback, PortAudio spawns a real-time audio thread that invokes your callback at regular intervals. This thread runs at elevated priority and must not block: avoid file I/O, network calls, and contended locks, and keep memory allocation to a minimum (the single indata.copy() in the example below is the usual accepted cost).

import numpy as np
import queue

audio_queue = queue.Queue()

def recording_callback(indata, frames, time, status):
    if status:
        print(status)
    # Copy data and push to thread-safe queue — minimal work in callback
    audio_queue.put(indata.copy())

with sd.InputStream(samplerate=48000, channels=1, callback=recording_callback,
                     blocksize=1024, dtype='float32'):
    # Main thread processes audio from queue
    while True:
        chunk = audio_queue.get()
        # Process chunk here (analysis, save to file, etc.)
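
An unbounded queue keeps growing if the consumer stalls. One possible variation, sketched below under the assumption that dropping audio is preferable to unbounded memory use, is a bounded queue with put_nowait() and a drop counter. The names make_recording_callback and stats are hypothetical, and plain lists stand in for the NumPy blocks so the snippet runs without audio hardware.

```python
import queue


def make_recording_callback(audio_queue, stats):
    """Build a callback that never blocks: blocks are dropped when the queue is full."""

    def callback(indata, frames, time, status):
        if status:
            stats["status_events"] += 1   # count instead of printing in the audio thread
        try:
            audio_queue.put_nowait(indata.copy())
        except queue.Full:
            stats["dropped_blocks"] += 1  # consumer fell behind; discard this block

    return callback


audio_queue = queue.Queue(maxsize=4)
stats = {"status_events": 0, "dropped_blocks": 0}
callback = make_recording_callback(audio_queue, stats)

# Simulate six callback invocations with no consumer running.
# Plain lists stand in for the NumPy blocks that sounddevice would pass.
for _ in range(6):
    callback([0.0] * 1024, 1024, None, None)

print(audio_queue.qsize(), stats["dropped_blocks"])
```

The returned function has the same signature as recording_callback, so it can be passed as callback= to sd.InputStream, and the main thread can inspect stats periodically to report lost audio.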

The blocksize parameter sets how many frames are passed to each callback invocation. Smaller values reduce latency but increase callback frequency and CPU overhead. The default, blocksize=0, lets PortAudio choose a size suited to the host API, in which case the number of frames may vary from one callback to the next.
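
The trade-off is simple arithmetic: one block lasts blocksize / samplerate seconds, and the callback fires samplerate / blocksize times per second. A small sketch of that calculation (the helper name block_timing is hypothetical):

```python
def block_timing(blocksize, samplerate):
    """Return (block duration in ms, callbacks per second) for a fixed block size."""
    duration_ms = 1000.0 * blocksize / samplerate
    callbacks_per_second = samplerate / blocksize
    return duration_ms, callbacks_per_second


for blocksize in (64, 256, 1024):
    duration_ms, rate = block_timing(blocksize, 48000)
    print(f"blocksize={blocksize:5d}: {duration_ms:6.2f} ms per block, {rate:7.1f} callbacks/s")
```

At 48 kHz, 64 frames last about 1.33 ms and require 750 callbacks per second, while 1024 frames last about 21.33 ms and require roughly 47.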

The GIL consideration

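The callback is invoked from PortAudio's C audio thread, but because it is a Python function, CFFI has to acquire the GIL before each invocation. If another Python thread is holding the GIL at that moment, the callback is delayed, which can cause overflows or underflows. Keep the work inside the callback short: vectorized NumPy operations on indata run in compiled code and finish quickly, whereas pure Python loops over audio samples are slow and hold the GIL for their entire duration.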

Low-latency configuration

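def process_callback(indata, outdata, frames, time, status):
    outdata[:] = indata    # placeholder: pass the input straight through

stream = sd.Stream(
    samplerate=48000,
    blocksize=64,          # ~1.3 ms at 48 kHz
    latency='low',
    channels=2,
    dtype='float32',
    callback=process_callback
)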

Achieving sub-10ms round-trip latency requires:

Small block size (64–256 frames)
Exclusive-mode device access (WASAPI exclusive on Windows, JACK on Linux)
No Python processing in the callback — use the callback only to shuttle data to/from a processing thread
Pinned CPU cores for the audio thread (OS-level, not Python-controllable)

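After opening the stream, check stream.latency for the latency actually granted by the host API. For a full-duplex sd.Stream it is a tuple (input_latency, output_latency) in seconds; for sd.InputStream or sd.OutputStream it is a single value. These are the figures reported by PortAudio, not a measurement of the physical round trip.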
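
For a full-duplex stream the reported latency is an (input, output) pair, and adding the two gives a rough estimate of the round trip, excluding any processing time. The sketch below converts such a pair into milliseconds and frames; the helper name summarize_latency and the example values are made up for illustration.

```python
def summarize_latency(latency, samplerate):
    """Convert an (input, output) latency pair in seconds to ms and frames."""
    input_s, output_s = latency
    round_trip_s = input_s + output_s
    return {
        "input_ms": input_s * 1000.0,
        "output_ms": output_s * 1000.0,
        "round_trip_ms": round_trip_s * 1000.0,
        "round_trip_frames": round(round_trip_s * samplerate),
    }


# Example values made up for illustration; with an open sd.Stream
# you would pass stream.latency and stream.samplerate instead.
summary = summarize_latency((0.004, 0.005), 48000)
print(summary)
```

Comparing round_trip_ms against the 10 ms target shows quickly whether a given block size and host API combination is in the right range.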

Full-duplex streaming

Full-duplex (simultaneous record + playback) uses sd.Stream with both input and output channels:

def passthrough_with_gain(indata, outdata, frames, time, status):
    gain = 0.5
    outdata[:] = indata * gain

with sd.Stream(samplerate=44100, channels=1, callback=passthrough_with_gain):
    sd.sleep(10000)  # run for 10 seconds

For effects processing, maintain state between callbacks using a closure or a class:

class DelayEffect:
    def __init__(self, delay_samples: int, mix: float = 0.5):
        self.buffer = np.zeros(delay_samples, dtype='float32')
        self.pos = 0
        self.mix = mix

    def __call__(self, indata, outdata, frames, time, status):
        mono = indata[:, 0]
        # Vectorized circular-buffer access; assumes frames <= delay_samples
        idx = (self.pos + np.arange(frames)) % len(self.buffer)
        delayed = self.buffer[idx]      # fancy indexing returns a copy of the old samples
        self.buffer[idx] = mono         # overwrite them with the current input
        self.pos = (self.pos + frames) % len(self.buffer)
        outdata[:, 0] = mono * (1 - self.mix) + delayed * self.mix

delay = DelayEffect(delay_samples=22050, mix=0.4)  # 0.5s delay
with sd.Stream(samplerate=44100, channels=1, callback=delay):
    sd.sleep(30000)

Blocking API: read/write vs callback

The non-callback interface uses stream.read() and stream.write():

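def apply_effect(data):
    return data * 0.5  # placeholder effect: halve the volume

with sd.Stream(samplerate=44100, channels=1, blocksize=1024) as stream:
    while True:
        data, overflowed = stream.read(1024)  # blocks until 1024 frames are available
        processed = apply_effect(data)
        stream.write(processed)               # may block until the output buffer has room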

This is simpler but adds the latency of the main-thread processing loop. For non-real-time tasks (recording to file, batch processing), it works well. For live monitoring, callbacks are preferred.

Error handling and robustness

def robust_callback(indata, outdata, frames, time, status):
    if status.input_overflow:
        print("Input overflow — audio data was lost")
    if status.output_underflow:
        print("Output underflow — silence was inserted")
    outdata[:] = indata

Common failure modes:

Error	Cause	Fix
`PortAudioError: Invalid sample rate`	Device doesn’t support requested rate	Query `sd.query_devices(device)['default_samplerate']`
Input overflow	Callback too slow, buffers overrun	Increase blocksize, reduce processing
Output underflow	Data not supplied fast enough	Pre-buffer audio, increase blocksize
Device not found	Unplugged USB mic, wrong index	Re-query devices, use name matching
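
Device indices can change when hardware is plugged in or removed, so matching by name tends to be more robust than hard-coding an index. A minimal sketch of such a lookup follows, assuming entries shaped like those returned by sd.query_devices(); the helper name find_device and the sample list are hypothetical.

```python
def find_device(devices, name_fragment, kind="input"):
    """Return the index of the first device whose name contains name_fragment.

    `devices` is expected to look like sd.query_devices(): a sequence of dicts.
    Returns None when nothing matches, so the caller can decide how to recover.
    """
    channel_key = "max_input_channels" if kind == "input" else "max_output_channels"
    wanted = name_fragment.lower()
    for index, dev in enumerate(devices):
        if wanted in dev["name"].lower() and dev[channel_key] > 0:
            return index
    return None


# Hand-written sample data (made up for illustration) standing in for a real query.
sample_devices = [
    {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "USB Microphone", "max_input_channels": 1, "max_output_channels": 0},
]

print(find_device(sample_devices, "usb mic"))            # index of the USB microphone
print(find_device(sample_devices, "usb mic", "output"))  # None: it has no output channels
```

Sounddevice also accepts a name substring directly as the device argument; a helper like this is mainly useful when you want to control what happens if nothing matches.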

Multi-device and multi-channel

Select different devices for input and output:

sd.default.device = (2, 5)  # device index 2 for input, 5 for output

For multi-channel recording (e.g., 8-channel audio interface):

recording = sd.rec(int(5 * 48000), samplerate=48000, channels=8, device=3)
sd.wait()
# recording.shape == (240000, 8)

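The mapping argument of sd.rec() takes a list of 1-based hardware channel numbers and records them into consecutive array columns, which is useful for surround-sound or multi-mic setups. sd.play() accepts a corresponding mapping for output channels.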

Production patterns

Recording to WAV with soundfile

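import queue
import soundfile as sf

q = queue.Queue()
with sf.SoundFile('output.wav', mode='w', samplerate=48000,
                   channels=1, subtype='PCM_24') as f:
    # The callback only queues a copy; the main thread does the file I/O
    with sd.InputStream(samplerate=48000, channels=1, blocksize=1024,
                        callback=lambda indata, frames, time, status: q.put(indata.copy())):
        for _ in range(60 * 48000 // 1024):  # about 60 seconds of 1024-frame blocks
            f.write(q.get())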

Level meter

def level_meter(indata, frames, time, status):
    rms = np.sqrt(np.mean(indata ** 2))
    db = 20 * np.log10(max(rms, 1e-10))
    bars = int(min(50, max(0, (db + 60) / 60 * 50)))  # clamp to the 50-character bar
    print(f"\r{'█' * bars}{' ' * (50 - bars)} {db:+.1f} dB", end='')

Voice Activity Detection trigger

THRESHOLD = 0.02
SILENCE_LIMIT = 1.0  # seconds
recording = False

def vad_callback(indata, frames, time, status):
    global recording
    energy = np.sqrt(np.mean(indata ** 2))
    if energy > THRESHOLD and not recording:
        recording = True
        print("Speech detected — recording started")
    # Extend with silence timer for auto-stop
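
One way the silence timer could be added is as a small state machine kept outside the callback. This is an illustrative sketch, not a full voice activity detector: the class name VadTrigger and its update() method are hypothetical, and it takes a precomputed energy value so it can be exercised without audio hardware.

```python
class VadTrigger:
    """Energy-threshold trigger with a silence timer (illustrative sketch)."""

    def __init__(self, samplerate, threshold=0.02, silence_limit=1.0):
        self.samplerate = samplerate
        self.threshold = threshold
        self.silence_limit = silence_limit  # seconds of quiet before stopping
        self.recording = False
        self.silence = 0.0                  # seconds of consecutive quiet so far

    def update(self, energy, frames):
        """Feed one block's RMS energy; return 'start', 'stop', or None."""
        if energy > self.threshold:
            self.silence = 0.0
            if not self.recording:
                self.recording = True
                return "start"
        elif self.recording:
            self.silence += frames / self.samplerate
            if self.silence >= self.silence_limit:
                self.recording = False
                self.silence = 0.0
                return "stop"
        return None


vad = VadTrigger(samplerate=48000)
# One loud block followed by quiet blocks of 12000 frames (0.25 s each).
events = [vad.update(energy, 12000) for energy in (0.1, 0.0, 0.0, 0.0, 0.0, 0.0)]
print(events)
```

Inside a stream callback, the energy would be computed as in vad_callback above and passed in with vad.update(energy, frames); the returned event can then be pushed onto a queue for the main thread to act on.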

Tradeoffs vs alternatives

Library	Strengths	Weaknesses
sounddevice	NumPy-native, clean API, PortAudio backend	No file I/O built-in
PyAudio	Mature, widely documented	Bytes-based (not NumPy), harder install
python-rtaudio	Lower latency on some platforms	Smaller community
pyalsaaudio	Direct ALSA access on Linux	Linux-only

Sounddevice is a strong default choice for Python audio I/O when you need NumPy integration and cross-platform support.

One thing to remember: Sounddevice’s callback-based streaming model gives you direct, low-latency access to audio hardware from Python — but keep callbacks fast, non-blocking, and use queues to hand data off to your processing logic.
