Python Sounddevice Recording — Deep Dive

PortAudio under the hood

Sounddevice is a CFFI wrapper around PortAudio, which abstracts platform-specific audio APIs: WASAPI/WDM-KS on Windows, CoreAudio on macOS, ALSA/PulseAudio/JACK on Linux. Each host API has different latency characteristics, device enumeration behavior, and buffer management strategies.

import sounddevice as sd

# List host APIs
for api in sd.query_hostapis():
    print(api['name'], api['default_input_device'], api['default_output_device'])

# Detailed device info
for i, dev in enumerate(sd.query_devices()):
    print(f"{i}: {dev['name']} in={dev['max_input_channels']} out={dev['max_output_channels']}")
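
Because the same physical device usually appears once per host API, it can help to filter the device list by API before choosing an index. The sketch below works on plain dictionaries shaped like the results of sd.query_devices() and sd.query_hostapis(); the helper name input_devices_for_api and the sample data are assumptions made for illustration and are not part of sounddevice.

```python
def input_devices_for_api(devices, hostapis, api_name):
    """Return (index, name) pairs of input-capable devices under one host API.

    `devices` and `hostapis` are expected to look like the results of
    sd.query_devices() and sd.query_hostapis(): sequences of dicts.
    """
    matches = []
    for index, dev in enumerate(devices):
        api = hostapis[dev['hostapi']]['name']
        if api_name.lower() in api.lower() and dev['max_input_channels'] > 0:
            matches.append((index, dev['name']))
    return matches


# Made-up data in the same shape as the sounddevice query results.
hostapis = [{'name': 'MME'}, {'name': 'Windows WASAPI'}]
devices = [
    {'name': 'Microphone (USB)', 'hostapi': 0,
     'max_input_channels': 2, 'max_output_channels': 0},
    {'name': 'Microphone (USB)', 'hostapi': 1,
     'max_input_channels': 2, 'max_output_channels': 0},
    {'name': 'Speakers', 'hostapi': 1,
     'max_input_channels': 0, 'max_output_channels': 2},
]
wasapi_inputs = input_devices_for_api(devices, hostapis, 'wasapi')
```

With real hardware, the call would be input_devices_for_api(sd.query_devices(), sd.query_hostapis(), 'wasapi').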

Callback threading model

When you open a stream with a callback, PortAudio invokes it at regular intervals from a high-priority audio thread. The callback must return before the next block is due, so it must not block: no file I/O, no network calls, no waiting on locks. Some allocation is unavoidable in Python (the indata.copy() call below allocates a new array), so the practical rule is to keep the work per call small and predictable.

import numpy as np
import queue

audio_queue = queue.Queue()

def recording_callback(indata, frames, time, status):
    if status:
        print(status)
    # Copy data and push to thread-safe queue — minimal work in callback
    audio_queue.put(indata.copy())

with sd.InputStream(samplerate=48000, channels=1, callback=recording_callback,
                     blocksize=1024, dtype='float32'):
    # Main thread processes audio from queue
    while True:
        chunk = audio_queue.get()
        # Process chunk here (analysis, save to file, etc.)
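
When the maximum recording length is known in advance, one alternative to the queue is to copy each block into a buffer allocated before the stream starts, which avoids creating a new array on every call. This is a minimal sketch under that assumption; PreallocatedRecorder is a hypothetical name, and Python still creates a few small objects per call, so this reduces allocation but does not eliminate it.

```python
import numpy as np


class PreallocatedRecorder:
    """Callback that copies blocks into a buffer allocated up front."""

    def __init__(self, max_frames, channels=1):
        self.buffer = np.zeros((max_frames, channels), dtype='float32')
        self.written = 0  # frames stored so far

    def __call__(self, indata, frames, time, status):
        room = len(self.buffer) - self.written
        n = min(frames, room)  # drop whatever does not fit
        # Slice assignment copies into existing memory; no new array is made.
        self.buffer[self.written:self.written + n] = indata[:n]
        self.written += n

    def result(self):
        """Return a view of the frames recorded so far."""
        return self.buffer[:self.written]


# Simulate three callback invocations with 4-frame blocks.
recorder = PreallocatedRecorder(max_frames=10)
for _ in range(3):
    recorder(np.ones((4, 1), dtype='float32'), 4, None, None)
```

In real use the instance would be passed as callback=recorder to sd.InputStream, and result() read after the stream is closed.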

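The blocksize parameter sets the number of frames passed to each callback invocation. Smaller values reduce latency but increase callback frequency and CPU overhead. With blocksize=0 (the default), PortAudio picks the block size itself and the number of frames may vary from one callback to the next, so rely on the frames argument rather than assuming a fixed length.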
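
The tradeoff is simple arithmetic: a block lasts blocksize / samplerate seconds, and the callback fires samplerate / blocksize times per second. The helper below (block_timing is a name chosen for this sketch) prints both figures for a few common sizes.

```python
def block_timing(blocksize, samplerate):
    """Return (block duration in ms, callbacks per second)."""
    duration_ms = 1000.0 * blocksize / samplerate
    callbacks_per_second = samplerate / blocksize
    return duration_ms, callbacks_per_second


for size in (64, 256, 1024):
    ms, rate = block_timing(size, 48000)
    print(f"blocksize={size:5d}: {ms:6.2f} ms per block, {rate:7.1f} callbacks/s")
```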

The GIL consideration

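PortAudio calls the callback from its own C thread, but the Python function can only run once that thread has acquired the GIL. A busy Python thread elsewhere in the process can therefore delay the callback and cause overflows or underflows. Inside the callback, vectorized NumPy operations on indata are fine because the per-sample work runs in compiled code. Avoid pure Python loops over audio samples in callbacks; use vectorized NumPy operations instead.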
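
To make the difference concrete, the sketch below computes a block's peak level twice: once with a per-sample Python loop and once with a single vectorized expression. Both function names are illustrative. The results match, but the loop executes one interpreter step per sample while the vectorized version hands the whole block to NumPy.

```python
import numpy as np


def peak_loop(block):
    """Pure-Python peak detection: one interpreter step per sample."""
    peak = 0.0
    for frame in block:
        for sample in frame:
            peak = max(peak, abs(float(sample)))
    return peak


def peak_vectorized(block):
    """Same result, with the loop running inside NumPy's compiled code."""
    return float(np.max(np.abs(block)))


# A random stereo block standing in for `indata`.
rng = np.random.default_rng(0)
block = rng.uniform(-1.0, 1.0, size=(1024, 2)).astype('float32')
```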

Low-latency configuration

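# process_callback stands for your own function with the signature
# (indata, outdata, frames, time, status).
stream = sd.Stream(
    samplerate=48000,
    blocksize=64,          # ~1.3 ms at 48 kHz
    latency='low',
    channels=2,
    dtype='float32',
    callback=process_callback
)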

Achieving sub-10ms round-trip latency requires:

  1. Small block size (64–256 frames)
  2. A low-latency path to the device (WASAPI exclusive mode on Windows, JACK on Linux)
  3. No Python processing in the callback — use the callback only to shuttle data to/from a processing thread
  4. Pinned CPU cores for the audio thread (OS-level, not Python-controllable)

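Check the latency PortAudio reports with stream.latency, in seconds: a single value for an input-only or output-only stream, and an (input_latency, output_latency) pair for a full-duplex sd.Stream. Treat it as an estimate; confirming the real round-trip figure takes a loopback measurement.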
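
A quick way to compare a reported latency pair against the 10 ms target is to sum both directions and convert to milliseconds. The helper names and the sample values below are assumptions for illustration; real values come from stream.latency on an open sd.Stream.

```python
def round_trip_ms(latency):
    """Convert an (input_latency, output_latency) pair in seconds to total ms."""
    input_latency, output_latency = latency
    return 1000.0 * (input_latency + output_latency)


def meets_budget(latency, budget_ms=10.0):
    """True when the reported round trip fits inside the budget."""
    return round_trip_ms(latency) <= budget_ms


# Illustrative values only, in the shape sd.Stream.latency reports them.
reported = (0.0027, 0.0053)
print(f"{round_trip_ms(reported):.1f} ms, within budget: {meets_budget(reported)}")
```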

Full-duplex streaming

Full-duplex (simultaneous record + playback) uses sd.Stream with both input and output channels:

def passthrough_with_gain(indata, outdata, frames, time, status):
    gain = 0.5
    outdata[:] = indata * gain

with sd.Stream(samplerate=44100, channels=1, callback=passthrough_with_gain):
    sd.sleep(10000)  # run for 10 seconds

For effects processing, maintain state between callbacks using a closure or a class:

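class DelayEffect:
    def __init__(self, delay_samples: int, mix: float = 0.5):
        self.buffer = np.zeros(delay_samples, dtype='float32')  # circular delay line
        self.pos = 0
        self.mix = mix

    def __call__(self, indata, outdata, frames, time, status):
        mono = indata[:, 0]
        # Vectorized circular-buffer access; assumes frames <= delay_samples,
        # so every index within one block is unique.
        idx = (self.pos + np.arange(frames)) % len(self.buffer)
        delayed = self.buffer[idx]   # fancy indexing returns a copy
        self.buffer[idx] = mono      # overwrite the slots with new input
        self.pos = (self.pos + frames) % len(self.buffer)
        outdata[:, 0] = mono * (1 - self.mix) + delayed * self.mix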

delay = DelayEffect(delay_samples=22050, mix=0.4)  # 0.5s delay
with sd.Stream(samplerate=44100, channels=1, callback=delay):
    sd.sleep(30000)
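
The closure variant keeps its state in an enclosing function instead of an object. As a sketch, the tremolo below stores the number of frames processed so far, so that the modulating sine wave continues smoothly from one block to the next; make_tremolo and its parameters are names invented for this example.

```python
import numpy as np


def make_tremolo(rate_hz, depth, samplerate):
    """Build a callback that modulates amplitude with a sine LFO."""
    phase = 0  # frames processed so far, kept between callbacks

    def callback(indata, outdata, frames, time, status):
        nonlocal phase
        t = (phase + np.arange(frames)) / samplerate
        # The LFO swings between (1 - depth) and 1.
        lfo = 1.0 - depth * 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))
        outdata[:] = indata * lfo[:, np.newaxis]
        phase += frames

    return callback
```

It would be used like the class-based version, for example sd.Stream(samplerate=44100, channels=1, callback=make_tremolo(5.0, 0.8, 44100)).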

Blocking I/O: read/write vs callback

The non-callback interface uses stream.read() and stream.write():

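def apply_effect(data):
    return data * 0.5  # placeholder effect: halve the gain

with sd.Stream(samplerate=44100, channels=1, blocksize=1024) as stream:
    while True:
        data, overflowed = stream.read(1024)  # blocks until 1024 frames arrive
        if overflowed:
            print("Input overflow: audio data was lost")
        stream.write(apply_effect(data))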

This is simpler but adds the latency of the main-thread processing loop. For non-real-time tasks (recording to file, batch processing), it works well. For live monitoring, callbacks are preferred.
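
For the recording case, a blocking loop can fill one preallocated array and count overflows along the way. The sketch below only assumes that the stream object has a read(frames) method returning (data, overflowed), which is the shape of sd.InputStream.read(); record_blocking is a hypothetical helper name, and total_frames is assumed to be greater than zero.

```python
import numpy as np


def record_blocking(stream, total_frames, blocksize=1024):
    """Read `total_frames` frames from an open stream using blocking reads.

    `stream` needs a read(frames) method returning (data, overflowed).
    Returns the recorded array and the number of reads that overflowed.
    """
    out = None
    done = 0
    overflows = 0
    while done < total_frames:
        n = min(blocksize, total_frames - done)
        data, overflowed = stream.read(n)
        if out is None:
            # Allocate once, after the first read reveals channels and dtype.
            out = np.empty((total_frames, data.shape[1]), dtype=data.dtype)
        out[done:done + n] = data
        done += n
        overflows += bool(overflowed)
    return out, overflows
```

With real hardware this could be called as record_blocking(stream, 5 * 48000) inside a with sd.InputStream(samplerate=48000, channels=2) as stream: block.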

Error handling and robustness

def robust_callback(indata, outdata, frames, time, status):
    if status.input_overflow:
        print("Input overflow — audio data was lost")
    if status.output_underflow:
        print("Output underflow — silence was inserted")
    outdata[:] = indata
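
Since print() can itself stall the audio thread, a lighter option is to count the flags in the callback and report them from the main thread. This is a sketch: StatusCounter is a hypothetical class, and it only assumes that status exposes the input_overflow and output_underflow attributes used above.

```python
class StatusCounter:
    """Count overflow and underflow flags without printing in the callback."""

    def __init__(self):
        self.input_overflows = 0
        self.output_underflows = 0

    def update(self, status):
        # `status` is expected to expose the boolean attributes that
        # sounddevice's CallbackFlags provides.
        if status.input_overflow:
            self.input_overflows += 1
        if status.output_underflow:
            self.output_underflows += 1


counter = StatusCounter()


def counting_callback(indata, outdata, frames, time, status):
    counter.update(status)  # integer increments only; no I/O here
    outdata[:] = indata
```

The main thread can then poll counter.input_overflows and counter.output_underflows at a relaxed interval and log any change.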

Common failure modes:

| Error | Cause | Fix |
| --- | --- | --- |
| PortAudioError: Invalid sample rate | Device doesn’t support requested rate | Query sd.query_devices(device)['default_samplerate'] |
| Input overflow | Callback too slow, buffers overrun | Increase blocksize, reduce processing |
| Output underflow | Data not supplied fast enough | Pre-buffer audio, increase blocksize |
| Device not found | Unplugged USB mic, wrong index | Re-query devices, use name matching |
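
For the sample-rate failure in particular, one defensive pattern is to try a short list of rates and keep the first one the device accepts. The sketch below separates the selection logic from the device check so the logic can be exercised without hardware. first_supported_rate and sounddevice_supports are hypothetical helper names; the device check relies on sd.check_input_settings, which raises when a setting is unsupported.

```python
def first_supported_rate(candidates, is_supported):
    """Return the first rate accepted by `is_supported`, or None."""
    for rate in candidates:
        if is_supported(rate):
            return rate
    return None


def sounddevice_supports(device, channels):
    """Build a predicate around sd.check_input_settings for one device."""
    import sounddevice as sd  # lazy import: needs PortAudio to be installed

    def is_supported(rate):
        try:
            sd.check_input_settings(device=device, channels=channels,
                                    samplerate=rate)
        except (sd.PortAudioError, ValueError):
            return False
        return True

    return is_supported


# Stand-in predicate: pretend the device tops out at 48 kHz.
chosen = first_supported_rate([96000, 48000, 44100], lambda rate: rate <= 48000)
```

On a real system the predicate would be sounddevice_supports(device=None, channels=1) instead of the stand-in lambda.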

Multi-device and multi-channel

Select different devices for input and output:

sd.default.device = (2, 5)  # device index 2 for input, 5 for output

For multi-channel recording (e.g., 8-channel audio interface):

recording = sd.rec(int(5 * 48000), samplerate=48000, channels=8, device=3)
sd.wait()
# recording.shape == (240000, 8)

Channel mapping lets you route specific hardware channels to specific array columns, useful for surround-sound or multi-mic setups.
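
In sounddevice, sd.rec() accepts a mapping argument listing hardware channel numbers, counted from 1. The sketch below shows the call in record_mapped (untested here, since it needs real hardware) next to select_channels, a NumPy helper that picks the same columns out of a full multi-channel recording. Both function names are assumptions made for this example.

```python
import numpy as np


def select_channels(recording, mapping):
    """Pick hardware channels (numbered from 1) out of a full recording."""
    columns = [channel - 1 for channel in mapping]
    return recording[:, columns]


def record_mapped(seconds, samplerate, mapping, device=None):
    """Record only the listed hardware channels using sd.rec(mapping=...)."""
    import sounddevice as sd  # lazy import: needs PortAudio and real hardware

    frames = int(seconds * samplerate)
    data = sd.rec(frames, samplerate=samplerate, mapping=mapping, device=device)
    sd.wait()
    return data


# Fake 8-channel recording: 3 frames, sample values 0..23.
full = np.arange(24, dtype='float32').reshape(3, 8)
picked = select_channels(full, [1, 3])  # hardware channels 1 and 3
```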

Production patterns

Recording to WAV with soundfile

import queue
import soundfile as sf

q = queue.Queue()

with sf.SoundFile('output.wav', mode='w', samplerate=48000,
                   channels=1, subtype='PCM_24') as f:
    # The callback only enqueues blocks; the main thread does the file I/O.
    with sd.InputStream(samplerate=48000, channels=1,
                        callback=lambda indata, frames, time, status: q.put(indata.copy())):
        while f.tell() < 60 * 48000:  # record about 60 seconds
            f.write(q.get())

Level meter

def level_meter(indata, frames, time, status):
    rms = np.sqrt(np.mean(indata ** 2))
    db = 20 * np.log10(max(rms, 1e-10))
    bars = int(min(50, max(0, (db + 60) / 60 * 50)))  # clamp to the 50-column meter
    print(f"\r{'█' * bars}{' ' * (50 - bars)} {db:+.1f} dB", end='')

Voice Activity Detection trigger

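THRESHOLD = 0.02       # RMS level treated as speech (tune per microphone)
SILENCE_LIMIT = 1.0    # seconds of quiet before recording stops
SAMPLERATE = 48000     # must match the stream's samplerate
recording = False
silent_time = 0.0

def vad_callback(indata, frames, time, status):
    global recording, silent_time
    energy = np.sqrt(np.mean(indata ** 2))
    if energy > THRESHOLD:
        silent_time = 0.0
        if not recording:
            recording = True
            print("Speech detected: recording started")
    elif recording:
        # Accumulate quiet time and stop once it exceeds the limit
        silent_time += frames / SAMPLERATE
        if silent_time >= SILENCE_LIMIT:
            recording = False
            print("Silence detected: recording stopped")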

Tradeoffs vs alternatives

| Library | Strengths | Weaknesses |
| --- | --- | --- |
| sounddevice | NumPy-native, clean API, PortAudio backend | No file I/O built-in |
| PyAudio | Mature, widely documented | Bytes-based (not NumPy), harder install |
| python-rtaudio | Lower latency on some platforms | Smaller community |
| pyalsaaudio | Direct ALSA access on Linux | Linux-only |

Sounddevice is a strong default choice for Python audio I/O when you need NumPy integration and cross-platform support.

One thing to remember: Sounddevice’s callback-based streaming model gives you direct, low-latency access to audio hardware from Python — but keep callbacks fast, non-blocking, and use queues to hand data off to your processing logic.
