Python Real-Time Audio Streaming — Core Concepts

Learn callback-based audio streaming, buffer management, latency tradeoffs, and how to build real-time effects and analysis pipelines in Python.

What real-time audio streaming means

Real-time audio streaming processes sound continuously with minimal delay — sound enters the system, gets processed, and exits fast enough that a human perceives it as instantaneous (typically under 20–30 ms round-trip). This enables live effects, voice chat, interactive instruments, audio monitoring, and speech recognition on live microphone input.

The callback model

The standard approach uses a callback function that the audio driver invokes at regular intervals with a buffer of samples:

import sounddevice as sd
import numpy as np

def callback(indata, outdata, frames, time, status):
    # indata: input buffer (microphone), shape (frames, channels)
    # outdata: output buffer (speakers), shape (frames, channels)
    outdata[:] = indata * 0.8  # simple volume reduction

with sd.Stream(samplerate=44100, blocksize=256, channels=1, callback=callback):
    sd.sleep(10000)  # run for 10 seconds

Each callback invocation processes one block. At 44 100 Hz with a block size of 256, callbacks fire ~172 times per second, each handling ~5.8 ms of audio.

Buffer size and latency

Block size (samples)	Latency at 44.1 kHz	CPU overhead	Glitch risk
64	1.5 ms	Very high	High
256	5.8 ms	Moderate	Low
1024	23 ms	Low	Very low
4096	93 ms	Very low	Negligible

Smaller buffers mean lower latency but higher CPU overhead and greater risk of buffer underruns (the callback doesn’t finish before the next buffer is needed, causing audible glitches).

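Total round-trip latency = input buffer + processing time + output buffer. With 256-sample buffers on both sides, buffering alone contributes about 11.6 ms (2 × 5.8 ms); processing time and any driver or converter latency come on top of that.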
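The figures in the table follow from dividing block size by sample rate. A minimal sketch of that arithmetic is shown here; the function names are illustrative, and the round-trip figure counts buffering only, ignoring processing time and driver or converter latency:

```python
def block_duration_ms(blocksize, samplerate=44100):
    """Duration of one block of audio in milliseconds."""
    return 1000.0 * blocksize / samplerate


def callbacks_per_second(blocksize, samplerate=44100):
    """How often the driver has to invoke the callback."""
    return samplerate / blocksize


def buffering_round_trip_ms(blocksize, samplerate=44100):
    """Lower bound on round-trip latency: one input block plus one output block."""
    return 2 * block_duration_ms(blocksize, samplerate)


# Reproduce the table rows for the block sizes listed above
for size in (64, 256, 1024, 4096):
    print(f"{size:5d} samples: {block_duration_ms(size):5.1f} ms per block, "
          f"{callbacks_per_second(size):6.1f} callbacks/s")
```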

Keeping the callback fast

The audio callback runs in a high-priority thread. Rules for avoiding glitches:

No memory allocation: pre-allocate all arrays before the stream starts
No file I/O: use a queue to pass data to a separate thread for logging/saving
No avoidable Python object creation: avoid list comprehensions, string formatting, dict creation
Use NumPy vectorized operations: they run in compiled C, and many of them release the GIL
No blocking calls: no waiting on locks, no network, no database

# Bad: Python-level loop that builds a list and a new array on every callback
outdata[:, 0] = np.array([sample * gain for sample in indata[:, 0]])

# Good: in-place operation, no allocation
np.multiply(indata, gain, out=outdata)
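The same rules extend to effects with more than one step. The following is a sketch, not a reference implementation: it applies gain and then hard-clips, using a scratch buffer allocated once before the stream starts. The names `make_gain_clip_callback` and `max_frames` are hypothetical, and the code assumes blocks never exceed `max_frames` and that the stream uses float32 samples.

```python
import numpy as np


def make_gain_clip_callback(gain, max_frames, channels=1):
    """Build a callback that applies gain, then hard-clips to [-1, 1]."""
    # Allocated once, before the stream starts; reused on every block.
    scratch = np.zeros((max_frames, channels), dtype=np.float32)

    def callback(indata, outdata, frames, time, status):
        work = scratch[:frames]                 # a view, not a copy
        np.multiply(indata, gain, out=work)     # gain, written into scratch
        np.clip(work, -1.0, 1.0, out=outdata)   # clip, written into outdata

    return callback


# Offline check with a synthetic block instead of a live stream
cb = make_gain_clip_callback(gain=2.0, max_frames=256)
block_in = np.full((256, 1), 0.75, dtype=np.float32)
block_out = np.empty_like(block_in)
cb(block_in, block_out, 256, None, None)
```

Because the callback only takes the same five arguments a stream would pass, it can be exercised offline like this before it is handed to `sd.Stream`.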

Common real-time patterns

Live volume meter

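level_db = -100.0  # latest level in dBFS, written by the callback

def level_callback(indata, frames, time, status):
    global level_db
    rms = np.sqrt(np.mean(indata ** 2))
    level_db = 20 * np.log10(max(rms, 1e-10))

# Printing happens in the main thread, not in the audio callback
with sd.InputStream(callback=level_callback, blocksize=1024):
    for _ in range(300):  # about 30 seconds at 10 updates per second
        print(f"\r{level_db:+6.1f} dB {'█' * int(max(0, level_db + 60))}", end='')
        sd.sleep(100)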

Live pitch detection

import queue

audio_q = queue.Queue()

def capture_callback(indata, frames, time, status):
    audio_q.put(indata[:, 0].copy())

# Processing thread reads from queue and runs pitch detection
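The processing side is left open above. One possible shape for it, as a sketch only: a worker that drains the queue and estimates pitch from the autocorrelation peak. Autocorrelation is a choice made here for brevity, not something the capture code prescribes, and `estimate_pitch_hz`, `pitch_worker`, `fmin`, `fmax` and the `None` stop token are all hypothetical names and conventions.

```python
import queue

import numpy as np


def estimate_pitch_hz(samples, samplerate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    samples = samples - np.mean(samples)
    # Full autocorrelation; keep the non-negative lags only
    corr = np.correlate(samples, samples, mode="full")[len(samples) - 1:]
    lag_min = int(samplerate / fmax)
    lag_max = min(int(samplerate / fmin), len(corr) - 1)
    if lag_max <= lag_min or corr[0] <= 0:
        return None  # block too short for this range, or silent
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return samplerate / lag


def pitch_worker(audio_q, samplerate, results, stop_token=None):
    """Drain blocks from the queue and append one estimate per block."""
    while True:
        block = audio_q.get()
        if block is stop_token:
            break
        results.append(estimate_pitch_hz(block, samplerate))
```

To use it with a live stream, start `pitch_worker` in a `threading.Thread` with `audio_q` before opening the `InputStream`, and put the stop token on the queue when the stream closes. Resolution is limited to whole-sample lags, and a block needs to span at least a couple of periods of the lowest pitch of interest.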

Network streaming

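Capture audio and send it over a network. This example sends raw, uncompressed samples; a production system would usually encode them with a codec such as Opus first. The socket is set to non-blocking so a full send buffer drops a block instead of stalling the audio thread:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setblocking(False)  # a full send buffer raises instead of blocking

def stream_callback(indata, frames, time, status):
    # Send raw sample bytes over UDP; drop the block if the socket is not ready
    try:
        sock.sendto(indata.tobytes(), ("192.168.1.100", 5000))
    except BlockingIOError:
        pass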
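UDP can drop or reorder datagrams, so a receiver usually needs some way to tell blocks apart. The sketch below prefixes each block with a sequence number. This packet layout is an assumption made for illustration, not an established protocol; `pack_block`, `unpack_block` and `HEADER` are hypothetical names, and the code assumes float32 samples and the same byte order on both machines.

```python
import struct

import numpy as np

HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order


def pack_block(seq, block):
    """Prefix a float32 block with its sequence number."""
    # Sample bytes are sent in native byte order (assumed equal on both ends)
    payload = np.ascontiguousarray(block, dtype=np.float32).tobytes()
    return HEADER.pack(seq & 0xFFFFFFFF) + payload


def unpack_block(packet, channels=1):
    """Return (seq, samples) with samples shaped (frames, channels)."""
    (seq,) = HEADER.unpack_from(packet)
    samples = np.frombuffer(packet, dtype=np.float32, offset=HEADER.size)
    return seq, samples.reshape(-1, channels)
```

On the sending side, `pack_block(seq, indata)` would take the place of `indata.tobytes()`, with `seq` incremented once per block; the receiver can then detect gaps by comparing consecutive sequence numbers.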

Ring buffers for overlapping analysis

Many DSP algorithms (FFT, filtering) need overlapping windows. A ring buffer accumulates samples across callbacks:

class RingBuffer:
    def __init__(self, size):
        self.data = np.zeros(size)
        self.write_pos = 0
    
    def write(self, samples):
        n = len(samples)
        end = self.write_pos + n
        if end <= len(self.data):
            self.data[self.write_pos:end] = samples
        else:
            split = len(self.data) - self.write_pos
            self.data[self.write_pos:] = samples[:split]
            self.data[:n - split] = samples[split:]
        self.write_pos = end % len(self.data)
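The class above only writes. To run an overlapping analysis, something also has to read the most recent window back out in chronological order. A minimal sketch, written as free functions over the buffer's `data` and `write_pos` attributes; `read_latest` and `windowed_spectrum` are hypothetical helpers, and the Hann window is one common choice rather than a requirement:

```python
import numpy as np


def read_latest(data, write_pos, n):
    """Return the most recent n samples of a ring buffer, oldest first."""
    size = len(data)
    if n > size:
        raise ValueError("window is longer than the ring buffer")
    start = (write_pos - n) % size
    if start + n <= size:
        return data[start:start + n].copy()
    # Window wraps: stitch the tail of the array to its head
    return np.concatenate((data[start:], data[:start + n - size]))


def windowed_spectrum(data, write_pos, n):
    """Magnitude spectrum of the latest n samples under a Hann window."""
    frame = read_latest(data, write_pos, n) * np.hanning(n)
    return np.abs(np.fft.rfft(frame))
```

Both helpers allocate new arrays, so they belong in the analysis thread rather than in the audio callback. Calling `windowed_spectrum(rb.data, rb.write_pos, 1024)` after every 256-sample write gives 75% overlap between successive frames.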

Common misconception

Real-time does not mean “fast.” It means “predictable timing.” A system that processes audio in 5 ms every single time is real-time. A system that processes in 1 ms usually but occasionally takes 50 ms is not — that occasional spike causes audible glitches. Consistency matters more than raw speed.
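The distinction becomes concrete once per-block processing times are compared against the block period. A small sketch with made-up timings matching the two systems described above; `timing_report` is a hypothetical helper, and in a real stream the durations could be collected with `time.perf_counter()` around the processing code and stored in a pre-allocated array:

```python
def timing_report(durations_s, blocksize, samplerate):
    """Compare per-block processing times against the block period."""
    budget = blocksize / samplerate          # time available per block
    worst = max(durations_s)
    mean = sum(durations_s) / len(durations_s)
    overruns = sum(1 for d in durations_s if d > budget)
    return {"budget_s": budget, "mean_s": mean, "worst_s": worst,
            "overruns": overruns}


# Synthetic timings for a 256-sample block at 44.1 kHz (budget about 5.8 ms)
steady = [0.005] * 1000              # 5 ms every time
spiky = [0.001] * 999 + [0.050]      # 1 ms usually, one 50 ms spike

steady_report = timing_report(steady, 256, 44100)
spiky_report = timing_report(spiky, 256, 44100)
```

The spiky system has the lower mean but is the only one with an overrun, which is why the worst case, not the average, is the number to watch.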

How it fits with other tools

Use sounddevice or PyAudio for the hardware interface. Apply effects with NumPy/SciPy in the callback. For heavier processing (ML inference, FFT analysis), offload to a separate thread via a queue. Combine with WebSockets or WebRTC for browser-based audio streaming.

One thing to remember: Real-time audio streaming is a timing contract — your processing must complete within one buffer period every single time, and the callback model plus NumPy vectorization is Python’s best path to meeting that contract.
