Python Real-Time Audio Streaming — Core Concepts

What real-time audio streaming means

Real-time audio streaming processes sound continuously with minimal delay — sound enters the system, gets processed, and exits fast enough that a human perceives it as instantaneous (typically under 20–30 ms round-trip). This enables live effects, voice chat, interactive instruments, audio monitoring, and speech recognition on live microphone input.

The callback model

The standard approach uses a callback function that the audio driver invokes at regular intervals with a buffer of samples:

import sounddevice as sd
import numpy as np

def callback(indata, outdata, frames, time, status):
    # indata: input buffer (microphone), shape (frames, channels)
    # outdata: output buffer (speakers), shape (frames, channels)
    outdata[:] = indata * 0.8  # simple volume reduction

with sd.Stream(samplerate=44100, blocksize=256, channels=1, callback=callback):
    sd.sleep(10000)  # run for 10 seconds

Each callback invocation processes one block. At 44 100 Hz with a block size of 256, callbacks fire ~172 times per second, each handling ~5.8 ms of audio.

Buffer size and latency

Block size (samples)   Latency at 44.1 kHz   CPU overhead   Glitch risk
64                     1.5 ms                Very high      High
256                    5.8 ms                Moderate       Low
1024                   23 ms                 Low            Very low
4096                   93 ms                 Very low       Negligible

Smaller buffers mean lower latency but higher CPU overhead and greater risk of buffer underruns (the callback doesn’t finish before the next buffer is needed, causing audible glitches).

Total round-trip latency = input buffer + processing time + output buffer. With 256-sample buffers on both sides, you get ~12 ms minimum.
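
The arithmetic behind these figures can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper names are illustrative, and the round-trip figure counts only the two buffers plus processing time, so it is a lower bound that ignores whatever the driver and hardware add.

```python
def block_latency_ms(blocksize, samplerate=44100):
    """Duration of one block of audio, in milliseconds."""
    return 1000.0 * blocksize / samplerate


def callbacks_per_second(blocksize, samplerate=44100):
    """How often the driver must invoke the callback."""
    return samplerate / blocksize


def round_trip_ms(blocksize, samplerate=44100, processing_ms=0.0):
    """Input buffer + processing time + output buffer."""
    one_buffer = block_latency_ms(blocksize, samplerate)
    return one_buffer + processing_ms + one_buffer


for size in (64, 256, 1024, 4096):
    print(f"{size:5d} samples: "
          f"{block_latency_ms(size):5.1f} ms per block, "
          f"{callbacks_per_second(size):6.1f} callbacks/s, "
          f"{round_trip_ms(size):6.1f} ms round trip with zero processing")
```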

Keeping the callback fast

The audio callback runs in a high-priority thread. Rules for avoiding glitches:

  1. No memory allocation: pre-allocate all arrays before the stream starts
  2. No file I/O: use a queue to pass data to a separate thread for logging/saving
  3. Minimal Python object creation: avoid list comprehensions, string formatting, and dict creation
  4. Use NumPy vectorized operations: they run in compiled C, and many of them release the GIL while they work
  5. No blocking calls: no contended locks, no network, no database

# Bad: allocates new array every callback
outdata[:] = np.array([sample * gain for sample in indata[:, 0]])

# Good: in-place operation, no allocation
np.multiply(indata, gain, out=outdata)
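
When a callback needs more than one step, the same idea extends to a scratch buffer allocated once up front. The sketch below applies a gain and then hard-clips the result. The constants and the name gain_clip_callback are assumptions made for illustration, and it relies on the stream being opened with a fixed blocksize so that frames never exceeds the scratch buffer.

```python
import numpy as np

BLOCKSIZE = 256   # assumed; must match the blocksize passed to the stream
CHANNELS = 1
GAIN = 2.0        # hypothetical gain, large enough that clipping matters

# Allocated once, before the stream starts, and reused by every callback.
scratch = np.zeros((BLOCKSIZE, CHANNELS), dtype=np.float32)


def gain_clip_callback(indata, outdata, frames, time, status):
    """Apply gain, then hard-clip to [-1, 1], without allocating new arrays."""
    work = scratch[:frames]                  # a view into scratch, not a copy
    np.multiply(indata, GAIN, out=work)      # gain, written into scratch
    np.clip(work, -1.0, 1.0, out=outdata)    # clip, written straight to output
```

It would be passed to a stream in the same way as the first example, for instance sd.Stream(samplerate=44100, blocksize=BLOCKSIZE, channels=CHANNELS, callback=gain_clip_callback).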

Common real-time patterns

Live volume meter

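def level_callback(indata, frames, time, status):
    rms = np.sqrt(np.mean(indata ** 2))    # loudness of this block (root mean square)
    db = 20 * np.log10(max(rms, 1e-10))    # dB relative to full scale; the floor avoids log10(0)
    # Formatting and printing here breaks the callback rules above; acceptable only for a quick demo
    print(f"\r{db:+6.1f} dB {'█' * int(max(0, db + 60))}", end='')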

with sd.InputStream(callback=level_callback, blocksize=1024):
    sd.sleep(30000)
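
For anything longer-lived than a demo, one option is to keep only the level calculation in the callback and do the formatting and printing on the main thread. The following is a rough sketch of that split. The names levels, rms_dbfs, meter_callback and render are hypothetical, and the queue size of 32 is an arbitrary choice.

```python
import queue

import numpy as np

# Bounded, so a stalled reader cannot make the queue grow without limit.
levels = queue.Queue(maxsize=32)


def rms_dbfs(block, floor=1e-10):
    """RMS level of a block in dB relative to full scale."""
    rms = float(np.sqrt(np.mean(np.square(block))))
    return float(20.0 * np.log10(max(rms, floor)))


def meter_callback(indata, frames, time, status):
    """Audio thread: compute one number and hand it off without waiting."""
    try:
        levels.put_nowait(rms_dbfs(indata))
    except queue.Full:
        pass  # drop a reading rather than stall the audio thread


def render(db, width=60):
    """Main thread: turn a level into a text bar covering -60 dB to 0 dB."""
    bar = "█" * int(min(width, max(0.0, db + 60.0)))
    return f"{db:+6.1f} dB {bar}"
```

The main thread would then open sd.InputStream(callback=meter_callback, blocksize=1024) and loop on print("\r" + render(levels.get()), end=""), so a slow terminal can only delay the display, never the audio.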

Live pitch detection

import queue

audio_q = queue.Queue()

def capture_callback(indata, frames, time, status):
    # Copy before queuing: the indata buffer is only valid during the callback
    audio_q.put(indata[:, 0].copy())

# Processing thread reads from queue and runs pitch detection
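
The processing side is left open above. As one possible shape for it, the sketch below estimates pitch by autocorrelation, a simple stand-in for whatever detector is actually used. The names detect_pitch and pitch_worker, the 60 to 1000 Hz search range, and the 0.3 voicing threshold are all assumptions.

```python
import queue
import threading

import numpy as np


def detect_pitch(block, samplerate, fmin=60.0, fmax=1000.0):
    """Estimate the fundamental in Hz by autocorrelation, or None if unclear."""
    x = np.asarray(block, dtype=np.float64)
    x = x - x.mean()                          # remove any DC offset
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return None                           # silence
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    lag_min = max(1, int(samplerate / fmax))
    lag_max = min(int(samplerate / fmin), len(corr) - 1)
    if lag_max <= lag_min:
        return None                           # block too short for this range
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    if corr[lag] < 0.3 * energy:              # assumed threshold for "pitched"
        return None
    return samplerate / lag


def pitch_worker(audio_q, on_pitch, samplerate, stop_event):
    """Drain the queue off the audio thread; call on_pitch(hz) per pitched block."""
    while not stop_event.is_set():
        try:
            block = audio_q.get(timeout=0.1)  # waiting is fine on this thread
        except queue.Empty:
            continue
        hz = detect_pitch(block, samplerate)
        if hz is not None:
            on_pitch(hz)
```

A worker would be started with threading.Thread(target=pitch_worker, args=(audio_q, print, 44100, stop_event), daemon=True).start(), where stop_event is a threading.Event(). Resolution is limited to whole-sample lags, so estimates get coarser as the pitch rises.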

Network streaming

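Capture audio and send it over a network. This minimal version sends raw, uncompressed samples; a production system would usually compress each block with a codec such as Opus first and hand the send to another thread, since network calls do not belong in the callback: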

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def stream_callback(indata, frames, time, status):
    # Send raw bytes over UDP
    sock.sendto(indata.tobytes(), ("192.168.1.100", 5000))
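
UDP gives no ordering or delivery guarantees, so a receiver usually needs at least a sequence number to notice lost or reordered packets. The sketch below shows one hypothetical packet layout: a 6-byte header followed by 16-bit PCM. The header format and the names pack_audio and unpack_audio are assumptions for illustration, not an established protocol, and converting to 16-bit is a bandwidth saving rather than real compression.

```python
import struct

import numpy as np

# Assumed layout, network byte order: uint32 sequence number, uint16 frame count.
HEADER = struct.Struct("!IH")


def pack_audio(seq, block):
    """Convert channel 0 of a float block in [-1, 1] to int16 and add the header."""
    mono = np.clip(block[:, 0], -1.0, 1.0)
    pcm = (mono * 32767.0).astype("<i2")      # little-endian 16-bit samples
    return HEADER.pack(seq & 0xFFFFFFFF, len(pcm)) + pcm.tobytes()


def unpack_audio(packet):
    """Inverse of pack_audio: return (sequence number, float32 samples)."""
    seq, frames = HEADER.unpack_from(packet)
    pcm = np.frombuffer(packet, dtype="<i2", count=frames, offset=HEADER.size)
    return seq, pcm.astype(np.float32) / 32767.0
```

With this layout a 256-frame mono block comes to 518 bytes, which fits in a single packet on a typical 1500-byte Ethernet MTU. A gap in the sequence numbers tells the receiver to insert silence or conceal the loss.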

Ring buffers for overlapping analysis

Many DSP algorithms (FFT, filtering) need overlapping windows. A ring buffer accumulates samples across callbacks:

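class RingBuffer:
    """Fixed-size circular buffer that keeps the most recent `size` samples."""

    def __init__(self, size):
        self.data = np.zeros(size)
        self.write_pos = 0

    def write(self, samples):
        size = len(self.data)
        if len(samples) > size:
            samples = samples[-size:]   # only the newest samples can fit
        n = len(samples)
        end = self.write_pos + n
        if end <= size:
            self.data[self.write_pos:end] = samples
        else:
            # Wrap around: fill to the end, then continue from the start
            split = size - self.write_pos
            self.data[self.write_pos:] = samples[:split]
            self.data[:n - split] = samples[split:]
        self.write_pos = end % size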
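
To get overlapping windows out of such a buffer, the analysis side needs a way to read the newest N samples in time order. The sketch below is a self-contained variant with that read method added. WindowedRing, latest and peak_frequency are hypothetical names, and index arrays are used for brevity even though they allocate, so the reads are meant for an analysis thread rather than the audio callback.

```python
import numpy as np


class WindowedRing:
    """Ring buffer that can return its newest n samples in time order."""

    def __init__(self, size):
        self.data = np.zeros(size, dtype=np.float32)
        self.write_pos = 0
        self.total = 0                      # samples stored so far

    def write(self, samples):
        size = len(self.data)
        samples = np.asarray(samples, dtype=np.float32)[-size:]  # newest only
        n = len(samples)
        idx = (self.write_pos + np.arange(n)) % size             # wraps around
        self.data[idx] = samples
        self.write_pos = (self.write_pos + n) % size
        self.total += n

    def latest(self, n):
        """Copy of the newest n samples, oldest first."""
        if n > min(self.total, len(self.data)):
            raise ValueError("not enough samples buffered yet")
        idx = (self.write_pos - n + np.arange(n)) % len(self.data)
        return self.data[idx]


def peak_frequency(ring, n, samplerate):
    """Frequency of the strongest FFT bin over the newest n samples."""
    frame = ring.latest(n) * np.hanning(n)
    spectrum = np.abs(np.fft.rfft(frame))
    return float(np.argmax(spectrum)) * samplerate / n
```

Writing one 256-sample block per callback and analysing the latest 1024 samples each time gives windows that overlap by 75%, with frequency resolution limited to samplerate / n per bin.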

Common misconception

Real-time does not mean “fast.” It means “predictable timing.” A system that processes audio in 5 ms every single time is real-time. A system that processes in 1 ms usually but occasionally takes 50 ms is not — that occasional spike causes audible glitches. Consistency matters more than raw speed.
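
The difference shows up as soon as callback durations are judged by their worst case instead of their average. The sketch below compares the two systems just described against the 5.8 ms budget of a 256-sample block. The timings are synthetic and the function name deadline_report is an assumption; real measurements could be collected by wrapping the callback body in time.perf_counter() calls.

```python
import statistics


def deadline_report(durations_ms, blocksize=256, samplerate=44100):
    """Summarise callback durations against the time budget of one block."""
    budget_ms = 1000.0 * blocksize / samplerate
    return {
        "budget_ms": budget_ms,
        "mean_ms": statistics.fmean(durations_ms),
        "worst_ms": max(durations_ms),
        "missed_deadlines": sum(1 for d in durations_ms if d > budget_ms),
    }


steady = [5.0] * 1000           # synthetic: 5 ms every single time
spiky = [1.0] * 999 + [50.0]    # synthetic: usually 1 ms, one 50 ms spike

for name, durations in (("steady", steady), ("spiky", spiky)):
    report = deadline_report(durations)
    print(f"{name:6s} mean {report['mean_ms']:.2f} ms, "
          f"worst {report['worst_ms']:.1f} ms, "
          f"missed {report['missed_deadlines']}")
```

The spiky system has the lower mean yet is the only one that misses a deadline, which is why the worst case is the number to watch.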

How it fits with other tools

Use sounddevice or PyAudio for the hardware interface. Apply effects with NumPy/SciPy in the callback. For heavier processing (ML inference, FFT analysis), offload to a separate thread via a queue. Combine with WebSockets or WebRTC for browser-based audio streaming.

One thing to remember: Real-time audio streaming is a timing contract — your processing must complete within one buffer period every single time, and the callback model plus NumPy vectorization is Python’s best path to meeting that contract.

