Python Real-Time Audio Streaming — Core Concepts
What real-time audio streaming means
Real-time audio streaming processes sound continuously with minimal delay — sound enters the system, gets processed, and exits fast enough that a human perceives it as instantaneous (typically under 20–30 ms round-trip). This enables live effects, voice chat, interactive instruments, audio monitoring, and speech recognition on live microphone input.
The callback model
The standard approach uses a callback function that the audio driver invokes at regular intervals with a buffer of samples:
import sounddevice as sd
import numpy as np

def callback(indata, outdata, frames, time, status):
    # indata: input buffer (microphone), shape (frames, channels)
    # outdata: output buffer (speakers), shape (frames, channels)
    outdata[:] = indata * 0.8  # simple volume reduction

with sd.Stream(samplerate=44100, blocksize=256, channels=1, callback=callback):
    sd.sleep(10000)  # run for 10 seconds
Each callback invocation processes one block. At 44 100 Hz with a block size of 256, callbacks fire ~172 times per second, each handling ~5.8 ms of audio.
Buffer size and latency
| Block size (samples) | Latency per buffer at 44.1 kHz | CPU overhead | Glitch risk |
|---|---|---|---|
| 64 | 1.5 ms | Very high | High |
| 256 | 5.8 ms | Moderate | Low |
| 1024 | 23 ms | Low | Very low |
| 4096 | 93 ms | Very low | Negligible |
Smaller buffers mean lower latency but higher CPU overhead and greater risk of buffer underruns (the callback doesn’t finish before the next buffer is needed, causing audible glitches).
Total round-trip latency = input buffer + processing time + output buffer. With 256-sample buffers on both sides, the buffers alone contribute about 11.6 ms (2 × 5.8 ms); processing time and any driver or hardware latency add to that.
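The arithmetic behind the table and the round-trip figure can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it counts buffer time only, ignoring driver and hardware latency.

```python
def block_latency_ms(block_size, sample_rate=44100):
    """Duration of one block of audio in milliseconds."""
    return 1000.0 * block_size / sample_rate


def callbacks_per_second(block_size, sample_rate=44100):
    """How often the driver invokes the callback."""
    return sample_rate / block_size


def round_trip_ms(block_size, sample_rate=44100, processing_ms=0.0):
    """Input buffer + processing time + output buffer, in milliseconds."""
    one_way = block_latency_ms(block_size, sample_rate)
    return one_way + processing_ms + one_way


for size in (64, 256, 1024, 4096):
    print(f"{size:>5} samples: {block_latency_ms(size):6.2f} ms per block, "
          f"{round_trip_ms(size):6.2f} ms round trip (buffers only)")
```

Passing a measured `processing_ms` gives a rough lower bound for a specific setup.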
Keeping the callback fast
The audio callback runs in a high-priority thread. Rules for avoiding glitches:
- No memory allocation: pre-allocate all arrays before the stream starts
- No file I/O: use a queue to pass data to a separate thread for logging/saving
- Minimal Python object creation: avoid list comprehensions, string formatting, and dict creation
- Use NumPy vectorized operations: they run in compiled C, and many release the GIL
- No blocking calls: no waiting on locks, no network, no database
# Bad: builds a Python list and a new array on every callback
outdata[:, 0] = np.array([sample * gain for sample in indata[:, 0]])
# Good: in-place operation, no allocation
np.multiply(indata, gain, out=outdata)
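The pre-allocation rule can be sketched as a complete callback. This is an illustrative example, not a prescribed pattern: `BLOCK`, `CHANNELS`, and `scratch` are hypothetical names, and the code assumes the stream is opened with a fixed `blocksize` equal to `BLOCK` and float32 samples.

```python
import numpy as np

BLOCK = 256      # assumed block size; must match the stream's blocksize
CHANNELS = 1     # assumed channel count
gain = np.float32(0.8)

# Allocated once, before the stream starts, and reused by every callback.
scratch = np.zeros((BLOCK, CHANNELS), dtype=np.float32)


def callback(indata, outdata, frames, time, status):
    # Apply gain into the pre-allocated scratch buffer (no new array).
    np.multiply(indata, gain, out=scratch[:frames])
    # Hard-limit to [-1, 1], writing the result straight into the output buffer.
    np.clip(scratch[:frames], -1.0, 1.0, out=outdata)
```

The callback has the same signature as the one passed to `sd.Stream` earlier, so it can be passed in its place. Slicing `scratch[:frames]` creates a view, not a copy, so no sample data is duplicated.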
Common real-time patterns
Live volume meter
def level_callback(indata, frames, time, status):
    rms = np.sqrt(np.mean(indata ** 2))
    db = 20 * np.log10(max(rms, 1e-10))  # floor avoids log10(0) on silence
    # print() inside the callback is acceptable only for a quick demo
    print(f"\r{db:+6.1f} dB {'█' * int(max(0, db + 60))}", end='')

with sd.InputStream(callback=level_callback, blocksize=1024):
    sd.sleep(30000)  # run for 30 seconds
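A variant that respects the callback rules keeps only the level computation in the callback and leaves formatting and printing to the main thread. The sketch below shows one possible arrangement; `state`, `rms_db`, and `meter_bar` are hypothetical names.

```python
import numpy as np

# Shared between the audio callback (writer) and the main thread (reader).
state = {"db": -100.0}


def rms_db(block, floor=1e-10):
    """RMS level of a block in dB relative to full scale (1.0)."""
    rms = np.sqrt(np.mean(np.square(block)))
    return 20.0 * np.log10(max(rms, floor))


def level_callback(indata, frames, time, status):
    # Store the level only; the main thread does the printing.
    state["db"] = rms_db(indata)


def meter_bar(db, floor_db=-60.0):
    """Text bar with one cell per dB above floor_db."""
    return "█" * int(max(0.0, db - floor_db))
```

The main thread can then open an `sd.InputStream` with `level_callback` and print `meter_bar(state["db"])` in a loop that sleeps a few tens of milliseconds per iteration.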
Live pitch detection
import queue

audio_q = queue.Queue()

def capture_callback(indata, frames, time, status):
    audio_q.put(indata[:, 0].copy())

# Processing thread reads from queue and runs pitch detection
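The processing side is not shown above. One way to sketch it is a worker thread that drains the queue and runs a simple autocorrelation pitch estimate. Autocorrelation is an assumption chosen for brevity, not necessarily the method the original author had in mind; `detect_pitch`, `worker`, and `results` are hypothetical names, and the sample rate is assumed to be 44.1 kHz.

```python
import queue
import threading

import numpy as np

SAMPLE_RATE = 44100  # assumed; must match the input stream

audio_q = queue.Queue()
results = []  # hypothetical sink; a real app might update a UI instead


def capture_callback(indata, frames, time, status):
    # Copy because the driver reuses the memory behind indata.
    audio_q.put(indata[:, 0].copy())


def detect_pitch(block, sample_rate=SAMPLE_RATE, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental (Hz) by autocorrelation; None if near-silent."""
    block = block - np.mean(block)
    if np.max(np.abs(block)) < 1e-4:
        return None
    # Full autocorrelation, keeping non-negative lags only.
    corr = np.correlate(block, block, mode="full")[len(block) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)
    if lag_max <= lag_min:
        return None
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag


def worker(stop_event):
    """Drain the queue outside the audio thread and record pitch estimates."""
    while not stop_event.is_set():
        try:
            block = audio_q.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(detect_pitch(block))
```

Start the worker with `threading.Thread(target=worker, args=(stop_event,), daemon=True)` before opening the input stream. The estimate is quantized to whole-sample lags, so expect an error of a few hertz around 440 Hz; a block of about 2048 samples gives the autocorrelation enough periods to work with.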
Network streaming
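Capture audio and send it over a network. The snippet here sends raw, uncompressed samples; a real application would typically compress them first and hand the sending off to a worker thread, since network calls do not belong in the audio callback:
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def stream_callback(indata, frames, time, status):
    # Send raw bytes over UDP (no compression in this minimal example)
    sock.sendto(indata.tobytes(), ("192.168.1.100", 5000))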
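Before the bytes reach the socket, it can help to shrink the payload and split it into datagram-sized pieces. The sketch below is not true compression: it only converts float samples to 16-bit PCM, which halves the size compared with float32. The 1200-byte payload limit, the 4-byte sequence header, and all function names are assumptions made for illustration.

```python
import struct

import numpy as np

MAX_PAYLOAD = 1200            # assumed safe UDP payload, below a typical 1500-byte MTU
HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order


def to_int16_bytes(block):
    """Convert float samples in [-1, 1] to little-endian 16-bit PCM bytes."""
    clipped = np.clip(block, -1.0, 1.0)
    return (clipped * 32767.0).astype("<i2").tobytes()


def packetize(pcm_bytes, first_seq):
    """Split PCM bytes into datagrams, each prefixed with a sequence number."""
    packets = []
    for i, start in enumerate(range(0, len(pcm_bytes), MAX_PAYLOAD)):
        chunk = pcm_bytes[start:start + MAX_PAYLOAD]
        packets.append(HEADER.pack((first_seq + i) & 0xFFFFFFFF) + chunk)
    return packets


def depacketize(packet):
    """Return (sequence number, float32 samples) from one datagram."""
    (seq,) = HEADER.unpack_from(packet)
    samples = np.frombuffer(packet, dtype="<i2", offset=HEADER.size)
    return seq, samples.astype(np.float32) / 32767.0
```

Because `MAX_PAYLOAD` is even, no 16-bit sample is split across two datagrams. The sequence number lets a receiver detect lost or reordered packets, which UDP does not do on its own.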
Ring buffers for overlapping analysis
Many DSP algorithms (FFT, filtering) need overlapping windows. A ring buffer accumulates samples across callbacks:
class RingBuffer:
    def __init__(self, size):
        self.data = np.zeros(size)
        self.write_pos = 0

    def write(self, samples):
        n = len(samples)
        end = self.write_pos + n
        if end <= len(self.data):
            self.data[self.write_pos:end] = samples
        else:
            split = len(self.data) - self.write_pos
            self.data[self.write_pos:] = samples[:split]
            self.data[:n - split] = samples[split:]
        self.write_pos = end % len(self.data)
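The class above only writes. For overlapping analysis, a reader also needs the most recent window in chronological order. The sketch below repeats the class so it runs on its own, then adds a hypothetical `read_latest` method and a small `dominant_frequency` helper; both names are illustrative additions, not part of the original class. It assumes each write and each read is no longer than the buffer.

```python
import numpy as np


class RingBuffer:
    def __init__(self, size):
        self.data = np.zeros(size)
        self.write_pos = 0

    def write(self, samples):
        n = len(samples)
        end = self.write_pos + n
        if end <= len(self.data):
            self.data[self.write_pos:end] = samples
        else:
            split = len(self.data) - self.write_pos
            self.data[self.write_pos:] = samples[:split]
            self.data[:n - split] = samples[split:]
        self.write_pos = end % len(self.data)

    def read_latest(self, n):
        """Return a copy of the n most recent samples, oldest first."""
        idx = (self.write_pos - n + np.arange(n)) % len(self.data)
        return self.data[idx]


def dominant_frequency(window, sample_rate):
    """Frequency (Hz) of the strongest bin in a Hann-windowed FFT."""
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    return np.argmax(spectrum) * sample_rate / len(window)
```

With 256-sample blocks and a 1024-sample window, each analysis overlaps the previous one by 75%. `read_latest` allocates a new array, so it belongs in the analysis thread and not in the audio callback.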
Common misconception
Real-time does not mean “fast.” It means “predictable timing.” A system that processes audio in 5 ms every single time is real-time. A system that processes in 1 ms usually but occasionally takes 50 ms is not — that occasional spike causes audible glitches. Consistency matters more than raw speed.
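The distinction can be made concrete by measuring the worst case and not the average. The sketch below is a rough offline check under stated assumptions: `timing_report` and `meets_deadline` are hypothetical helpers, and timing a function on an idle machine will understate the jitter seen inside a live audio thread.

```python
import time

import numpy as np


def timing_report(process, block, block_period_s, runs=200):
    """Time process(block) repeatedly and compare against the block deadline."""
    durations = np.empty(runs)
    for i in range(runs):
        start = time.perf_counter()
        process(block)
        durations[i] = time.perf_counter() - start
    return {
        "mean_ms": 1000 * durations.mean(),
        "worst_ms": 1000 * durations.max(),
        "deadline_ms": 1000 * block_period_s,
        "missed": int(np.sum(durations > block_period_s)),
    }


def meets_deadline(durations_s, block_period_s):
    """True only if every run, not just the average, fits the deadline."""
    return max(durations_s) <= block_period_s
```

With a 5.8 ms deadline (256 samples at 44.1 kHz), a process that always takes 5 ms passes, while one that takes 1 ms for 99 runs and 50 ms once fails even though its mean is under 1.5 ms.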
How it fits with other tools
Use sounddevice or PyAudio for the hardware interface. Apply effects with NumPy/SciPy in the callback. For heavier processing (ML inference, FFT analysis), offload to a separate thread via a queue. Combine with WebSockets or WebRTC for browser-based audio streaming.
One thing to remember: Real-time audio streaming is a timing contract — your processing must complete within one buffer period every single time, and the callback model plus NumPy vectorization is Python’s best path to meeting that contract.