Fourier Transforms — Deep Dive

Production-grade spectral analysis in Python: FFT performance tuning, STFT spectrograms, convolution theorem, and real-world signal processing pipelines.

FFT performance characteristics

The Cooley-Tukey FFT algorithm works by recursively halving the problem. It is fastest when N is a power of 2, but modern implementations (FFTPACK in NumPy, FFTW via SciPy) handle arbitrary sizes using mixed-radix decomposition. Still, performance varies:

import numpy as np
from time import perf_counter

# Power of 2: fast
signal_fast = np.random.randn(2**20)  # 1,048,576

# Prime length: slower
signal_slow = np.random.randn(1048573)  # prime number

t0 = perf_counter()
np.fft.fft(signal_fast)
print(f"Power of 2: {perf_counter() - t0:.4f}s")

t0 = perf_counter()
np.fft.fft(signal_slow)
print(f"Prime length: {perf_counter() - t0:.4f}s")

For non-power-of-2 lengths, zero-pad to the next power of 2:

from scipy.fft import next_fast_len

optimal_n = next_fast_len(len(signal))
fft_result = np.fft.fft(signal, n=optimal_n)

SciPy’s scipy.fft module is generally faster than numpy.fft because it uses a more optimized backend (pocketfft in recent versions).

The convolution theorem

Convolution in the time domain equals multiplication in the frequency domain. This is the foundation of fast filtering:

from scipy.fft import fft, ifft, next_fast_len

def fft_convolve(signal, kernel):
    n = next_fast_len(len(signal) + len(kernel) - 1)
    return np.real(ifft(fft(signal, n=n) * fft(kernel, n=n)))[:len(signal) + len(kernel) - 1]

For a signal of length N and kernel of length K, direct convolution is O(NK). FFT convolution is O((N+K) log(N+K)). When K exceeds ~50–100, FFT wins. SciPy provides this as scipy.signal.fftconvolve.

Overlap-add for streaming

When processing infinite or very long streams, you cannot FFT the entire signal at once. The overlap-add method splits the signal into blocks, convolves each block, and combines results:

from scipy.signal import oaconvolve

# Efficient for very long signals with short kernels
result = oaconvolve(long_signal, short_kernel)

Short-Time Fourier Transform (STFT)

The STFT applies windowed FFTs to overlapping segments, producing a time-frequency representation (spectrogram):

from scipy.signal import stft, istft

frequencies, times, Zxx = stft(signal, fs=sample_rate,
                                 nperseg=1024, noverlap=768)

# Zxx is complex: shape (n_frequencies, n_time_steps)
spectrogram = np.abs(Zxx) ** 2  # Power spectrogram

Key parameters:

nperseg — window size. Larger windows give better frequency resolution but worse time resolution (the uncertainty principle).
noverlap — how much adjacent windows overlap. Typical: 75% of nperseg.
window — default is Hann. Kaiser with β=8–12 is good for narrowband analysis.

The STFT is perfectly invertible via istft when using compatible parameters.

Frequency-domain filtering

Design a filter in the frequency domain and apply it:

from scipy.signal import butter, sosfilt, sosfreqz

# 4th-order Butterworth bandpass, 100-500 Hz
sos = butter(4, [100, 500], btype='band', fs=sample_rate, output='sos')
filtered = sosfilt(sos, signal)

For more control, multiply the FFT directly:

fft_signal = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1/sample_rate)

# Zero out everything above 200 Hz (brick-wall low-pass)
fft_signal[freqs > 200] = 0
filtered = np.fft.irfft(fft_signal, n=len(signal))

Brick-wall filters cause ringing (Gibbs phenomenon). In practice, use smooth roll-off filters (Butterworth, Chebyshev, elliptic).

Multi-dimensional FFTs

2D: Image processing

from scipy.fft import fft2, ifft2, fftshift

# Remove periodic noise from an image
F = fft2(image)
F_shifted = fftshift(F)

# Create a notch filter at the noise frequency
rows, cols = image.shape
crow, ccol = rows // 2, cols // 2
# Zero out specific frequency bins
F_shifted[crow-5:crow+5, ccol+50:ccol+55] = 0
F_shifted[crow-5:crow+5, ccol-55:ccol-50] = 0

cleaned = np.real(ifft2(np.fft.ifftshift(F_shifted)))

3D: Volumetric data

Medical imaging (CT, MRI) and fluid dynamics use 3D FFTs. NumPy’s fftn handles arbitrary dimensions:

volume_fft = np.fft.fftn(volume_data)

Welch’s method for power spectral density

For noisy signals, a single FFT produces a noisy spectrum. Welch’s method averages multiple overlapping, windowed FFTs to produce a smoother estimate:

from scipy.signal import welch

freqs, psd = welch(signal, fs=sample_rate, nperseg=4096)
# psd is in V²/Hz — smooth estimate of power spectral density

This is the standard approach for vibration analysis, EEG processing, and any application where spectral shape matters more than individual frequency bin values.

Zero-padding and interpolation

Zero-padding before the FFT does not increase frequency resolution — it only interpolates between existing frequency bins, producing a smoother-looking spectrum. True resolution depends on the signal length (observation time).

# Original: 1024 samples → 1024 frequency bins
fft_original = np.fft.fft(signal[:1024])

# Zero-padded: same signal, but 4096 bins
fft_padded = np.fft.fft(signal[:1024], n=4096)
# Smoother plot, same actual resolution

GPU-accelerated FFT

For large transforms (N > 10⁶) or batched transforms, GPU acceleration via CuPy provides 10–50× speedup:

import cupy as cp

signal_gpu = cp.asarray(signal)
fft_gpu = cp.fft.fft(signal_gpu)
result = cp.asnumpy(fft_gpu)

For deep learning pipelines, PyTorch’s torch.fft integrates FFT operations into the autograd graph, enabling gradient-based optimization in the frequency domain.

Real-world pipeline: audio feature extraction

import numpy as np
from scipy.signal import stft
from scipy.fft import dct

def mel_spectrogram(audio, sr=16000, n_fft=1024, n_mels=80):
    """Compute mel spectrogram for speech processing."""
    # STFT
    _, _, Zxx = stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft // 2)
    power = np.abs(Zxx) ** 2
    
    # Mel filter bank
    mel_filters = _mel_filter_bank(sr, n_fft, n_mels)
    mel_spec = mel_filters @ power
    
    # Log compression
    return np.log(mel_spec + 1e-9)

def _mel_filter_bank(sr, n_fft, n_mels):
    """Triangular mel-scale filter bank."""
    mel_low, mel_high = 0, 2595 * np.log10(1 + sr / 2 / 700)
    mel_points = np.linspace(mel_low, mel_high, n_mels + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    bin_points = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    
    filters = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        for j in range(bin_points[i], bin_points[i+1]):
            filters[i, j] = (j - bin_points[i]) / (bin_points[i+1] - bin_points[i])
        for j in range(bin_points[i+1], bin_points[i+2]):
            filters[i, j] = (bin_points[i+2] - j) / (bin_points[i+2] - bin_points[i+1])
    return filters

This pipeline powers speech recognition systems (Whisper, wav2vec) and music information retrieval.

One thing to remember: The FFT’s real power comes from the convolution theorem — any operation that is expensive in the time domain (filtering, correlation, polynomial multiplication) becomes cheap element-wise multiplication in the frequency domain.

pythonmathsignal-processingnumpyscipy