Fourier Transforms — Deep Dive
FFT performance characteristics
The Cooley-Tukey FFT algorithm works by recursively halving the problem. It is fastest when N is a power of 2, but modern implementations (FFTPACK in NumPy, FFTW via SciPy) handle arbitrary sizes using mixed-radix decomposition. Still, performance varies:
import numpy as np
from time import perf_counter
# Power of 2: fast
signal_fast = np.random.randn(2**20) # 1,048,576
# Prime length: slower
signal_slow = np.random.randn(1048573) # prime number
t0 = perf_counter()
np.fft.fft(signal_fast)
print(f"Power of 2: {perf_counter() - t0:.4f}s")
t0 = perf_counter()
np.fft.fft(signal_slow)
print(f"Prime length: {perf_counter() - t0:.4f}s")
For non-power-of-2 lengths, zero-pad to the next power of 2:
from scipy.fft import next_fast_len
optimal_n = next_fast_len(len(signal))
fft_result = np.fft.fft(signal, n=optimal_n)
SciPy’s scipy.fft module is generally faster than numpy.fft because it uses a more optimized backend (pocketfft in recent versions).
The convolution theorem
Convolution in the time domain equals multiplication in the frequency domain. This is the foundation of fast filtering:
from scipy.fft import fft, ifft, next_fast_len
def fft_convolve(signal, kernel):
n = next_fast_len(len(signal) + len(kernel) - 1)
return np.real(ifft(fft(signal, n=n) * fft(kernel, n=n)))[:len(signal) + len(kernel) - 1]
For a signal of length N and kernel of length K, direct convolution is O(NK). FFT convolution is O((N+K) log(N+K)). When K exceeds ~50–100, FFT wins. SciPy provides this as scipy.signal.fftconvolve.
Overlap-add for streaming
When processing infinite or very long streams, you cannot FFT the entire signal at once. The overlap-add method splits the signal into blocks, convolves each block, and combines results:
from scipy.signal import oaconvolve
# Efficient for very long signals with short kernels
result = oaconvolve(long_signal, short_kernel)
Short-Time Fourier Transform (STFT)
The STFT applies windowed FFTs to overlapping segments, producing a time-frequency representation (spectrogram):
from scipy.signal import stft, istft
frequencies, times, Zxx = stft(signal, fs=sample_rate,
nperseg=1024, noverlap=768)
# Zxx is complex: shape (n_frequencies, n_time_steps)
spectrogram = np.abs(Zxx) ** 2 # Power spectrogram
Key parameters:
- nperseg — window size. Larger windows give better frequency resolution but worse time resolution (the uncertainty principle).
- noverlap — how much adjacent windows overlap. Typical: 75% of nperseg.
- window — default is Hann. Kaiser with β=8–12 is good for narrowband analysis.
The STFT is perfectly invertible via istft when using compatible parameters.
Frequency-domain filtering
Design a filter in the frequency domain and apply it:
from scipy.signal import butter, sosfilt, sosfreqz
# 4th-order Butterworth bandpass, 100-500 Hz
sos = butter(4, [100, 500], btype='band', fs=sample_rate, output='sos')
filtered = sosfilt(sos, signal)
For more control, multiply the FFT directly:
fft_signal = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1/sample_rate)
# Zero out everything above 200 Hz (brick-wall low-pass)
fft_signal[freqs > 200] = 0
filtered = np.fft.irfft(fft_signal, n=len(signal))
Brick-wall filters cause ringing (Gibbs phenomenon). In practice, use smooth roll-off filters (Butterworth, Chebyshev, elliptic).
Multi-dimensional FFTs
2D: Image processing
from scipy.fft import fft2, ifft2, fftshift
# Remove periodic noise from an image
F = fft2(image)
F_shifted = fftshift(F)
# Create a notch filter at the noise frequency
rows, cols = image.shape
crow, ccol = rows // 2, cols // 2
# Zero out specific frequency bins
F_shifted[crow-5:crow+5, ccol+50:ccol+55] = 0
F_shifted[crow-5:crow+5, ccol-55:ccol-50] = 0
cleaned = np.real(ifft2(np.fft.ifftshift(F_shifted)))
3D: Volumetric data
Medical imaging (CT, MRI) and fluid dynamics use 3D FFTs. NumPy’s fftn handles arbitrary dimensions:
volume_fft = np.fft.fftn(volume_data)
Welch’s method for power spectral density
For noisy signals, a single FFT produces a noisy spectrum. Welch’s method averages multiple overlapping, windowed FFTs to produce a smoother estimate:
from scipy.signal import welch
freqs, psd = welch(signal, fs=sample_rate, nperseg=4096)
# psd is in V²/Hz — smooth estimate of power spectral density
This is the standard approach for vibration analysis, EEG processing, and any application where spectral shape matters more than individual frequency bin values.
Zero-padding and interpolation
Zero-padding before the FFT does not increase frequency resolution — it only interpolates between existing frequency bins, producing a smoother-looking spectrum. True resolution depends on the signal length (observation time).
# Original: 1024 samples → 1024 frequency bins
fft_original = np.fft.fft(signal[:1024])
# Zero-padded: same signal, but 4096 bins
fft_padded = np.fft.fft(signal[:1024], n=4096)
# Smoother plot, same actual resolution
GPU-accelerated FFT
For large transforms (N > 10⁶) or batched transforms, GPU acceleration via CuPy provides 10–50× speedup:
import cupy as cp
signal_gpu = cp.asarray(signal)
fft_gpu = cp.fft.fft(signal_gpu)
result = cp.asnumpy(fft_gpu)
For deep learning pipelines, PyTorch’s torch.fft integrates FFT operations into the autograd graph, enabling gradient-based optimization in the frequency domain.
Real-world pipeline: audio feature extraction
import numpy as np
from scipy.signal import stft
from scipy.fft import dct
def mel_spectrogram(audio, sr=16000, n_fft=1024, n_mels=80):
"""Compute mel spectrogram for speech processing."""
# STFT
_, _, Zxx = stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft // 2)
power = np.abs(Zxx) ** 2
# Mel filter bank
mel_filters = _mel_filter_bank(sr, n_fft, n_mels)
mel_spec = mel_filters @ power
# Log compression
return np.log(mel_spec + 1e-9)
def _mel_filter_bank(sr, n_fft, n_mels):
"""Triangular mel-scale filter bank."""
mel_low, mel_high = 0, 2595 * np.log10(1 + sr / 2 / 700)
mel_points = np.linspace(mel_low, mel_high, n_mels + 2)
hz_points = 700 * (10 ** (mel_points / 2595) - 1)
bin_points = np.floor((n_fft + 1) * hz_points / sr).astype(int)
filters = np.zeros((n_mels, n_fft // 2 + 1))
for i in range(n_mels):
for j in range(bin_points[i], bin_points[i+1]):
filters[i, j] = (j - bin_points[i]) / (bin_points[i+1] - bin_points[i])
for j in range(bin_points[i+1], bin_points[i+2]):
filters[i, j] = (bin_points[i+2] - j) / (bin_points[i+2] - bin_points[i+1])
return filters
This pipeline powers speech recognition systems (Whisper, wav2vec) and music information retrieval.
One thing to remember: The FFT’s real power comes from the convolution theorem — any operation that is expensive in the time domain (filtering, correlation, polynomial multiplication) becomes cheap element-wise multiplication in the frequency domain.
See Also
- Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
- Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
- Python Genetic Algorithms How computers borrow evolution's playbook — survival of the fittest, mutation, and reproduction — to solve problems too complicated for brute force.
- Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.
- Python Markov Chains Why the next thing that happens often depends only on what is happening right now — and how that one rule generates text, predicts weather, and powers board games.