Convolution Operations — Core Concepts

How convolution works in 1D and 2D, why it matters for filtering and neural networks, and how to use it in Python with NumPy and SciPy.

What convolution actually computes

Convolution combines two arrays — a signal (or image) and a kernel (a small pattern) — by sliding the kernel across the signal and computing a weighted sum at each position.

For 1D discrete convolution, the output at position i is:

output[i] = sum(signal[i - k] * kernel[k] for k in range(kernel_size))

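As k increases, the index i - k runs backward through the signal, which is equivalent to flipping the kernel before sliding it (this is the mathematical definition). Terms where i - k falls outside the signal are treated as zero. In practice, library functions such as np.convolve and scipy.signal.convolve apply the flip for you.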
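To make the formula concrete, here is a minimal sketch that evaluates it with two plain loops and checks the result against np.convolve. The helper name convolve_1d_full and the sample arrays are made up for illustration, and the loop version is meant for reading, not for speed.

```python
import numpy as np

def convolve_1d_full(signal, kernel):
    """Direct 'full' 1D convolution; samples outside the signal count as zero."""
    n, k_size = len(signal), len(kernel)
    output = np.zeros(n + k_size - 1)
    for i in range(len(output)):
        for k in range(k_size):
            j = i - k  # index into the signal; this subtraction is the "flip"
            if 0 <= j < n:
                output[i] += signal[j] * kernel[k]
    return output

signal = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.0, 1.0, 0.5])

manual = convolve_1d_full(signal, kernel)
reference = np.convolve(signal, kernel)  # NumPy defaults to mode='full'

print(manual)                          # [0.  1.  2.5 4.  1.5]
print(np.allclose(manual, reference))  # True
```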

1D convolution in Python

import numpy as np
from scipy.signal import convolve

signal = np.array([1, 3, 5, 7, 9, 7, 5, 3, 1])
kernel = np.array([1, 2, 1]) / 4  # Smoothing kernel

smoothed = convolve(signal, kernel, mode='same')

The mode parameter controls output size:

Mode	Output length	Behavior
`full`	N + K - 1	All overlapping positions
`same`	N	Same size as input (centered)
`valid`	N - K + 1	Only fully overlapping positions
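The lengths in the table can be checked directly. This short sketch reuses the signal (N = 9) and kernel (K = 3) from the example above and records the output length for each mode.

```python
import numpy as np
from scipy.signal import convolve

signal = np.array([1, 3, 5, 7, 9, 7, 5, 3, 1])  # N = 9
kernel = np.array([1, 2, 1]) / 4                # K = 3

# Output length for each value of the mode parameter
lengths = {mode: len(convolve(signal, kernel, mode=mode))
           for mode in ("full", "same", "valid")}

print(lengths)  # {'full': 11, 'same': 9, 'valid': 7}
```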

2D convolution for images

Images are 2D arrays. A 2D kernel (typically 3×3 or 5×5) slides across both rows and columns:

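import numpy as np
from scipy.signal import convolve2d

# Placeholder input: any 2D float array of pixel intensities works here
grayscale_image = np.random.default_rng(0).random((64, 64))

# Sobel kernel that responds to horizontal edges (intensity changes from one row to the next)
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

# boundary='symm' mirrors the image at its border; 'wrap' would treat the image
# as periodic and can produce false edges where opposite borders differ
edges = convolve2d(grayscale_image, sobel_y, mode='same', boundary='symm')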

Common kernels and their effects:

Kernel	Effect	Use case
`[[1,1,1],[1,1,1],[1,1,1]] / 9`	Box blur	Noise reduction
`[[1,2,1],[2,4,2],[1,2,1]] / 16`	Gaussian blur	Smooth noise
`[[0,-1,0],[-1,5,-1],[0,-1,0]]`	Sharpen	Enhance detail
`[[-1,-1,-1],[-1,8,-1],[-1,-1,-1]]`	Edge detect	Find boundaries
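As a rough illustration of two rows from the table, the sketch below applies the box blur and the edge-detect kernel to a tiny synthetic image. The 6×6 image with a single vertical step is an assumption chosen so the results are easy to read, not data from a real photo.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical 6x6 test image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

box_blur = np.ones((3, 3)) / 9
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# boundary='symm' mirrors the image at its border instead of padding with zeros
blurred = convolve2d(image, box_blur, mode="same", boundary="symm")
edges = convolve2d(image, edge_detect, mode="same", boundary="symm")

print(np.round(blurred[2], 2))  # [0.   0.   0.33 0.67 1.   1.  ]
print(edges[2])                 # [ 0.  0. -3.  3.  0.  0.]
```

The blur turns the sharp step into a ramp, while the edge-detect kernel (whose weights sum to zero) responds only in the two columns next to the step.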

How convolution powers neural networks

Convolutional Neural Networks (CNNs) use convolution as their core operation. Instead of hand-designing kernels, the network learns the kernel values during training.

A CNN typically stacks multiple convolutional layers:

First layers learn simple features — edges, gradients, color blobs
Middle layers combine simple features into parts — eyes, wheels, letters
Deep layers combine parts into objects — faces, cars, words

Each layer applies many different kernels (called filters), producing a stack of feature maps. A layer with 64 filters takes one image and produces 64 filtered versions, each highlighting different patterns.
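The "one image in, 64 feature maps out" idea can be sketched with NumPy and SciPy alone. The filters here are random rather than learned, the 28×28 input is a hypothetical single-channel image, and a real framework would also add biases, nonlinearities, and multi-channel inputs.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # hypothetical single-channel input
filters = rng.standard_normal((64, 3, 3))  # 64 random 3x3 kernels (untrained)

# Apply every filter to the same image and stack the results
feature_maps = np.stack([
    correlate2d(image, f, mode="same") for f in filters
])

print(feature_maps.shape)  # (64, 28, 28)
```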

Padding and stride

Two parameters control how convolution is applied:

Padding adds zeros (or other values) around the input border. Without padding, the output shrinks by (kernel_size - 1) pixels along each dimension. “Same” padding adds just enough border to preserve the input size at stride 1.

Stride controls how many positions the kernel moves at each step. Stride 1 (the default) moves one pixel at a time. Stride 2 skips every other position, roughly halving the output size along each dimension, which is a common way to downsample.
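The effect of both parameters on output size follows the standard formula floor((input + 2 × padding - kernel) / stride) + 1 along each dimension. A small sketch, where conv_output_size is an illustrative helper name and not a library function:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one dimension for a given padding and stride."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, 3))                       # 30: shrinks by kernel_size - 1
print(conv_output_size(32, 3, padding=1))            # 32: "same" padding
print(conv_output_size(32, 3, padding=1, stride=2))  # 16: stride 2 halves the size
```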

Correlation vs convolution

Mathematically, convolution flips the kernel before sliding. Cross-correlation skips the flip. For symmetric kernels (like Gaussian blur), they give the same result. Most deep learning frameworks actually implement correlation and call it “convolution” — the distinction rarely matters because learned kernels can compensate for the flip.
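The relationship is easy to verify with SciPy. In this sketch the arrays are arbitrary examples: flipping the kernel turns correlation into convolution, an asymmetric kernel gives different results for the two operations, and a symmetric kernel gives the same result.

```python
import numpy as np
from scipy.signal import convolve, correlate

signal = np.array([1.0, 2.0, 4.0, 8.0])
asymmetric = np.array([1.0, 0.0, -1.0])
symmetric = np.array([1.0, 2.0, 1.0]) / 4

# Convolution equals correlation with the flipped kernel
same_after_flip = np.allclose(convolve(signal, asymmetric),
                              correlate(signal, asymmetric[::-1]))

# Without the flip, an asymmetric kernel gives a different answer
differ = not np.allclose(convolve(signal, asymmetric),
                         correlate(signal, asymmetric))

# A symmetric kernel is its own flip, so the two operations agree
symmetric_equal = np.allclose(convolve(signal, symmetric),
                              correlate(signal, symmetric))

print(same_after_flip, differ, symmetric_equal)  # True True True
```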

Performance considerations

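Direct convolution costs O(N × K) in 1D (N is the signal length, K the kernel length). For large kernels, FFT-based convolution is faster: roughly O((N + K) log(N + K)), so its cost grows only slowly with kernel size. scipy.signal.fftconvolve always uses the FFT; scipy.signal.convolve, with its default method='auto', estimates which of the two approaches is faster and uses that one.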
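A quick way to see that the direct and FFT routes agree is to run both on the same data. The array sizes below are arbitrary; scipy.signal.choose_conv_method reports which method SciPy estimates to be faster for the given inputs, and that answer depends on the sizes involved.

```python
import numpy as np
from scipy.signal import convolve, fftconvolve, choose_conv_method

rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)
kernel = rng.standard_normal(501)

direct = convolve(signal, kernel, method="direct")
via_fft = fftconvolve(signal, kernel)

# Same result up to floating-point rounding
print(np.allclose(direct, via_fft))  # True

# Returns the string 'fft' or 'direct'
print(choose_conv_method(signal, kernel))
```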

Common misconception

Many people think convolution and matrix multiplication are unrelated. In fact, convolution can be expressed as matrix multiplication using a Toeplitz matrix constructed from the kernel. Deep learning frameworks often use a closely related idea, the im2col trick, which unrolls input patches into the columns of a matrix so that the whole convolution becomes one optimized matrix multiplication on the GPU.
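The Toeplitz view can be sketched in a few lines for the 1D case. The arrays are arbitrary examples, and the matrix is built with scipy.linalg.toeplitz so that multiplying it by the signal reproduces the 'full' convolution.

```python
import numpy as np
from scipy.linalg import toeplitz

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 0.5, 0.25])
n = len(signal)

# First column: the kernel followed by zeros; first row: kernel[0] then zeros.
# Each column of the matrix is then the kernel shifted down by one more row.
first_col = np.concatenate([kernel, np.zeros(n - 1)])
first_row = np.concatenate([[kernel[0]], np.zeros(n - 1)])
conv_matrix = toeplitz(first_col, first_row)  # shape (n + k - 1, n)

as_matmul = conv_matrix @ signal

print(conv_matrix.shape)                                    # (6, 4)
print(np.allclose(as_matmul, np.convolve(signal, kernel)))  # True
```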

One thing to remember: Convolution is a weighted sliding-window operation — change the kernel weights and you change what it detects, which is exactly what neural networks learn to do automatically.
