Convolution
A fundamental image processing operation that slides a kernel (small matrix) across an image, computing weighted sums at each position for blurring, edge detection, and sharpening.
Convolution is the core operation of spatial image filtering: a small matrix called a kernel slides across the image, computing a weighted sum of surrounding pixels at each position. Nearly all spatial filters - blur, sharpen, edge detection, emboss - are implemented as convolutions.
The convolution procedure:
- Slide the kernel (e.g. 3×3) across the image from top-left to bottom-right, one pixel at a time
- At each position, multiply each kernel element by the corresponding pixel value
- Sum all products to produce the output pixel value
- Repeat for every pixel in the image
Representative kernels and their effects:
- Box filter: All elements equal
1/9in a 3×3 kernel. Produces simple averaging blur - Gaussian filter: Weights follow a Gaussian distribution. Creates natural blur and is optimal for noise reduction
- Sobel filter: Detects horizontal and vertical edges by approximating first-order derivatives
- Laplacian filter: Detects edges in all directions via second-order derivatives. Zero-crossings indicate edge locations
- Unsharp mask: Subtracts a blurred version from the original to enhance sharpness. A staple in print and photography
Computationally, applying an N×N kernel to an M×M image requires O(M²N²) operations. For large kernels, converting to frequency domain (FFT) multiplication is faster. Separable kernels like Gaussian can be decomposed into two 1D passes, reducing complexity to O(M²N).
Deep learning CNNs (Convolutional Neural Networks) use the same convolution operation, but learn kernel values automatically through training rather than manual design.