Image Pyramid
A multi-resolution data structure built by progressively downsampling an image. Used to achieve scale invariance in object detection and template matching.
An image pyramid is a hierarchical data structure consisting of multiple copies of the same image at progressively lower resolutions. The base of the pyramid is the original full-resolution image, and each subsequent level is a reduced version of the one below it. This structure enables algorithms to operate across multiple scales efficiently.
The two most common types are the Gaussian pyramid and the Laplacian pyramid. A Gaussian pyramid is constructed by repeatedly applying a Gaussian low-pass filter followed by downsampling by a factor of 2. Level 0 is the original image, level 1 has half the width and height, and level k has resolution 1/2^k of the original. In OpenCV, cv2.pyrDown() performs one level of reduction.
- Gaussian pyramid: Each level smooths and subsamples the previous one. Memory usage for the entire pyramid is approximately 4/3 of the original image size
- Laplacian pyramid: Stores the difference between adjacent Gaussian levels, capturing edge and detail information at each scale. Widely used in image blending and seamless compositing
- Scale factor: While a factor of 2 is standard, finer intervals such as
2^(1/3)are used in feature detectors like SIFT for better scale localization
Image pyramids have broad applications. In object detection, instead of scanning a sliding window at multiple sizes, a fixed-size detector is applied to each pyramid level, reducing computational cost from quadratic to linear in scale count. In template matching, a coarse-to-fine strategy narrows candidate regions at low resolution before refining at full resolution. Modern deep learning architectures like Feature Pyramid Networks (FPN) build on this concept to produce multi-scale feature maps for accurate detection across object sizes.