Scale Space
A continuous framework for representing images at varying levels of detail. By varying the Gaussian blur parameter σ, image structures can be analyzed uniformly across resolutions.
Scale space is a theoretical framework that describes how image structures evolve when observed at different levels of detail. Introduced by Witkin in 1983 and axiomatically formalized by Lindeberg, it provides the mathematical foundation for multi-scale image analysis. The Gaussian kernel is proven to be the unique linear kernel satisfying the scale-space axioms.
The scale-space representation is defined as the convolution of the original image I(x, y) with a Gaussian kernel G(x, y, σ) of standard deviation σ: L(x, y, σ) = G(x, y, σ) * I(x, y). As σ increases, fine-scale structures are progressively suppressed, leaving only coarse, global features. At σ = 0 the representation equals the original image, and as σ approaches infinity it flattens toward a constant image at the mean intensity.
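In code, this construction is a stack of Gaussian blurs at increasing σ. The following is a minimal sketch using SciPy's `gaussian_filter`; the `scale_space` helper and the impulse test image are illustrative, not part of any particular library.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, sigmas):
    """Build L(x, y, sigma): one Gaussian-blurred copy of the image per sigma."""
    # sigma = 0 is defined as the original image itself
    return np.stack([image if s == 0 else gaussian_filter(image, s)
                     for s in sigmas])

# Toy example: a single bright pixel spreads out (and its peak drops) as sigma grows
img = np.zeros((64, 64))
img[32, 32] = 1.0
L = scale_space(img, sigmas=[0, 1, 2, 4, 8])
```

Note that Gaussian blurring only redistributes intensity: the peak at the impulse decays monotonically with σ, but the total image sum is preserved, which is the discrete face of the "no structure is created, only removed" behavior described above.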
- Why Gaussian: The Gaussian is the only kernel satisfying causality (blurring never creates new structures as scale increases), isotropy (no preferred direction), and linearity: the three fundamental axioms of linear scale space
- Difference of Gaussians (DoG): Subtracting Gaussian-blurred images at adjacent σ values approximates the Laplacian of Gaussian and efficiently detects features at specific scales. This is the core mechanism in SIFT keypoint detection
- Automatic scale selection: The characteristic scale of a feature is determined by finding the σ at which the normalized LoG (Laplacian of Gaussian) response reaches a maximum, enabling scale-invariant feature detection
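The second and third bullets can be demonstrated together: a DoG layer is just the difference of two Gaussian blurs, and the characteristic scale of a blob is the σ maximizing the σ²-normalized LoG response. The sketch below uses SciPy's `gaussian_filter` and `gaussian_laplace`; the helper names, the SIFT-style ratio k = 1.6, and the disk test image are illustrative assumptions. For an ideal bright disk of radius r, theory predicts the normalized-LoG peak near σ = r/√2.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def dog(image, sigma, k=1.6):
    """Difference of Gaussians at scale sigma (k is the scale ratio, as in SIFT)."""
    return gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)

def characteristic_scale(image, row, col, sigmas):
    """Return the sigma at which sigma^2 * |LoG| peaks at pixel (row, col)."""
    responses = [s**2 * abs(gaussian_laplace(image, s)[row, col]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]

# Toy blob: a bright disk of radius r centered at (32, 32)
r = 8
yy, xx = np.mgrid[:64, :64]
disk = ((xx - 32.0)**2 + (yy - 32.0)**2 <= r**2).astype(float)

sigmas = np.linspace(1.0, 12.0, 45)
s_star = characteristic_scale(disk, 32, 32, sigmas)  # expected near r / sqrt(2)
```

At the disk's center the DoG response is negative (extra blur pulls the bright center down), which is why blob detectors look for extrema of either sign rather than maxima only.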
Scale-space theory underpins most modern feature detection algorithms. SIFT locates keypoints as extrema of a DoG scale-space stack, SURF as extrema of a box-filter approximation to the Hessian determinant, and ORB detects corners across the levels of an image pyramid, a coarsely sampled scale space. In each case the goal is the same: keypoints that are stable under changes in viewing distance and zoom level. Understanding scale space is essential for grasping how robust image matching and object recognition systems achieve their invariance properties.
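As a rough illustration of scale-space extrema detection, the SIFT-style localization step can be sketched as a scan of a DoG stack for 3×3×3 local extrema (heavily simplified: no octaves, subpixel refinement, or edge-response suppression). The `dog_extrema` helper, the threshold, and the σ schedule below are illustrative assumptions, not SIFT's actual parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigmas, threshold=0.01):
    """Return (scale_index, row, col) points that are 3x3x3 extrema of a DoG stack."""
    blurred = np.stack([gaussian_filter(image, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]
    # A point survives if it equals the max (or min) of its 3x3x3 neighborhood
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > threshold)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -threshold)
    # Drop the first and last DoG levels, where the scale comparison is one-sided
    keep = np.zeros_like(dog, dtype=bool)
    keep[1:-1] = True
    return np.argwhere((maxima | minima) & keep)

# A single bright blob should yield at least one extremum near its center
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
pts = dog_extrema(img, sigmas=[1.6 * 2**(i / 3) for i in range(6)])
```

The returned scale index doubles as a size estimate for the keypoint, which is what makes the detected features usable for matching across zoom levels.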