SIFT
Scale-Invariant Feature Transform. An algorithm that extracts local image features invariant to scale changes and rotation, serving as a foundational technique for image matching.
SIFT (Scale-Invariant Feature Transform), published by David Lowe in 2004, is a landmark algorithm for detecting and describing local image features. It produces keypoints that are invariant to scale and rotation, and partially invariant to illumination and viewpoint changes, each described by a 128-dimensional vector.
The SIFT pipeline consists of four stages. First, keypoint candidates are identified as extrema in the Difference of Gaussians (DoG) scale space. Second, candidates are refined to sub-pixel accuracy and filtered to remove low-contrast points and edge responses. Third, a dominant orientation is assigned to each keypoint based on local gradient histograms, providing rotation invariance. Finally, a 16x16 patch around each keypoint (aligned to its dominant orientation) is divided into 4x4 sub-regions, and an 8-bin gradient histogram is computed for each, yielding the 128-dimensional descriptor.
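The first stage can be sketched in pure NumPy: build a stack of progressively blurred images, take differences of adjacent levels to form the DoG scale space, and look for extrema across the 3x3x3 (x, y, scale) neighborhood. The sigma schedule and the test image below are illustrative, not Lowe's exact implementation:

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian convolution (pure NumPy, for illustration only).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    # Blur rows, then columns; mode="same" keeps the image size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

# A bright blob whose characteristic scale the DoG should respond to.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0

# Geometric sigma schedule (3 scales per octave, base sigma 1.6, as in Lowe's paper).
sigmas = [1.6 * 2 ** (i / 3) for i in range(5)]
blurs = [gaussian_blur(img, s) for s in sigmas]

# DoG levels are differences of adjacent blur levels.
dog = [b2 - b1 for b1, b2 in zip(blurs, blurs[1:])]
stack = np.stack(dog)  # shape (scales, H, W)

# A keypoint candidate is a pixel that is a maximum or minimum among its
# 26 neighbours in the 3x3x3 neighbourhood across (x, y, scale).
```

In the full algorithm this search runs per octave on a downsampled image pyramid; the sketch above shows a single octave.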
- Scale invariance: By detecting extrema across the DoG scale space, SIFT reliably finds the same physical structures regardless of imaging distance or zoom level
- Rotation invariance: Normalizing the descriptor relative to the keypoint's dominant gradient direction yields (to a close approximation) the same descriptor under image rotation
- Matching strategy: Descriptors are matched using Euclidean distance with Lowe's ratio test (threshold 0.7-0.8), which compares the nearest-neighbor distance to the second-nearest to reject ambiguous matches
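The matching strategy above can be sketched with a small NumPy helper (`ratio_test_match` is a hypothetical name; the toy 128-D descriptors are synthetic):

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.75):
    """Lowe's ratio test: keep a match only when the nearest neighbour
    is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]             # two nearest neighbours
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# Toy descriptors: desc2[0] is a near-duplicate of desc1[0], so it should
# survive the ratio test; unrelated random vectors should mostly be rejected.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 128))
desc2 = rng.normal(size=(5, 128))
desc2[0] = desc1[0] + 0.01 * rng.normal(size=128)

matches = ratio_test_match(desc1, desc2)
```

The same logic is what `cv2.BFMatcher.knnMatch` with k=2 plus a ratio filter implements in practice.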
SIFT's patent expired in 2020, making it freely available in OpenCV 4.4+ via cv2.SIFT_create(). It remains the gold standard for applications including panorama stitching, 3D reconstruction, and object recognition. For real-time applications where SIFT's computational cost is prohibitive, faster alternatives like ORB and AKAZE provide reasonable trade-offs between speed and matching quality.