Stereo Matching
A technique that finds corresponding pixels between a pair of images captured from two cameras to recover depth information. A foundational method for 3D measurement in autonomous driving and robotics.
Stereo matching is the process of identifying corresponding pixels between two images (a stereo pair) captured from different viewpoints. The horizontal displacement between matched pixels, called disparity, is inversely proportional to depth, enabling 3D scene reconstruction through triangulation. This principle mirrors human binocular vision.
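The inverse relation between disparity and depth follows from similar triangles: for focal length f (in pixels) and camera baseline B, depth is Z = f·B/d. A minimal numeric sketch, where the focal length and baseline are made-up example values:

```python
# Depth from disparity via triangulation: Z = f * B / d,
# where f is the focal length in pixels, B the baseline, d the disparity.
f_px = 700.0       # focal length in pixels (illustrative value)
baseline_m = 0.12  # distance between the two cameras in metres (illustrative value)

def depth_from_disparity(d_px: float) -> float:
    """Return metric depth for a disparity measured in pixels."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / d_px

# Doubling the disparity halves the depth (inverse proportionality).
print(depth_from_disparity(42.0))  # → 2.0  (nearer object, larger disparity)
print(depth_from_disparity(21.0))  # → 4.0  (farther object, smaller disparity)
```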
Stereo matching assumes calibrated cameras with known intrinsic and extrinsic parameters. Rectification (warping both images so that epipolar lines become horizontal and row-aligned) is applied as a preprocessing step; it constrains the correspondence search to the same scanline in both images, reducing the problem from a 2D search to a 1D search.
- Local methods: Compare intensity patterns within a fixed window (e.g., a 9×9 block) between the left and right images using cost functions such as SAD (Sum of Absolute Differences) or NCC (Normalized Cross-Correlation). Fast, but unreliable in textureless regions and near depth discontinuities.
- Semi-Global Matching (SGM): Aggregates matching costs along 8-16 directions with smoothness penalties, achieving a practical balance between local speed and global accuracy. Widely used in automotive and aerial mapping applications.
- Deep learning methods: Networks such as GC-Net, PSMNet, and RAFT-Stereo construct 4D cost volumes and apply 3D convolutions or iterative refinement to regress disparity maps. They significantly outperform classical methods on benchmarks like KITTI and Middlebury.
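The local (block-matching) approach from the list above can be sketched in plain NumPy on a synthetic rectified pair; the window size and search range are arbitrary illustrative choices:

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, half_win=4):
    """Naive SAD block matching on rectified grayscale images.

    For each pixel, compare a (2*half_win+1)^2 window against windows
    shifted leftward along the same scanline of the right image, and
    keep the shift with the minimal Sum of Absolute Differences.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            patch_l = left[y - half_win:y + half_win + 1,
                           x - half_win:x + half_win + 1].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(max_disp):
                patch_r = right[y - half_win:y + half_win + 1,
                                x - d - half_win:x - d + half_win + 1].astype(np.int32)
                cost = np.abs(patch_l - patch_r).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic test: the right image is the left image shifted 5 px
# leftward, so the true disparity is 5 wherever it can be measured.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(40, 60), dtype=np.uint8)
right = np.roll(left, -5, axis=1)
d = sad_disparity(left, right)
print(d[20, 30])  # → 5
```

Textured synthetic noise makes the match unambiguous here; on real images this naive version exhibits exactly the textureless-region and discontinuity failures noted above.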
OpenCV provides cv2.StereoBM_create() for block matching and cv2.StereoSGBM_create() for semi-global matching. Key parameters include numDisparities (search range, must be divisible by 16) and blockSize, which should be tuned based on the scene's depth range and texture characteristics.