Image Thresholding Types and Optimal Threshold Determination - From Otsu to Adaptive Methods
Thresholding Fundamentals - The Purpose of Separating Images into Black and White
Thresholding (binarization) compares each pixel in a grayscale image against a threshold value, converting it to either white (255) or black (0). It is frequently used in early stages of image processing pipelines as essential preprocessing for contour detection, OCR, object counting, and more.
Why binarize: Many image analysis tasks require clear separation between objects (foreground) and background. While grayscale images contain 256 levels of information, binary classification suffices for decisions like "text or background," "cell or medium," or "defect or normal." Binarization reduces information volume, making subsequent processing faster and more robust.
Mathematical definition:
dst(x,y) = maxval if src(x,y) > thresh
dst(x,y) = 0 otherwise
Types of thresholding:
- Global threshold: Applies a single threshold to the entire image. Effective when illumination is uniform.
- Adaptive threshold: Computes local thresholds per pixel. Essential when illumination is non-uniform.
- Multi-level threshold: Classifies into 3 or more levels using multiple thresholds. Used for multi-stage segmentation.
OpenCV provides cv2.threshold() for global thresholding and cv2.adaptiveThreshold() for adaptive thresholding. Threshold selection is the most critical factor determining binarization quality, and this article explains determination methods in detail.
Fixed Threshold Method - Manual Setting and Histogram Analysis
Fixed thresholding is the simplest binarization technique, applying a single user-specified threshold to the entire image. It is effective when lighting conditions are stable and foreground-background contrast is clear.
Manual threshold determination: Inspect the histogram and set the threshold at the valley between the two peaks (a bimodal distribution) corresponding to foreground and background. For example, in a document image with black text on white paper, where the background occupies 200-255 and the text 0-80, a value around 128 in the valley between them is an appropriate threshold.
OpenCV implementation:
ret, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
Threshold type variations:
- THRESH_BINARY: Above threshold becomes white (maxval), below becomes black
- THRESH_BINARY_INV: Above threshold becomes black, below becomes white (inverted)
- THRESH_TRUNC: Above threshold truncated to threshold value, below unchanged
- THRESH_TOZERO: Below threshold set to 0, above unchanged
- THRESH_TOZERO_INV: Above threshold set to 0, below unchanged
Limitations of fixed thresholds: When illumination is non-uniform (shadows, gradient lighting, vignetting), a single threshold cannot correctly separate foreground and background. What works for one part of the image fails in other regions where foreground disappears or background remains as noise. Adaptive thresholding solves this problem.
Preprocessing improvements: Applying Gaussian blur (σ=1-3) for noise removal and histogram equalization for contrast enhancement before fixed thresholding makes threshold selection easier and results more stable.
Otsu's Method - The Standard for Automatic Threshold Determination
Otsu's method (1979) automatically determines the optimal threshold based on histogram statistical properties. It selects the threshold that maximizes between-class variance, maximizing the separability of foreground and background. It is the most widely used automatic threshold determination method in image processing.
Algorithm principle: When dividing the image into 2 classes at threshold t (C0: pixel value ≤ t, C1: pixel value > t), find t that maximizes between-class variance σ²_B(t) = ω0(t) × ω1(t) × (μ0(t) - μ1(t))². Here ω0, ω1 are pixel count ratios for each class, and μ0, μ1 are mean intensities for each class.
Computational efficiency: Compute between-class variance for all 256 possible thresholds and select the one yielding maximum value. Using cumulative histogram sums, computation is O(L) (L: number of levels=256), independent of image size.
OpenCV implementation:
ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
The return value ret contains the automatically determined threshold.
Prerequisites and limitations of Otsu's method:
- Optimal when histogram exhibits bimodality (two clear peaks)
- Accuracy degrades when foreground-background area ratio is extremely skewed (beyond 1:9)
- Cannot produce meaningful thresholds for unimodal histograms
- Cannot handle non-uniform illumination (inherent limitation of global thresholds)
Multi-level Otsu: An extension of Otsu's method classifies pixels into three or more levels using multiple thresholds. It is not implemented directly in OpenCV 4.x's cv2.threshold(), but is available via scikit-image's threshold_multiotsu().
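A brute-force sketch of two-threshold Otsu (hypothetical helper `multi_otsu_2`; the O(L²) search is shown only for clarity — scikit-image's threshold_multiotsu() is the practical choice). It maximizes Σ ω_k·μ_k², which is equivalent to maximizing between-class variance since the global mean is constant:

```python
import numpy as np

def multi_otsu_2(gray):
    """Two-threshold Otsu by brute force over all (t1, t2) pairs.
    Maximizes sum of w_k * mu_k^2, equivalent to maximizing the
    between-class variance (a didactic O(L^2) sketch)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    P = np.concatenate(([0.0], np.cumsum(p)))                   # prefix weights
    M = np.concatenate(([0.0], np.cumsum(p * np.arange(256))))  # prefix level sums
    best, best_pair = -1.0, (0, 0)
    for t1 in range(1, 255):
        for t2 in range(t1 + 1, 256):
            score = 0.0
            for lo, hi in ((0, t1), (t1, t2), (t2, 256)):
                w = P[hi] - P[lo]
                if w > 1e-12:
                    m = (M[hi] - M[lo]) / w
                    score += w * m * m
            if score > best:
                best, best_pair = score, (t1, t2)
    return best_pair

# Three flat regions at 30, 128, 220.
gray = np.zeros((30, 30), np.uint8)
gray[:, :10] = 30
gray[:, 10:20] = 128
gray[:, 20:] = 220
t1, t2 = multi_otsu_2(gray)
```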
Adaptive Thresholding - Handling Non-uniform Illumination
Adaptive thresholding computes local thresholds for each pixel in the image. It achieves high-quality binarization impossible with global thresholds for images captured under non-uniform illumination (shadows, spotlights, natural light variations).
Basic principle: The threshold T(x, y) for each pixel (x, y) is computed from statistics of a local region (block size B×B) centered on that pixel. By following local brightness variations, it remains unaffected by illumination non-uniformity.
Mean-based:
T(x,y) = mean(local region) - C
The threshold is the local region's mean intensity minus a constant C, typically 5-15. With THRESH_BINARY and dark text on a light background, increasing C lowers the local threshold, so fewer pixels are classified as dark foreground: strokes thin slightly, but isolated background noise is suppressed.
Gaussian-based:
T(x,y) = gaussian_weighted_mean(local region) - C
Uses a Gaussian-weighted mean that gives higher weight to pixels closer to the window center. It is more accurate near edges than the mean-based variant and is the standard choice for document image binarization.
OpenCV implementation:
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blockSize=11, C=2)
Parameter tuning guidelines:
- Block size: Must be odd. A guideline is 2-3x the character height. Too small a block is noise-sensitive; too large a block weakens the illumination correction.
- Constant C: With THRESH_BINARY, larger values suppress background noise but thin dark foreground strokes; 2-10 is typical for documents.
Sauvola's method: An improved adaptive method that also considers the local standard deviation: T(x,y) = μ × (1 + k × (σ/R - 1)), where k is a sensitivity parameter (typically 0.2-0.5) and R is the dynamic range of the standard deviation (128 for 8-bit images). It improves performance in low-contrast regions, excelling at processing aged or degraded documents.
Practical Threshold Design - Document Images and Industrial Inspection
Practical binarization involves designing the entire pipeline including pre-processing and post-processing, not just threshold application. Practical approaches for representative application domains are presented.
Document image binarization pipeline:
- Input: Scanned or camera-captured image
- Preprocessing 1: Skew correction (Hough transform or projection profile)
- Preprocessing 2: Gaussian blur (σ=1.0) for scan noise removal
- Binarization: Adaptive thresholding (Gaussian, blockSize=2x character height, C=5)
- Postprocessing: Morphological opening (3x3) for isolated noise removal
This procedure typically improves Tesseract OCR recognition accuracy by 15-25% compared to unprocessed input.
Industrial inspection binarization: For semiconductor wafer and PCB defect detection where lighting is controllable, fixed thresholds are effective. However, to handle lot-to-lot variation, a robust design uses Otsu's method for automatic threshold adjustment and flags an anomaly when the threshold drifts outside a tolerance range (e.g. ±10).
Color image binarization: Rather than thresholding RGB channels directly, converting to HSV color space and extracting specific hue ranges is more effective. For red object detection, generate masks for H: 0-10 or 170-180, S: 100-255, V: 50-255, and combine them with cv2.inRange(hsv, lower, upper).
Dynamic threshold design: For time-series images (video, continuous capture) where lighting varies between frames, apply Otsu's method per frame, or use a moving average of the thresholds from the previous N frames to suppress sudden variations. Raising an alert as a lighting anomaly when the threshold varies by more than ±20 is also an effective design.
Advanced Thresholding Methods and Deep Learning Approaches
Advanced methods beyond conventional threshold-based binarization and recent deep learning approaches are introduced. High-accuracy binarization previously impossible with conventional methods is now achievable for complex backgrounds and degraded images.
Niblack's method: T(x,y) = μ + k × σ (k=-0.2 is standard). Uses the local mean and standard deviation to set contrast-adaptive thresholds. Its drawback is excessive noise in background regions, which Sauvola's method later addressed.
Wolf's method: An improvement over Sauvola that considers the global minimum intensity. Improves performance in extremely dark or low-contrast regions, highly regarded for historical document digitization.
Bradley's method: A fast adaptive binarization using integral images to compute local means in O(1). Computes in constant time regardless of block size, suitable for real-time processing applications.
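A NumPy sketch of the Bradley-Roth idea (assumed parameters block=15 and t=0.15, meaning a pixel is classified as foreground when it is more than 15% darker than its local mean):

```python
import numpy as np

def bradley_threshold(gray, block=15, t=0.15):
    """Bradley-Roth sketch: a pixel is foreground (0) when it is more than
    t (15%) darker than its local mean; local sums come from an integral
    image, so each window query is O(1) regardless of block size."""
    h, w = gray.shape
    # integral[i, j] = sum of gray[:i, :j], with a zero row/column prepended.
    integral = np.pad(gray.astype(np.int64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    r = block // 2
    ys, xs = np.mgrid[0:h, 0:w]
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    area = (y1 - y0) * (x1 - x0)                 # window shrinks at the borders
    local_sum = (integral[y1, x1] - integral[y0, x1]
                 - integral[y1, x0] + integral[y0, x0])
    fg = gray.astype(np.int64) * area <= local_sum * (1.0 - t)
    return np.where(fg, 0, 255).astype(np.uint8)

# Uniform background with a one-pixel-thick dark line.
gray = np.full((40, 40), 200, np.uint8)
gray[20, 5:35] = 50
out = bradley_threshold(gray)
```

The four-corner lookup on the integral image is the whole trick: once the cumulative sums are built in a single pass, every window sum costs four array reads.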
Deep learning binarization: Research applying semantic segmentation models like U-Net and DeepLabV3 to document binarization is advancing. In DIBCO (Document Image Binarization Competition), deep learning methods significantly outperform conventional methods with F-measure exceeding 95%.
- Advantages: Robust against complex backgrounds, watermarks, bleed-through, and stains
- Disadvantages: Requires training data, high inference cost (GPU recommended)
- Practical examples: Google's document scanning app, Adobe Scan
Hybrid approach: In practice, combining deep learning with conventional methods is effective. A two-stage process performing rough segmentation with deep learning followed by precise boundary refinement with adaptive thresholding achieves both accuracy and speed. Processing time is approximately 50ms per page with GPU.