
Image Thresholding Types and Optimal Threshold Determination - From Otsu to Adaptive Methods


Thresholding Fundamentals - The Purpose of Separating Images into Black and White

Thresholding (binarization) compares each pixel in a grayscale image against a threshold value, converting it to either white (255) or black (0). It is frequently used in early stages of image processing pipelines as essential preprocessing for contour detection, OCR, object counting, and more.

Why binarize: Many image analysis tasks require clear separation between objects (foreground) and background. While grayscale images contain 256 levels of information, binary classification suffices for decisions like "text or background," "cell or medium," or "defect or normal." Binarization reduces information volume, making subsequent processing faster and more robust.

Mathematical definition:

dst(x,y) = maxval if src(x,y) > thresh

dst(x,y) = 0 otherwise
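The definition maps directly onto a NumPy expression. A minimal sketch (the function name is illustrative, matching cv2.THRESH_BINARY semantics):

```python
import numpy as np

def threshold_binary(src, thresh, maxval=255):
    """THRESH_BINARY: maxval where src > thresh, 0 elsewhere."""
    return np.where(src > thresh, maxval, 0).astype(np.uint8)

gray = np.array([[10, 130], [128, 200]], dtype=np.uint8)
out = threshold_binary(gray, 128)
# 10 and 128 are not strictly greater than 128, so they fall to 0;
# 130 and 200 become 255.
```

Note that the comparison is strict (`>`), so a pixel exactly equal to the threshold maps to 0, matching OpenCV's behavior.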

Types of thresholding:

OpenCV provides cv2.threshold() for global thresholding and cv2.adaptiveThreshold() for adaptive thresholding. Threshold selection is the most critical factor determining binarization quality, and this article explains determination methods in detail.

Fixed Threshold Method - Manual Setting and Histogram Analysis

Fixed thresholding is the simplest binarization technique, applying a single user-specified threshold to the entire image. It is effective when lighting conditions are stable and foreground-background contrast is clear.

Manual threshold determination: Observe the histogram and set the threshold at the valley between the two peaks of a bimodal distribution corresponding to foreground and background. For example, in a document image with black text (intensities roughly 0-80) on white paper (roughly 200-255), a threshold around 128 in the valley between the peaks is appropriate.
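This valley-finding step can be sketched in NumPy. The example below builds a synthetic bimodal intensity distribution and picks the lowest histogram bin between the two peaks; splitting the peak search at level 128 is a simplifying assumption for this sketch, not a general rule:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic bimodal "image": dark text (~40) on bright paper (~220)
text = rng.normal(40, 10, 2000)
paper = rng.normal(220, 15, 8000)
gray = np.clip(np.concatenate([text, paper]), 0, 255).astype(np.uint8)

hist = np.bincount(gray.ravel(), minlength=256)
# Locate the two peaks, then take the lowest bin between them
p0 = hist[:128].argmax()           # dark peak (text)
p1 = 128 + hist[128:].argmax()     # bright peak (paper)
valley = p0 + hist[p0:p1].argmin() # candidate threshold
```

In practice the histogram is usually smoothed first, and Otsu's method (below) automates this peak/valley reasoning.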

OpenCV implementation:

ret, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

Threshold type variations: cv2.threshold() supports several flag variants:

- cv2.THRESH_BINARY: maxval if above the threshold, 0 otherwise
- cv2.THRESH_BINARY_INV: the inverse (0 if above, maxval otherwise)
- cv2.THRESH_TRUNC: values above the threshold are clamped to the threshold; others are unchanged
- cv2.THRESH_TOZERO: values at or below the threshold become 0; others are unchanged
- cv2.THRESH_TOZERO_INV: values above the threshold become 0; others are unchanged

Limitations of fixed thresholds: When illumination is non-uniform (shadows, gradient lighting, vignetting), a single threshold cannot correctly separate foreground and background. What works for one part of the image fails in other regions where foreground disappears or background remains as noise. Adaptive thresholding solves this problem.

Preprocessing improvements: Applying Gaussian blur (σ=1-3) for noise removal and histogram equalization for contrast enhancement before fixed thresholding makes threshold selection easier and results more stable.
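To illustrate the equalization step, here is a minimal NumPy version of global histogram equalization (the basic idea behind cv2.equalizeHist; OpenCV's exact rounding differs slightly):

```python
import numpy as np

def equalize_hist(gray):
    """Stretch the intensity CDF uniformly over 0-255 (global equalization)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each gray level through the normalized CDF
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# A low-contrast ramp (levels 100-150) stretches to cover the full range
gray = np.tile(np.arange(100, 151, dtype=np.uint8), (10, 1))
eq = equalize_hist(gray)
```

After equalization the valley between foreground and background peaks is typically deeper, making a fixed threshold easier to place.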

Otsu's Method - The Standard for Automatic Threshold Determination

Otsu's method (1979) automatically determines the optimal threshold based on histogram statistical properties. It selects the threshold that maximizes between-class variance, maximizing the separability of foreground and background. It is the most widely used automatic threshold determination method in image processing.

Algorithm principle: When dividing the image into 2 classes at threshold t (C0: pixel value ≤ t, C1: pixel value > t), find t that maximizes between-class variance σ²_B(t) = ω0(t) × ω1(t) × (μ0(t) - μ1(t))². Here ω0, ω1 are pixel count ratios for each class, and μ0, μ1 are mean intensities for each class.

Computational efficiency: Compute between-class variance for all 256 possible thresholds and select the one yielding maximum value. Using cumulative histogram sums, computation is O(L) (L: number of levels=256), independent of image size.

OpenCV implementation:

ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

The return value ret contains the automatically determined threshold.

Prerequisites and limitations of Otsu's method: It assumes a roughly bimodal histogram with comparable class sizes. Accuracy degrades when foreground and background areas are highly unbalanced (e.g., sparse text on a large page), when the two classes overlap heavily, or when noise flattens the valley (a light Gaussian blur beforehand helps). As a global method, it also fails under non-uniform illumination.

Multi-level Otsu: An extension of Otsu's method classifies pixels into three or more levels using multiple thresholds. While not directly implemented in OpenCV 4.x's cv2.threshold(), it is available via scikit-image's threshold_multiotsu().

Adaptive Thresholding - Handling Non-uniform Illumination

Adaptive thresholding computes local thresholds for each pixel in the image. It achieves high-quality binarization impossible with global thresholds for images captured under non-uniform illumination (shadows, spotlights, natural light variations).

Basic principle: The threshold T(x, y) for each pixel (x, y) is computed from statistics of a local region (block size B×B) centered on that pixel. By following local brightness variations, it remains unaffected by illumination non-uniformity.

Mean-based:

T(x,y) = mean(local region) - C

The threshold is the local region's mean intensity minus constant C. C typically ranges from 5-15; larger values expand the foreground classification range.

Gaussian-based:

T(x,y) = gaussian_weighted_mean(local region) - C

Uses Gaussian-weighted mean giving higher weight to pixels closer to center. More accurate near edges than mean-based, and is the standard for document image binarization.

OpenCV implementation:

binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blockSize=11, C=2)

Parameter tuning guidelines: blockSize must be odd and greater than 1. As a rule of thumb, choose it somewhat larger than the features of interest: 11-15 for small text, 31-51 for thick strokes or large objects. C typically ranges from 2-10; increase it if background noise leaks into the foreground, decrease it if thin strokes break apart.

Sauvola's method: An improved adaptive method that also considers local standard deviation. T(x,y) = μ × (1 + k × (σ/R - 1)). Improves performance in low-contrast regions, excelling at processing aged or degraded documents.
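A minimal NumPy sketch of Sauvola's formula follows. It computes the local mean and standard deviation with a brute-force sliding window for clarity (production implementations use integral images; scikit-image provides threshold_sauvola). k=0.5 and R=128 are commonly cited defaults for 8-bit images:

```python
import numpy as np

def sauvola_threshold(gray, window=15, k=0.5, R=128.0):
    """Per-pixel Sauvola threshold T = mu * (1 + k * (sigma / R - 1))."""
    h, w = gray.shape
    pad = window // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    T = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            block = padded[y:y + window, x:x + window]
            mu, sigma = block.mean(), block.std()
            T[y, x] = mu * (1 + k * (sigma / R - 1))
    return (gray > T).astype(np.uint8) * 255

gray = np.full((20, 20), 200, dtype=np.uint8)  # bright background
gray[8:12, 8:12] = 30                          # dark "text" patch
out = sauvola_threshold(gray)
```

In flat regions sigma is near zero, so the threshold drops well below the local mean, which is exactly what suppresses the background noise that plagues Niblack's method.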

Practical Threshold Design - Document Images and Industrial Inspection

Practical binarization involves designing the entire pipeline, including pre- and post-processing, not just applying a threshold. This section presents practical approaches for representative application domains.

Document image binarization pipeline: A typical OCR preprocessing pipeline is: (1) grayscale conversion, (2) Gaussian blur (3×3 to 5×5) to suppress noise, (3) adaptive Gaussian thresholding, (4) morphological opening to remove residual speckles, and (5) deskewing if the text is rotated.

This procedure improves Tesseract OCR recognition accuracy by 15-25% compared to unprocessed input.

Industrial inspection binarization: For semiconductor wafer and PCB defect detection where lighting is controllable, fixed thresholds are effective. To absorb lot-to-lot variation, a robust design applies Otsu's method for automatic threshold adjustment and flags frames whose computed threshold drifts outside a tolerance range (±10) as anomalies.

Color image binarization: Rather than binarizing RGB channels directly, converting to HSV color space and extracting specific hue ranges is effective. For red object detection, generate masks for H: 0-10 or 170-180, S: 100-255, V: 50-255, and combine them. Implement with cv2.inRange(hsv, lower, upper).
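The masking logic is equivalent in spirit to two cv2.inRange() calls OR-ed together. A pure-NumPy sketch, assuming the input is already in OpenCV-style HSV (H: 0-179, S/V: 0-255):

```python
import numpy as np

def red_mask(hsv):
    """Boolean mask for red pixels in OpenCV-style HSV.

    Red wraps around the hue axis, so two hue bands are OR-ed together.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hue_ok = (h <= 10) | (h >= 170)
    return hue_ok & (s >= 100) & (v >= 50)

# Hypothetical 1x3 HSV image: saturated red, green, too-dark red
hsv = np.array([[[5, 200, 200], [60, 200, 200], [175, 150, 40]]],
               dtype=np.uint8)
mask = red_mask(hsv)  # only the first pixel passes all three conditions
```

The V lower bound matters: the third pixel has a red hue but is too dark, so it is correctly excluded.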

Dynamic threshold design: For time-series images (video, continuous capture) where lighting varies between frames, apply Otsu's method per frame, or smooth with a moving average of the thresholds from the previous N frames to suppress sudden jumps. Raising an alert as a lighting anomaly when the threshold changes by more than ±20 between frames is also an effective design.
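The smoothing-plus-alert idea can be sketched with a small helper class (the class name and parameters are illustrative, not from any library):

```python
from collections import deque

import numpy as np

class ThresholdSmoother:
    """Smooth per-frame Otsu thresholds with a moving average; flag big jumps."""

    def __init__(self, n=10, alert_delta=20):
        self.history = deque(maxlen=n)
        self.alert_delta = alert_delta

    def update(self, raw_threshold):
        prev = float(np.mean(self.history)) if self.history else raw_threshold
        # A sudden deviation from the recent average suggests a lighting anomaly
        alert = bool(abs(raw_threshold - prev) > self.alert_delta)
        self.history.append(raw_threshold)
        return float(np.mean(self.history)), alert

sm = ThresholdSmoother(n=5)
for t in [120, 122, 119, 121, 160]:  # last frame: sudden lighting change
    smoothed, alert = sm.update(t)
```

Per-frame values would come from the ret returned by cv2.threshold() with THRESH_OTSU; the smoothed value is what gets applied to the frame.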

Advanced Thresholding Methods and Deep Learning Approaches

This section introduces advanced methods beyond conventional threshold-based binarization, along with recent deep learning approaches. They achieve high-accuracy binarization on complex backgrounds and degraded images where conventional methods fail.

Niblack's method: T(x,y) = μ + k × σ (k=-0.2 is standard). Uses local mean and standard deviation to set contrast-adaptive thresholds. Has the drawback of excessive noise in background regions, improved by Sauvola's method.

Wolf's method: An improvement over Sauvola that considers the global minimum intensity. Improves performance in extremely dark or low-contrast regions, highly regarded for historical document digitization.

Bradley's method: A fast adaptive binarization using integral images to compute local means in O(1). Computes in constant time regardless of block size, suitable for real-time processing applications.
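The integral-image trick behind Bradley's method can be sketched in NumPy: after one cumulative-sum pass, any window sum is four array lookups, so the per-pixel cost is constant regardless of window size. The t_pct=15 default follows the commonly cited "15% below the local mean" rule:

```python
import numpy as np

def bradley_threshold(gray, window=15, t_pct=15):
    """Adaptive mean threshold via an integral image (O(1) local mean per pixel)."""
    h, w = gray.shape
    # Integral image padded with a zero row/column so window sums are 4 lookups
    ii = np.pad(gray.astype(np.int64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    r = window // 2
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    area = (y1 - y0) * (x1 - x0)
    local_sum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    mean = local_sum / area
    # Bradley: foreground (black) if the pixel is t% darker than the local mean
    return np.where(gray < mean * (1 - t_pct / 100), 0, 255).astype(np.uint8)

gray = np.tile(np.linspace(100, 250, 40).astype(np.uint8), (40, 1))  # gradient
gray[18:22, 18:22] = 20                                              # dark blob
out = bradley_threshold(gray)
```

Despite the left-to-right lighting gradient, the dark blob is isolated cleanly, because each pixel is compared only against its own neighborhood mean.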

Deep learning binarization: Research applying semantic segmentation models like U-Net and DeepLabV3 to document binarization is advancing. In DIBCO (Document Image Binarization Competition), deep learning methods significantly outperform conventional methods with F-measure exceeding 95%.

Hybrid approach: In practice, combining deep learning with conventional methods is effective. A two-stage process performing rough segmentation with deep learning followed by precise boundary refinement with adaptive thresholding achieves both accuracy and speed. Processing time is approximately 50ms per page with GPU.

Related Articles

Morphological Operations Fundamentals - Dilation, Erosion, Opening, and Closing Explained

Systematically explains morphological operations as fundamental image processing tools. Covers dilation, erosion, opening, closing principles with structuring element design and practical applications.

Image Segmentation Fundamentals - Understanding Region Division Principles and Applications

From basic concepts to deep learning-based methods in image segmentation. Learn the differences between semantic, instance, and panoptic segmentation with practical web application examples.

Dithering Techniques - Types and Applications for Representing Gradients with Limited Colors

Compare error diffusion, Bayer dithering, and blue noise techniques. Covers principles, characteristics, and applications from retro aesthetics to printing.

Image Fingerprinting Technology - Detecting Similar Images with pHash and dHash

Learn how image fingerprinting algorithms (pHash, dHash, aHash) work. Compare perceptual hash techniques for duplicate detection, copyright enforcement, and reverse image search applications.

Image Processing for Industrial Inspection - From Visual Inspection to Dimensional Measurement

Systematic guide to image processing in manufacturing quality control covering defect detection, dimensional measurement, pattern matching, and deep learning anomaly detection.

Histogram Equalization for Contrast Enhancement - Optimizing Image Brightness Distribution

From the mathematical principles of histogram equalization to local contrast improvement with CLAHE. Learn techniques that dramatically improve low-contrast images with proper parameter settings.
