EN JA ZH ES

Deep Dive into Image Compression Algorithms - DCT, Wavelet Transform, and Predictive Coding

· 9 min read

Fundamental Principles of Image Compression - Redundancy Removal and Human Vision

Image compression reduces file sizes by removing redundancy in image data. Uncompressed images contain enormous data - a 4000x3000px 24-bit color image is 36MB raw, reducible to 2-5MB with JPEG compression.

Types of redundancy in image data:

  • Spatial redundancy: Adjacent pixels often have similar colors (blue sky, white walls). Exploiting this correlation by recording only differences reduces data
  • Statistical redundancy: Value frequencies are uneven. Entropy coding assigns short bit sequences to frequent values and long sequences to rare values
  • Visual redundancy (psychovisual): Human eyes are insensitive to high-frequency components (fine textures) and subtle chrominance changes. Removing imperceptible information barely affects perceived quality - the principle of lossy compression

Algorithm classification: Lossless (PNG Deflate, WebP Lossless) achieves 2:1-3:1 ratios with perfect reconstruction. Lossy (JPEG DCT, WebP VP8, AVIF AV1) achieves 10:1-50:1+ by removing imperceptible information. Modern compression follows a Transform, Quantization, Entropy Coding three-stage pipeline, each removing different redundancy types.

DCT (Discrete Cosine Transform) - Core Technology Behind JPEG

DCT is the mathematical transform at JPEG compression's core. It converts images from spatial domain (pixel values) to frequency domain (frequency component intensities), enabling efficient removal of high-frequency components invisible to human eyes.

JPEG DCT processing flow:

  • Block division: Split image into 8x8 pixel blocks processed independently. The 8x8 size balances computational and compression efficiency
  • DCT transform: Apply 2D DCT to each block, converting to 64 frequency coefficients. Top-left coefficient (DC component) represents average block brightness; bottom-right represents high frequencies
  • Quantization: Divide each coefficient by quantization table values, truncating decimals. Higher frequencies use larger divisors, zeroing many high-frequency coefficients. This lossy stage is controlled by quality settings
  • Zigzag scan: Reorder 64 coefficients from low to high frequency in zigzag pattern, concentrating zeros at array end
  • Entropy coding: Combine run-length encoding (counting consecutive zeros) with Huffman coding for final bitstream

The DCT is reversible - inverse DCT (IDCT) perfectly reconstructs original values before quantization. Quantization is the only lossy step in the entire JPEG pipeline.

Wavelet Transform - Foundation of JPEG 2000 and Next-Gen Compression

Wavelet Transform was developed to overcome DCT limitations. Used in JPEG 2000, some HEIF modes, and medical imaging (DICOM). While DCT processes fixed 8x8 blocks, wavelet transform decomposes entire images at multiple resolutions.

How wavelet transform works:

  • Multi-resolution decomposition: Separates image into approximation (low-frequency) and detail (high-frequency) components. Filtering horizontally and vertically produces four subbands: LL, LH, HL, HH
  • Recursive decomposition: Recursively applies same decomposition to LL subband. Typically 3-5 levels, hierarchically representing coarse structure to fine detail
  • Quantization and coding: Quantize subband coefficients and compress with advanced coding like EBCOT

Advantages over DCT: No block artifacts (processes entire image), progressive display (transmit low-to-high resolution incrementally), ROI coding (selectively preserve quality in specific regions). JPEG 2000 uses CDF 9/7 wavelet (lossy) and CDF 5/3 wavelet (lossless) with image-optimized filter coefficients.

Predictive Coding and Intra Prediction - AV1/HEVC High-Efficiency Compression

Predictive coding predicts current pixel values from already-processed adjacent pixels, encoding only the prediction residual (difference). More accurate predictions yield smaller residuals and higher compression. This is the core of AVIF (AV1) and HEIF (HEVC) compression.

Intra Prediction mechanism:

  • Directional prediction modes: AV1 has 56 directional modes (HEVC has 35). Extrapolates adjacent block pixels at various angles (horizontal, vertical, diagonal 45/22.5 degrees) into current block. Selecting modes matching edge directions minimizes residuals
  • DC prediction: Simplest mode predicting entire block with adjacent pixel average. Effective for uniform regions
  • CfL (Chroma from Luma): AV1-specific technology predicting chrominance from luminance channel. High luma-chroma correlation dramatically reduces chroma residuals

Why AV1 achieves high efficiency: Variable block sizes (4x4 to 128x128, recursively split), 56 intra prediction directions, adaptive transform type selection (DCT, ADST, Identity), loop filters (deblocking, CDEF, Loop Restoration), and ANS entropy coding.

Entropy Coding - Huffman and Arithmetic Coding Principles

Entropy coding is the final stage removing statistical redundancy. It assigns variable-length bit sequences based on symbol occurrence probability, approaching theoretical minimum size (Shannon entropy).

Major entropy coding methods:

  • Huffman coding: Used in JPEG. Assigns variable-length prefix codes based on probability. Frequent symbols get short codes (2 bits), rare symbols get long codes (12 bits). Simple and fast but requires minimum 1 bit per symbol, not reaching theoretical limit
  • Arithmetic coding: Used in JPEG 2000, H.264 CABAC. Represents entire symbol sequence as single number in 0-1 interval. 5-10% better compression than Huffman but higher computational cost
  • ANS (Asymmetric Numeral Systems): Used in AV1, Zstandard. Achieves arithmetic coding compression at near-Huffman speed. Characterized by asymmetric encode/decode (LIFO structure)
  • Context modeling: Dynamically updates probability models based on surrounding symbols. CABAC uses adjacent block information to predict current symbol probability, improving coding efficiency

Shannon entropy formula: H = -Σ p(x) * log2(p(x)) represents theoretical minimum bits per symbol. Equal probability (50/50) yields 1 bit/symbol; 99/1 probability yields ~0.08 bits/symbol enabling dramatic compression.

Compression Quality Metrics - PSNR, SSIM, VMAF Differences and Usage

Multiple metrics exist for objectively evaluating compression quality, each measuring different aspects. Selecting appropriate metrics enables compression parameter optimization and fair format comparisons.

Major quality metrics:

  • PSNR: Classic metric calculated from pixel value differences (MSE). PSNR = 10 * log10(MAX^2 / MSE) in dB, higher is better. Generally 30dB+ is acceptable, 40dB+ is high quality. Simple and fast but doesn't always match human perception (can't detect structural distortion)
  • SSIM: Mimics human visual system, evaluating luminance, contrast, and structure similarity. Values 0-1, with 1 being identical. 0.95+ is perceptually equivalent. More perceptually accurate than PSNR, most widely used for compression quality comparison
  • VMAF: Netflix-developed ML-based metric combining multiple features (VIF, DLM, Motion). Highest correlation with human subjective evaluation (MOS). Values 0-100, 93+ is excellent. Strongest for video but applicable to still images
  • Butteraugli: Google-developed perceptual metric modeling spatial frequency sensitivity and masking effects, detecting minimum human-noticeable differences. Used in JPEG XL quality control

Practical selection: SSIM for fast bulk comparison, VMAF for most accurate perceptual quality, Butteraugli for automatic parameter tuning. Combining multiple metrics for comprehensive judgment is recommended.

Related Articles

Image Compression Explained - How JPEG, PNG, and WebP Work

A technical deep dive into JPEG, PNG, and WebP compression algorithms. Learn the differences between lossy and lossless compression, when to use each format, and how to optimize images for the web.

Image Format Comparison - JPEG/PNG/WebP/AVIF/GIF/BMP Features and Use Cases

Compare technical characteristics of 6 major image formats. Organized comparison of compression methods, color depth, transparency, animation, and browser support with optimal format selection by use case.

Wavelet Transform and JPEG 2000 - Multi-Resolution Analysis for Image Compression

From wavelet transform principles to JPEG 2000 compression algorithms. Covers DWT, subband coding, and EBCOT mechanisms with practical examples.

Image Processing in the Frequency Domain - Practical FFT and DCT Analysis and Filtering

Explains frequency domain image processing techniques. Covers FFT spectrum analysis, DCT compression principles, and frequency filter design with practical implementation examples.

Lossless vs Lossy Compression - When to Use Each for Web Images

Understand the difference between lossless and lossy image compression. Compare PNG, WebP lossless, JPEG, and AVIF with file size benchmarks to choose the right format for photos, icons, and screenshots.

Fourier Filtering for Noise Removal - Image Processing in the Frequency Domain

Explains noise removal using Fourier transforms. Covers DFT principles, low-pass/high-pass filter design, notch filters for periodic noise, and Python implementation.

Related Terms