Encoder-Decoder
A neural network architecture consisting of an encoder that compresses input into a compact latent representation and a decoder that reconstructs the desired output from that representation.
The Encoder-Decoder architecture is a two-stage network: the encoder maps the input to a low-dimensional feature representation, and the decoder maps that representation back to the target output. In computer vision, this pattern underpins semantic segmentation, super-resolution, and image-to-image translation.
The encoder applies repeated convolution and pooling, reducing spatial resolution while increasing channel depth. A 256×256×3 input might compress to 8×8×512. The decoder reverses this using transposed convolutions or bilinear upsampling, restoring spatial dimensions for the final output.
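The shape arithmetic above can be sketched in plain Python. This is an illustrative helper, not a real network: the channel schedule (starting at 64, doubling per stage, capped at 512) is an assumption chosen to reproduce the 256×256×3 → 8×8×512 example.

```python
def encoder_shapes(h, w, c_in, stages, c_base=64):
    """Track (H, W, C) through `stages` stride-2 downsampling blocks.

    Each block halves the spatial resolution. The channel schedule
    (c_base doubling per stage, capped at 512) is an illustrative
    assumption, not a fixed rule of the architecture.
    """
    shapes = [(h, w, c_in)]
    c = c_base
    for _ in range(stages):
        h, w = h // 2, w // 2          # stride-2 pooling halves H and W
        shapes.append((h, w, min(c, 512)))
        c *= 2
    return shapes

# Five stride-2 stages take 256x256x3 down to 8x8x512:
print(encoder_shapes(256, 256, 3, stages=5))
```

Running this prints the full trajectory `(256,256,3) → (128,128,64) → … → (8,8,512)`, making explicit that five halvings are what turn 256 into 8.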
- U-Net: Adds skip connections from each encoder stage to the corresponding decoder stage, preserving fine spatial details lost during compression. Widely used in medical image segmentation.
- SegNet: Reuses max-pooling indices from the encoder during decoder upsampling, reducing parameters while maintaining accurate boundary delineation
- Bottleneck: The junction between encoder and decoder has the lowest spatial resolution, encoding global context about the entire input
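A U-Net-style skip connection can be sketched with NumPy: upsample the deeper decoder feature, then concatenate the same-resolution encoder feature along the channel axis. The feature shapes below are hypothetical stand-ins for one encoder/decoder stage pair.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_merge(decoder_feat, encoder_feat):
    """U-Net-style skip connection: upsample the decoder feature to the
    encoder stage's resolution, then concatenate along channels so the
    decoder sees both global context and fine spatial detail."""
    up = upsample2x(decoder_feat)
    assert up.shape[:2] == encoder_feat.shape[:2], "resolutions must match"
    return np.concatenate([up, encoder_feat], axis=-1)

enc = np.random.rand(64, 64, 128)   # hypothetical encoder stage output
dec = np.random.rand(32, 32, 256)   # deeper, lower-resolution decoder feature
merged = skip_merge(dec, enc)
print(merged.shape)  # (64, 64, 384)
```

The concatenation is why U-Net decoders have wider inputs than plain decoders: the channel count becomes the sum of the upsampled and skipped features (256 + 128 = 384 here).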
Pre-trained encoders (e.g., ResNet or VGG trained on ImageNet) provide strong feature extractors via transfer learning, enabling high accuracy with limited labeled data. Decoder design, including the choice of upsampling method and the placement of skip connections, significantly impacts output quality.