CNN (Convolutional Neural Network)
A deep learning architecture that uses convolutional and pooling layers to hierarchically extract spatial features from images, becoming the standard model for image recognition.
A CNN (Convolutional Neural Network) learns spatial patterns in images efficiently. Unlike fully-connected networks that flatten spatial structure, CNNs preserve neighborhood relationships through convolution, enabling hierarchical feature extraction from edges to semantic concepts.
Key components:
- Convolutional layer: Slides filters across input detecting local features. Shallow layers capture edges; deeper layers recognize objects
- Pooling layer: Reduces spatial resolution, decreasing computation while increasing translation robustness
- Batch Normalization: Normalizes input distributions, stabilizing training
- Fully Connected layer: Converts feature vectors into class probabilities
Landmark architectures:
LeNet-5(1998): Pioneered CNN for digit recognitionAlexNet(2012): Won ImageNet, igniting the deep learning revolutionVGGNet(2014): Demonstrated depth importance with 3×3 filtersResNet(2015): Skip connections enabled 100+ layer trainingEfficientNet(2019): Unified scaling of width, depth, and resolution
Beyond classification, CNNs serve as backbones for object detection (YOLO, SSD), segmentation (U-Net, DeepLab), and pose estimation. Vision Transformers compete for large-scale tasks, but CNNs remain dominant on edge devices.