Dropout
A regularization technique that randomly deactivates neurons during training, preventing co-adaptation and reducing overfitting by implicitly training an ensemble of sub-networks.
Dropout is a regularization technique that randomly sets a fraction of neuron activations to zero during each training step. Proposed by Srivastava et al. in 2014, it prevents overfitting by reducing co-adaptation between neurons. Dropped neurons are excluded from both forward and backward passes for that iteration.
Intuitively, dropout trains a different sub-network on each mini-batch, approximating an ensemble of exponentially many weight-sharing sub-networks. In the original formulation, all neurons are active at inference and activations are scaled by the keep probability (1 - drop rate) to match their expected value during training. Modern implementations use inverted dropout, which instead scales the surviving activations by 1/(1 - drop rate) during training, so no rescaling is needed at inference.
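A minimal NumPy sketch of inverted dropout may help make this concrete; the function name, default drop rate, and `training` flag are illustrative choices, not part of any particular library's API:

```python
import numpy as np

def inverted_dropout(x: np.ndarray, drop_rate: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero activations at training time and rescale the survivors.

    `drop_rate` is the probability of zeroing an activation; surviving activations
    are scaled by 1 / (1 - drop_rate) so their expectation matches inference.
    """
    if not training or drop_rate == 0.0:
        return x  # inference: all neurons active, no rescaling needed
    keep_prob = 1.0 - drop_rate
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob  # 0 or 1/keep_prob per unit
    return x * mask
```

Because the rescaling happens during training, the same function simply returns its input unchanged at inference, which is why inverted dropout is the form used in most modern frameworks.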
- Drop rate selection: Common values are 0.5 for fully connected layers and 0.1-0.3 for convolutional layers. Higher rates give stronger regularization and suit smaller, more overfitting-prone datasets
- Spatial Dropout: Drops entire feature map channels rather than individual activations, respecting the spatial correlation structure of convolutional features
- DropPath (Stochastic Depth): Randomly skips entire residual blocks during training in ResNets and Vision Transformers, stabilizing the optimization of very deep networks (both spatial dropout and DropPath are sketched after this list)
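A short PyTorch-flavored sketch of the two structured variants above: `nn.Dropout2d` drops whole feature-map channels (spatial dropout), while the `drop_path` helper is an illustrative stand-in for stochastic depth rather than a specific library function:

```python
import torch
import torch.nn as nn

# Spatial dropout: nn.Dropout2d zeroes entire channels of a (batch, channels, H, W) tensor.
spatial_drop = nn.Dropout2d(p=0.2)
feature_maps = torch.randn(8, 64, 32, 32)
dropped = spatial_drop(feature_maps)  # whole channels are zeroed while the module is in training mode

def drop_path(x: torch.Tensor, drop_prob: float = 0.1, training: bool = True) -> torch.Tensor:
    """Stochastic depth: zero the residual branch for randomly chosen samples."""
    if not training or drop_prob == 0.0:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over the remaining dimensions.
    mask = x.new_empty((x.shape[0],) + (1,) * (x.ndim - 1)).bernoulli_(keep_prob)
    return x * mask / keep_prob  # rescale so the expected output is unchanged

# Inside a residual block, the branch output would typically be combined as:
#   out = x + drop_path(block(x), drop_prob=0.1, training=self.training)
```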
With the widespread adoption of Batch Normalization, dropout usage in CNNs has declined, since both provide a regularizing effect. However, dropout remains standard in fully connected layers and in Transformers. Recent extensions include DropKey (which drops attention keys before the softmax rather than post-softmax attention weights) and R-Drop (which enforces consistency between the outputs produced under two different dropout masks).
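As a rough sketch of the R-Drop idea (the function name, the `alpha` weight, and the classification setting are assumptions for illustration), the model is run twice with independent dropout masks and a symmetric KL term penalizes disagreement between the two outputs:

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model: torch.nn.Module, x: torch.Tensor, targets: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    """R-Drop-style objective: cross-entropy plus symmetric KL between two
    forward passes that see different dropout masks (model must be in train mode)."""
    logits1 = model(x)  # first pass: dropout mask 1
    logits2 = model(x)  # second pass: independent dropout mask 2
    ce = 0.5 * (F.cross_entropy(logits1, targets) + F.cross_entropy(logits2, targets))
    kl = 0.5 * (
        F.kl_div(F.log_softmax(logits1, dim=-1), F.softmax(logits2, dim=-1), reduction="batchmean")
        + F.kl_div(F.log_softmax(logits2, dim=-1), F.softmax(logits1, dim=-1), reduction="batchmean")
    )
    return ce + alpha * kl
```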