Overfitting
A phenomenon where a model learns the training data too well, including its noise and idiosyncrasies, resulting in degraded performance on unseen data (poor generalization).
Overfitting occurs when a model memorizes training data noise rather than learning generalizable features, degrading predictive performance on new data. A classic symptom: training accuracy exceeds 99% while validation accuracy stagnates around 70%. It arises when model complexity is disproportionate to the amount of available data.
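As a concrete illustration of that symptom, here is a minimal sketch using scikit-learn on synthetic data (every name and number below is illustrative, not from the source): an unconstrained decision tree memorizes a small, noisy dataset, so its training accuracy far exceeds its held-out accuracy.

```python
# Diagnosing overfitting: an unconstrained decision tree memorizes
# noisy training data, so train accuracy far exceeds test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy dataset: model capacity is disproportionate to the data.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # no depth limit: free to memorize
tree.fit(X_train, y_train)

print(f"train accuracy: {tree.score(X_train, y_train):.2f}")  # ~1.00
print(f"test accuracy:  {tree.score(X_test, y_test):.2f}")    # much lower
```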
In image recognition, training a CNN with millions of parameters on a few thousand images makes overfitting almost inevitable. ResNet-50's roughly 25 million parameters require ImageNet-scale data (1.28M training images) to generalize well without aggressive regularization.
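The parameter count is easy to verify; a quick sketch, assuming torchvision is installed:

```python
# Quick check of the ResNet-50 parameter count.
from torchvision import models

resnet50 = models.resnet50()  # randomly initialized; weights don't matter here
n_params = sum(p.numel() for p in resnet50.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~25.6M
```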
- Data augmentation: Expands the training set through label-preserving transforms such as rotations, flips, color jittering, and random crops. One of the most effective countermeasures, improving generalization without additional data collection
- Dropout: Randomly deactivates neurons during training (typically with probability 0.5), preventing feature co-adaptation. At inference, all neurons are active with outputs scaled to compensate
- Early stopping: Halts training when validation loss stops improving, saving the model at its best generalization point
- Regularization: L2 weight decay penalizes large weights, constraining model complexity and encouraging simpler representations (all four techniques are combined in the sketch after this list)
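A minimal sketch combining the four countermeasures, assuming PyTorch and torchvision; the model, hyperparameters, and data loaders are illustrative placeholders, not a prescribed recipe:

```python
import copy
import torch
import torch.nn as nn
from torchvision import transforms

# 1. Data augmentation: random, label-preserving transforms expand the
#    effective training set (passed to the training Dataset, assumed to exist).
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# 2. Dropout: randomly zeroes activations during training;
#    model.eval() disables it at inference.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

# 3. L2 regularization via the optimizer's weight_decay term.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

# 4. Early stopping: keep the weights with the best validation loss and
#    halt after `patience` epochs without improvement.
def train(train_loader, val_loader, epochs=100, patience=5):
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item()
                           for x, y in val_loader) / len(val_loader)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.load_state_dict(best_state)  # restore the best checkpoint
```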
Transfer learning also combats overfitting: a pre-trained feature extractor enables strong generalization from limited task-specific data, a regime in which training from scratch would fail.
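A minimal transfer-learning sketch under the same PyTorch assumption: torchvision's ImageNet-pretrained ResNet-50 is frozen as a feature extractor, and only a new classification head (10 classes here, chosen arbitrarily) is trained:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights as a fixed feature extractor.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False  # freeze the backbone

# Replace the classifier head; only these weights are trained,
# so a few thousand task-specific images can suffice.
model.fc = nn.Linear(model.fc.in_features, 10)
```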