Attention Mechanism

A neural network component that dynamically computes relevance scores across input elements, enabling the model to focus on the most informative parts of the data.

An attention mechanism assigns each input element an importance weight based on its learned relevance to the current context, then combines the elements according to those weights. Originally proposed for neural machine translation in 2014, it became the cornerstone of the Transformer architecture in 2017.
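The weighting scheme described above can be sketched as scaled dot-product attention, the variant used in the Transformer: relevance scores come from query-key dot products, a softmax turns them into weights that sum to one, and the output is the weighted sum of the values. A minimal NumPy sketch (shapes and variable names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # query-key relevance
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)     # (4, 8) (4, 6)
```

Each row of `w` is a probability distribution over the six input positions, i.e. the "importance weights" the model dynamically computes per query.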

In computer vision, self-attention models long-range dependencies between distant spatial locations, overcoming the limited receptive field of convolutions. Vision Transformer (ViT) showed that pure self-attention over image patches can match or exceed CNN performance.
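ViT's key preprocessing step is turning an image into a sequence of tokens: the image is cut into non-overlapping patches, each flattened into a vector, and self-attention then relates every patch to every other regardless of spatial distance. A minimal sketch of the patchification step (array shapes are illustrative):

```python
import numpy as np

def image_to_patches(img, p):
    """Split an H x W x C image into non-overlapping p x p patches,
    each flattened to a vector of length p*p*C (one token per patch)."""
    H, W, C = img.shape
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C) -> (num_patches, p*p*C)
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)

img = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
tokens = image_to_patches(img, 2)
print(tokens.shape)  # (4, 12): four 2x2 patches, each 2*2*3 values
```

In a full ViT these tokens are linearly projected, given position embeddings, and fed through stacked self-attention layers; this sketch covers only the patch-to-token conversion.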

Attention is integral to object detection (DETR), segmentation, and image generation. Because standard attention scales quadratically with the number of tokens, efficient variants have emerged, including linear attention, sparse attention patterns, and IO-aware exact implementations such as FlashAttention.
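Linear attention illustrates how the quadratic cost can be avoided: replacing the softmax with a kernel feature map lets the key-value summary be computed once in O(n·d²) rather than forming the n×n score matrix. A sketch using the feature map phi(x) = elu(x) + 1 (an assumption, following the common kernelized formulation; not the only choice):

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention in O(n * d^2): softmax(QK^T)V is approximated
    by phi(Q) (phi(K)^T V) with a positive feature map phi."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # d x d summary, independent of sequence length
    Z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out = linear_attention(Q, K, V)
print(out.shape)  # (5, 4)
```

Note that FlashAttention takes a different route: it computes exact softmax attention but tiles the computation to minimize memory traffic, so the asymptotic cost stays quadratic while wall-clock time and memory use drop sharply.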
