Autoattention multitêtes
Definition
Multi-head attention is an attention mechanism that runs several attention operations (heads) in parallel on the same input; the outputs of the heads are concatenated and linearly projected to the expected dimension, which lets the model attend to different aspects of the sequence at once.
French
Autoattention multitêtes
Autoattention multi-têtes
English
Multi-Head Attention
Multi-Head Self-Attention
Multi-head attention is a module for attention mechanisms that runs an attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension. Intuitively, multiple attention heads allow the model to attend to different parts of the sequence in different ways (e.g. longer-term versus shorter-term dependencies).
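To make the mechanism concrete, here is a minimal NumPy sketch of multi-head self-attention; it is an illustration, not an implementation taken from the sources below. The function name multi_head_self_attention, the projection matrices w_q, w_k, w_v, w_o and the dimensions in the usage example are assumptions chosen for the example.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input into queries, keys and values, then split into heads.
    def split_heads(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)   # (num_heads, seq_len, d_head)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, run independently (in parallel) per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                     # (heads, seq, d_head)

    # Concatenate the head outputs and apply the final linear transformation.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Example usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 8, 2, 5
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (5, 8)

Each head attends to the sequence independently using its own slice of the projected queries, keys and values; the final projection mixes the information gathered by the different heads back into the model dimension.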
Source
Source: Cordonnier, J.-B. (2023), Transformer Models for Vision. https://infoscience.epfl.ch/record/300271/files/EPFL_TH9822.pdf
Source: Punyakeerthi (2024), Difference between Self-Attention and Multi-head Self-Attention. https://medium.com/@punya8147_26846/difference-between-self-attention-and-multi-head-self-attention-e33ebf4f3ee0
Source: Papers with Code, Multi-Head Attention. https://cs.paperswithcode.com/method/multi-head-attention
Contributors: Claude Coulombe, Patrick Drouin, wiki