Transformateur Switch



Under construction

Definition

A Transformer-based neural network architecture that simplifies the Mixture of Experts approach so that each token is processed by only one part of the network (one expert). The resulting model is sparse, which allows it to be scaled to very large sizes without increasing computational cost.

French

Transformateur Switch

English

Switch transformer

A neural network designed to facilitate the creation of larger models without increasing computational cost.

The feature that distinguishes this model from previous ones is a simplification of the Mixture of Experts (MoE) algorithm. In MoE, each token (an elemental part of the input) entering the model is routed to a different part of the neural network (an expert) for processing. Thus, to process a given token, only a subset of the model is active: the model is sparse. This reduces computational cost, allowing such models to reach the trillion-parameter mark.
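The routing idea can be made concrete with a small, self-contained sketch. The Python/NumPy code below is a minimal illustration, not the published implementation: it shows top-1 ("switch") routing, the simplification referred to above, in which a learned router sends each token to exactly one expert feed-forward network. All names, dimensions, and the random initialisation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SwitchRoutingLayer:
    """Sparse feed-forward layer with top-1 (switch) routing.

    Each token is sent to exactly one expert, so only a fraction of the
    layer's parameters is active for any given token (hypothetical sketch).
    """

    def __init__(self, d_model, d_ff, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Router: one logit per expert for each token.
        self.w_router = rng.normal(0, 0.02, (d_model, num_experts))
        # Experts: independent two-layer feed-forward networks.
        self.w_in = rng.normal(0, 0.02, (num_experts, d_model, d_ff))
        self.w_out = rng.normal(0, 0.02, (num_experts, d_ff, d_model))

    def __call__(self, tokens):
        # tokens: array of shape (num_tokens, d_model)
        probs = softmax(tokens @ self.w_router)        # routing probabilities
        expert_ids = probs.argmax(axis=-1)             # top-1 expert per token
        gates = probs[np.arange(len(tokens)), expert_ids]

        out = np.zeros_like(tokens)
        for e in np.unique(expert_ids):
            idx = np.where(expert_ids == e)[0]         # tokens routed to expert e
            h = np.maximum(tokens[idx] @ self.w_in[e], 0.0)   # ReLU
            out[idx] = gates[idx, None] * (h @ self.w_out[e])
        return out

# Usage: 8 tokens of width 16 routed across 4 experts.
layer = SwitchRoutingLayer(d_model=16, d_ff=32, num_experts=4)
y = layer(np.random.default_rng(1).normal(size=(8, 16)))
print(y.shape)  # (8, 16)
```

Because each token touches only one expert's weights, adding experts increases the parameter count without increasing the computation performed per token, which is the property described in the paragraph above.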


[https://towardsdatascience.com/top-5-gpt-3-successors-you-should-know-in-2021-42ffe94cbbf Source: towardsdatascience]


Contributors: Imane Meziani, wiki