« Transformateur Switch »
Version of 16 June 2021 at 15:28
under construction
Definition
XXXXXXXXX
French
Transformateur Switch
English
Switch Transformer

A neural network architecture whose goal is to make it easier to build larger models without increasing computational cost.

The feature that distinguishes this model from its predecessors is a simplification of the Mixture of Experts (MoE) algorithm. In a Mixture of Experts, the tokens (elementary units of the input) entering the model are routed to different parts of the neural network (the experts) for processing. To process a given token, only a subsection of the model is therefore active: the model is sparse. This reduces computational cost, allowing such models to reach the trillion-parameter mark.
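The sparse routing described above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: the layer sizes, the random router weights, and the single-matrix "experts" are arbitrary assumptions chosen to keep the example short. The key point it shows is that each token is sent to exactly one expert (top-1, or "switch", routing), so the compute per token stays constant no matter how many experts the model has.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, n_tokens = 8, 4, 5  # toy sizes (assumption)

# Router: a single linear layer producing one logit per expert for each token.
W_router = rng.normal(size=(d_model, n_experts))
# Each "expert" is reduced to one weight matrix here for brevity.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def switch_layer(tokens):
    """Route each token to exactly one expert (top-1 'switch' routing)."""
    logits = tokens @ W_router                                   # (n_tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    chosen = probs.argmax(-1)                                    # one expert per token
    out = np.empty_like(tokens)
    for i, e in enumerate(chosen):
        # Only the selected expert's weights touch this token, so the
        # model is sparse: adding experts grows capacity, not per-token cost.
        out[i] = probs[i, e] * (tokens[i] @ experts[e])
    return out, chosen

tokens = rng.normal(size=(n_tokens, d_model))
out, chosen = switch_layer(tokens)
```

In the full model each expert would be a feed-forward sub-network and the routing would also handle load balancing across experts; this sketch only captures the top-1 dispatch that makes the model sparse.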
[https://towardsdatascience.com/top-5-gpt-3-successors-you-should-know-in-2021-42ffe94cbbf Source: towardsdatascience]
Contributors: Imane Meziani, wiki