Encodage par paires d'octets


en construction

Définition

XXXXXXX

Voir aussi segment, traitement automatique de la langue naturelle et Vocabulary (NLP)

Français

XXXXXXX

Anglais

Byte Pair Encoding

BPE

Byte Pair Encoding is a simple form of data compression algorithms and is one of the most widely used subword-tokenization algorithms. It replaces the most frequent pair of bytes of data with a new byte that was not contained int the initial dataset. In Natural Language Processing, BPE is used to represent large vocabulary with a small set of subword units and most common words are represented in the vocabulary as a single token.

It is used in all of GPT versions, RoBERTa, XML, FlauBERT and more.

Source

Source : Geeks for Geeks

Source : Medium

Source : Wikipedia

Contributeurs: Arianne