Encodage par paires d'octets

under construction

Definition

Byte pair encoding is a data compression algorithm that repeatedly replaces the most frequent pair of adjacent bytes in the data with a new byte that does not occur in the original data. In natural language processing, it is used as a subword tokenization algorithm that represents a large vocabulary with a small set of subword units.

See also segment, traitement automatique de la langue naturelle and Vocabulary (NLP)

French

encodage par paires d'octets

English

Byte Pair Encoding

BPE

Byte pair encoding is a simple form of data compression and one of the most widely used subword tokenization algorithms. It replaces the most frequent pair of adjacent bytes in the data with a new byte that was not contained in the initial dataset. In natural language processing, BPE is used to represent a large vocabulary with a small set of subword units, and the most common words are represented in the vocabulary as single tokens.

It is used in all GPT versions, as well as in RoBERTa, XLM, FlauBERT and more.
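To make the procedure concrete, here is a minimal Python sketch of the BPE training loop, in the spirit of the reference algorithm of Sennrich et al. (2016). The toy corpus, the </w> end-of-word marker and the num_merges budget are illustrative assumptions, not part of this entry.

import re
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with one merged symbol.
    # The lookarounds keep the match aligned on whole, space-delimited symbols.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Hypothetical toy corpus: words pre-split into characters, with </w> marking word ends.
vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

num_merges = 10  # illustrative merge budget; real vocabularies use tens of thousands
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(best)

On this corpus the first merges produce the subword units es, then est, illustrating how frequent character sequences are promoted to single vocabulary tokens.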

Source

Source: Geeks for Geeks (https://www.geeksforgeeks.org/byte-pair-encoding-bpe-in-nlp/)

Source: Medium

Source: Wikipedia
