Tokenisation
under construction
Definition
In natural language processing, tokenization is the process of splitting a text into units called tokens (typically words, but also multi-word expressions or subwords) to prepare it for further processing.
French
Tokenisation
English
Tokenization
In natural language processing, tokenization is the process of splitting a sentence into individual words or tokens. When forming tokens, punctuation and special characters are often removed entirely. Tokens are built from a specific body of text and used for statistical analysis and processing. Note that a token is not necessarily a single word: "rock 'n' roll" and "3-D printer" are each one token built from several words. In short, tokenization is a technique that simplifies a corpus to prepare it for the next stages of processing.
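The splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer; the function name `tokenize` and the regular expression are assumptions chosen for the example.

```python
import re

def tokenize(text):
    # Extract runs of word characters, dropping punctuation and
    # special characters entirely, as described above.
    # \w+ matches letters, digits, and underscores.
    return re.findall(r"\w+", text.lower())

print(tokenize("Tokenization chops a sentence into tokens!"))
# → ['tokenization', 'chops', 'a', 'sentence', 'into', 'tokens']
```

Note that multi-word tokens such as "rock 'n' roll" or "3-D printer" are not handled by this simple rule; recognizing them requires a lexicon or a more sophisticated tokenizer, such as those provided by NLTK or spaCy.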
[Source: Wikipedia]
Source: GDPELLETIER
Contributors: Claude Coulombe, Imane Meziani, wiki