Tokenisation
under construction
Definition
In natural language processing, tokenization is the process of splitting a text into units called tokens (typically words, but also multi-word expressions or subwords) to prepare it for further processing.
French
Tokenisation
English
Tokenization
In natural language processing, tokenization is the process of splitting a sentence into individual words or tokens. When forming tokens, punctuation and special characters are often removed entirely. Tokens are built from a specific body of text and used for statistical analysis and processing. Note that a token is not necessarily a single word: "rock 'n' roll" and "3-D printer" are each one token built from several words. In short, tokenization is a technique that simplifies a corpus to prepare it for the next stages of processing.
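The splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer; the function name `tokenize` and the regular expression are assumptions chosen for the example.

```python
import re

def tokenize(text):
    # Extract runs of word characters, dropping punctuation and
    # special characters entirely, as described above.
    # \w+ matches letters, digits, and underscores.
    return re.findall(r"\w+", text.lower())

print(tokenize("Tokenization chops a sentence into tokens!"))
# → ['tokenization', 'chops', 'a', 'sentence', 'into', 'tokens']
```

Note that multi-word tokens such as "rock 'n' roll" or "3-D printer" are not handled by this simple rule; recognizing them requires a lexicon or a more sophisticated tokenizer, such as those provided by NLTK or spaCy.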
[Source: Wikipedia]
Source: GDPELLETIER
Contributors: Claude Coulombe, Imane Meziani, wiki