Tokenisation



under construction

Definition

In natural language processing, tokenization is the operation of splitting a text into elementary units called tokens (words, sub-words, or multi-word expressions) in preparation for subsequent processing.

French

Tokenisation

English

Tokenization

In natural language processing, tokenization is the process of splitting a sentence into individual words, or tokens. In forming tokens, punctuation and special characters are often removed entirely. Tokens are constructed from a specific body of text for statistical analysis and processing. Note that a token need not be a single word: for example, “rock ’n’ roll” and “3-D printer” are each one token built from multiple words. Put simply, tokenization is a technique used to simplify a corpus in preparation for the next stages of processing.

[XXXXXXX Source: Wikipedia]
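
To make the process concrete, here is a minimal sketch in Python using only the standard `re` module. The regular expression, the `tokenize` function name, and the underscore-based handling of multi-word tokens are illustrative assumptions, not taken from the original text or from any particular library.

```python
import re

def tokenize(text, multiword_tokens=()):
    """Split text into tokens, dropping most punctuation.

    multiword_tokens: expressions such as "rock 'n' roll" that should
    survive as single tokens (an illustrative assumption, see above).
    """
    # Protect multi-word expressions by replacing their spaces with
    # underscores so the splitting step keeps them whole.
    protected = {}
    for expression in multiword_tokens:
        placeholder = expression.replace(" ", "_")
        protected[placeholder] = expression
        text = text.replace(expression, placeholder)

    # Keep runs of word characters, apostrophes, and hyphens;
    # everything else (punctuation, special characters) is removed.
    raw_tokens = re.findall(r"[\w'\-]+", text)

    # Restore the protected multi-word expressions.
    return [protected.get(token, token) for token in raw_tokens]

print(tokenize("A 3-D printer plays rock 'n' roll!",
               multiword_tokens=("rock 'n' roll",)))
# ['A', '3-D', 'printer', 'plays', "rock 'n' roll"]
```

Production tokenizers (for example those in NLTK or spaCy) handle many more edge cases, such as abbreviations, URLs, and language-specific punctuation.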

Source: GDPELLETIER