"Tokenisation": difference between revisions

Latest revision as of 10:15, 23 December 2023

Redirect to:

@@ Line 1: / Line 1: @@
-==under construction==
+#REDIRECTION[[Segmentation]]
-== Definition ==
+[[Catégorie:GRAND LEXIQUE FRANÇAIS]]
-XXXXXXXXX
-== French ==
-''' Tokenisation'''
-== English ==
-''' Tokenization'''
-In natural language processing, tokenization is the process of splitting a sentence into individual words or tokens. While the tokens are being formed, punctuation and special characters are often removed entirely.
-Tokens are constructed from a specific body of text to be used for statistical analysis and processing. A token does not have to be a single word: for example, “rock ’n’ roll” and “3-D printer” are tokens constructed from multiple words.
-Put simply, tokenization is a technique used to simplify a corpus and prepare it for the next stages of processing.
-<small>
-[XXXXXXX Source: Wikipedia]
-Source: GDPELLETIER
-[[Catégorie:vocabulary]]
-[[Catégorie:vocabulaire]]
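
The English definition removed in this revision describes tokenization as splitting a sentence into tokens, dropping punctuation, and allowing a token to span several words. A minimal Python sketch of that idea follows. It is only an illustration under stated assumptions: the function name `tokenize`, the `MULTIWORD_TOKENS` list, the lowercasing step, and the restriction to ASCII letters and digits are all hypothetical choices, not something the page specifies.

```python
import re

# Hypothetical list of multi-word expressions to keep as single tokens
# (the two examples come from the removed definition, lowercased).
MULTIWORD_TOKENS = ("rock 'n' roll", "3-d printer")


def tokenize(text, multiword=MULTIWORD_TOKENS):
    """Split text into lowercase tokens, dropping punctuation.

    Assumption: only ASCII letters and digits count as word characters.
    """
    # Normalise typographic apostrophes so the expressions above can match.
    normalised = text.lower().replace("\u2019", "'")
    # Try longer multi-word expressions first, then plain runs of
    # letters and digits; anything else (punctuation, spaces) is skipped.
    alternatives = [
        re.escape(expr) for expr in sorted(multiword, key=len, reverse=True)
    ]
    alternatives.append(r"[a-z0-9]+")
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(normalised)


if __name__ == "__main__":
    print(tokenize("She bought a 3-D printer, then played rock \u2019n\u2019 roll!"))
```

Listing the multi-word expressions explicitly keeps the sketch short; a real pipeline would more likely rely on a lexicon or a trained tokenizer.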