« Quantification »
[https://towardsdatascience.com/quantisation-and-co-reducing-inference-times-on-llms-by-80-671db9349bdb Source : towardsdatascience]

[https://www.mathworks.com/discovery/quantization.html Source : mathworks]

[[Catégorie:vocabulary]]
Version du 30 octobre 2023 à 09:12
en construction
Définition
XXXXXXXXX
Français
quantification
Anglais
Quantisation
allows us to reduce the size of a neural network by converting its weights and biases from their original floating-point format (e.g. 32-bit) to a lower-precision format (e.g. 8-bit). The original floating-point format can vary depending on factors such as the model's architecture and training process. The purpose of quantisation is to shrink the model, thereby reducing the memory and computational requirements for both inference and training. Quantisation can quickly become fiddly if you attempt to quantise a model yourself.
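To make the idea concrete, here is a minimal sketch of affine (asymmetric) quantisation with NumPy, mapping 32-bit floating-point weights to 8-bit unsigned integers. The function names and parameters are illustrative, not taken from any specific library:

```python
import numpy as np

def quantise(weights, num_bits=8):
    """Affine quantisation: map the float range [min, max] onto integers [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (qmax - qmin)          # float step between integer levels
    zero_point = int(round(qmin - w_min / scale))     # integer that represents 0.0
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    """Recover approximate float weights from the quantised representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)  # stand-in for a layer's weights
q, s, z = quantise(w)
w_hat = dequantise(q, s, z)

# float32 stores 4 bytes per weight, uint8 stores 1: a 4x size reduction,
# at the cost of a rounding error bounded by one quantisation step.
print(w.nbytes, q.nbytes, float(np.max(np.abs(w - w_hat))))
```

Each weight is reconstructed to within one quantisation step (`scale`), which is why 8-bit quantisation usually preserves model accuracy while cutting memory use fourfold.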
Contributeurs: Claude Coulombe, Marie Alfaro, wiki