Generalization in backprop
Definition
Learning by backpropagation seems to operate by first finding a rough set of weights that fit the training patterns in a general way, and then working progressively towards a set of weights that fit the training patterns exactly.
If learning goes too far down this path, it may reach a set of weights that fits the idiosyncrasies of the particular set of patterns very well, but does not interpolate (i.e. generalize) well.
French
généralisation en rétropropagation
English
generalization in backprop
Learning in backprop seems to operate by first of all getting a rough set of weights which fit the training patterns in a general sort of way, and then working progressively towards a set of weights that fit the training patterns exactly. If learning goes too far down this path, one may reach a set of weights that fits the idiosyncrasies of the particular set of patterns very well, but does not interpolate (i.e. generalize) well.
Moreover, with large complex sets of training patterns, it is likely that some errors may occur, either in the inputs or in the outputs. In that case, and again particularly in the later parts of the learning process, it is likely that backprop will be contorting the weights so as to fit precisely around training patterns that are actually erroneous! This phenomenon is known as over-fitting.
This problem can to some extent be avoided by stopping learning early. How does one tell when to stop? One method is to partition the training patterns into two sets (assuming that there are enough of them). The larger part of the training patterns, say 80% of them, chosen at random, form the training set, and the remaining 20% are referred to as the test set. Every now and again during training, one measures the performance of the current set of weights on the test set. One normally finds that the error on the training set drops monotonically (that's what a gradient descent algorithm is supposed to do, after all). However, error on the test set (which will be larger, per pattern, than the error on the training set) will fall at first, then start to rise as the algorithm begins to overtrain. Best generalization performance is gained by stopping the algorithm at the point where error on the test set starts to rise.
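A minimal sketch of the early-stopping procedure described above, assuming a tiny one-hidden-layer network trained by full-batch backpropagation on synthetic data. The network size, learning rate, evaluation interval, patience threshold, and the 80/20 split are illustrative choices, not values prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training patterns: a noisy binary target from a nonlinear rule (illustrative).
X = rng.normal(size=(500, 2))
y = ((X[:, 0] * X[:, 1] > 0).astype(float)
     + rng.normal(scale=0.3, size=500) > 0.5).astype(float).reshape(-1, 1)

# Partition the patterns at random: 80% training set, 20% held-out "test" set.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_tr, y_tr = X[idx[:split]], y[idx[:split]]
X_te, y_te = X[idx[split:]], y[idx[split:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 20 sigmoid units with small random initial weights.
W1 = rng.normal(scale=0.5, size=(2, 20)); b1 = np.zeros(20)
W2 = rng.normal(scale=0.5, size=(20, 1)); b2 = np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def error(X, y):
    _, out = forward(X)
    return float(np.mean((out - y) ** 2))

lr = 0.5
best_test_err = np.inf
best_weights = None
patience, bad_checks = 5, 0

for epoch in range(2000):
    # Forward pass and gradient of (half) the mean-squared error: plain backprop.
    h, out = forward(X_tr)
    d_out = (out - y_tr) * out * (1 - out) / len(X_tr)
    dW2 = h.T @ d_out; db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1 = X_tr.T @ d_h; db1 = d_h.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    # Every now and again, measure performance on the held-out set.
    if epoch % 10 == 0:
        test_err = error(X_te, y_te)
        if test_err < best_test_err:
            best_test_err = test_err
            best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patience:  # held-out error has started to rise: stop.
                break

# Restore the weights from the point of best held-out performance.
W1, b1, W2, b2 = best_weights
print(f"train error {error(X_tr, y_tr):.4f}, held-out error {best_test_err:.4f}")
```

Training-set error keeps falling monotonically under gradient descent, while the held-out error eventually turns upward; the sketch keeps the weights from the checkpoint where held-out error was lowest, which is the stopping point the text recommends.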
Contributors: Imane Meziani, wiki, Sihem Kouache