« Apprentissage par différence temporelle » (temporal difference learning): difference between revisions


m (Text replacement: « Termes privilégiés » ("Preferred terms") replaced with « Français » ("French"))
Revision as of 31 December 2018, 15:55

Domain

Definition

French

English

Temporal difference learning

Temporal difference (TD) learning is a prediction-based machine learning method. It has primarily been used for the reinforcement learning problem, and is said to be "a combination of Monte Carlo ideas and dynamic programming (DP) ideas."[1] TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy, and it is related to dynamic programming techniques because it updates its current estimate using previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning.[2]
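The combination of sampling and bootstrapping described above can be illustrated with a minimal sketch of tabular TD(0) value prediction. The entry itself does not name a specific algorithm, so the choice of TD(0), the small random-walk environment, the uniform random policy, and the names `td0_update`, `evaluate_random_walk`, `alpha`, and `gamma` are all assumptions made for illustration.

```python
import random


def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to values[state] in place; return the TD error."""
    # Bootstrapping: the target reuses the current estimate of the next state
    # (taken as 0 when the transition ends the episode).
    bootstrap = 0.0 if terminal else values[next_state]
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error


def evaluate_random_walk(n_states=5, episodes=2000, alpha=0.05, seed=0):
    """Estimate state values of a hypothetical random-walk chain.

    Assumed setup: states 0..n_states-1, episodes start in the middle,
    the policy steps left or right with equal probability, and only
    leaving the chain on the right yields a reward of 1.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states
    for _ in range(episodes):
        state = n_states // 2
        while True:
            # Sampling: experience comes from acting in the environment,
            # as in a Monte Carlo method.
            next_state = state + rng.choice((-1, 1))
            terminal = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            td0_update(values, state, reward, next_state,
                       alpha=alpha, gamma=1.0, terminal=terminal)
            if terminal:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in evaluate_random_walk()])
```

In this sketch each estimate is adjusted after every single step, without waiting for the episode to finish, which is what separates the update from a pure Monte Carlo estimate. The step size `alpha` is held constant here for simplicity, so the estimates keep fluctuating slightly around the values they approach.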