Apprentissage par renforcement avec borne de confiance supérieure


== Domain ==
[[Category:Vocabulary]]Vocabulary<br />
<br />
== Definition ==
A reinforcement learning approach in which the machine uses an upper confidence bound to determine the single action that maximizes its expected return.
<br />
== French ==
'''Apprentissage par renforcement avec borne de confiance supérieure'''
<br />
== English ==
'''Reinforcement Learning with the Upper Confidence Bound'''
Recall the general setup for reinforcement learning: we have well-defined actions that we can take, so we let the machine figure out how to maximize its reward based on the consequences of those actions.

The Upper Confidence Bound algorithm is a formalization of this idea, where the machine attempts to determine a single action it can take that will maximize its expected return.

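To make the selection rule concrete, the sketch below (an illustration, not taken from the source article) implements the standard UCB1 rule for a multi-armed bandit: at each step the machine plays the action with the largest value of its empirical mean reward plus the exploration bonus sqrt(2·ln(t)/n), where t is the current step and n is the number of times that action has been played. The action names and reward probabilities in the example are hypothetical.

<syntaxhighlight lang="python">
# Minimal UCB1 sketch for a multi-armed bandit, assuming rewards in [0, 1].
import math
import random


def ucb1(n_actions, n_rounds, pull):
    """Play n_rounds, at each step choosing the action with the highest
    upper confidence bound: mean_reward + sqrt(2 * ln(t) / n_pulls)."""
    counts = [0] * n_actions    # how many times each action was taken
    sums = [0.0] * n_actions    # total reward collected per action
    total_reward = 0.0

    for t in range(1, n_rounds + 1):
        if t <= n_actions:
            # Play every action once so each count is non-zero.
            action = t - 1
        else:
            # Empirical mean plus exploration bonus for each action.
            ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                   for a in range(n_actions)]
            action = max(range(n_actions), key=lambda a: ucb[a])
        reward = pull(action)
        counts[action] += 1
        sums[action] += reward
        total_reward += reward
    return total_reward, counts


if __name__ == "__main__":
    # Hypothetical bandit: three actions with different success probabilities.
    probs = [0.2, 0.5, 0.7]
    pull = lambda a: 1.0 if random.random() < probs[a] else 0.0
    reward, counts = ucb1(n_actions=3, n_rounds=1000, pull=pull)
    print("total reward:", reward, "pull counts:", counts)
</syntaxhighlight>

Because the bonus term shrinks as an action is played more often, the rule explores uncertain actions early and then concentrates on the action with the highest observed return.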
https://opendatascience.com/machine-learning-for-beginners/