SARSA
under construction
Definition
An on-policy reinforcement learning algorithm that learns the action-value function of a Markov decision process. At each step, the estimate for the current state-action pair is updated using the observed reward and the value of the next state-action pair actually chosen by the current policy.
French
SARSA
English
State–action–reward–state–action
SARSA
State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note[1] under the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, appeared only in a footnote.
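Written out, the update rule is Q(s,a) ← Q(s,a) + α[r + γ Q(s′,a′) − Q(s,a)], where the quintuple (s, a, r, s′, a′) gives the algorithm its name. Below is a minimal, self-contained sketch of tabular SARSA with an ε-greedy policy; the five-state chain environment, its step function, and all hyperparameters are illustrative assumptions, not part of the source.

import random
from collections import defaultdict

N_STATES = 5   # hypothetical chain: states 0..4, state 4 is terminal
N_ACTIONS = 2  # 0 = move left, 1 = move right

def step(state, action):
    """Toy dynamics: move left or right; reaching state 4 pays 1 and ends."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(q, state, epsilon=0.1):
    """Explore with probability epsilon; otherwise act greedily, breaking ties at random."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    best = max(q[(state, a)] for a in range(N_ACTIONS))
    return random.choice([a for a in range(N_ACTIONS) if q[(state, a)] == best])

def sarsa(episodes=500, alpha=0.5, gamma=0.9):
    q = defaultdict(float)  # tabular Q(s, a), initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(q, state)
        while not done:
            next_state, reward, done = step(state, action)
            # The next action is drawn from the same policy being learned (on-policy).
            next_action = epsilon_greedy(q, next_state)
            # SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
            target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q

q = sarsa()
# Expected greedy policy: always move right (action 1) toward the rewarding state.
print([max(range(N_ACTIONS), key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])

One design point worth noting: because a′ is selected by the same ε-greedy policy that is being evaluated, SARSA is on-policy; Q-learning differs only in bootstrapping from max over a of Q(s′, a) instead of from the action actually taken.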