Search results

Page title matches

Reinforcement learning from human preferences

94 bytes (11 words) - 16 June 2023 at 21:23

Direct Preference Optimization
...uning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO e

2 KiB (256 words) - 29 January 2024 at 13:34

Apprentissage par renforcement et rétroaction humaine
''' reinforcement learning from human preferences '''

3 KiB (477 words) - 5 May 2024 at 04:01