AI Glossary: What Is Shaping Reward (SR)? Definition & Meaning

Shaping reward, more commonly called reward shaping, is a concept in reinforcement learning (RL) that involves modifying the reward signal given to an agent in order to promote desired behaviors and improve learning efficiency. Rather than offering a single reward for achieving a specific goal, shaping reward techniques break complex tasks into manageable components and reward incremental progress toward the final objective.

In many reinforcement learning problems, the agent receives a reward signal only when it reaches a goal state. Such sparse rewards make learning inefficient, especially in environments where a long or complex sequence of actions is required to reach the goal. Shaping reward addresses this issue by providing intermediate rewards for actions or states that bring the agent closer to the final goal.

For example, when training a robot to navigate a maze, instead of rewarding the robot only when it exits the maze, the designer could give it small rewards for each step taken toward the exit or for making a correct turn. This helps the agent learn the task more quickly and efficiently, because it receives continuous feedback throughout the process.
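The maze example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid coordinates, the Manhattan-distance measure of progress, and the names `GOAL_REWARD`, `STEP_BONUS`, `sparse_reward`, and `shaped_reward` are all hypothetical and do not come from any particular library.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (row, column) on a hypothetical grid maze

GOAL_REWARD = 1.0   # assumed reward for reaching the exit
STEP_BONUS = 0.01   # assumed small bonus for a step that moves closer to the exit


def manhattan(a: Cell, b: Cell) -> int:
    """Grid distance that ignores walls; a rough proxy for progress."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sparse_reward(next_cell: Cell, exit_cell: Cell) -> float:
    """Reward only when the robot reaches the exit."""
    return GOAL_REWARD if next_cell == exit_cell else 0.0


def shaped_reward(cell: Cell, next_cell: Cell, exit_cell: Cell) -> float:
    """Sparse reward plus a small bonus for each step that reduces the distance."""
    reward = sparse_reward(next_cell, exit_cell)
    if manhattan(next_cell, exit_cell) < manhattan(cell, exit_cell):
        reward += STEP_BONUS
    return reward
```

With the sparse version, every step before the exit returns 0.0, so the agent gets no signal until it happens to succeed. With the shaped version, each step toward the exit returns a small positive value. Note that an ad hoc bonus like this one can be exploited: an agent that steps away from the exit and back again collects the bonus repeatedly without ever finishing the maze.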

There are several types of shaping reward, including potential-based reward shaping, where the shaping rewards are derived from a potential function that scores each state, roughly as an estimate of the future reward reachable from it. The key is that these shaping rewards should not alter the agent's optimal policy; they should only guide the agent toward learning that policy more efficiently.
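In the form usually given in the RL literature, the potential-based shaping term for a transition from state s to state s' is F(s, s') = gamma * Phi(s') - Phi(s), where Phi is the potential function and gamma is the discount factor. The sketch below illustrates why this form leaves the comparison between policies unchanged: along any trajectory, the discounted shaping terms telescope to a value that depends only on the first and last states. The potential `negative_distance`, the `EXIT` cell, and the helper names are assumptions made for this example.

```python
from typing import Callable, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) on a hypothetical grid maze

EXIT: Cell = (2, 2)  # assumed exit cell for this illustration


def negative_distance(cell: Cell) -> float:
    """Hypothetical potential: states closer to the exit score higher."""
    return -float(abs(cell[0] - EXIT[0]) + abs(cell[1] - EXIT[1]))


def shaping_term(phi: Callable[[Cell], float], state: Cell,
                 next_state: Cell, gamma: float) -> float:
    """Potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi(next_state) - phi(state)


def discounted_shaping_sum(phi: Callable[[Cell], float],
                           trajectory: Sequence[Cell], gamma: float) -> float:
    """Sum of gamma**t * F(s_t, s_{t+1}) over consecutive states in a trajectory."""
    total = 0.0
    for t in range(len(trajectory) - 1):
        total += gamma ** t * shaping_term(phi, trajectory[t], trajectory[t + 1], gamma)
    return total


# A trajectory that wanders back and forth before making progress.
trajectory = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 1)]
gamma = 0.9
steps = len(trajectory) - 1

total = discounted_shaping_sum(negative_distance, trajectory, gamma)
# The sum telescopes to gamma**T * Phi(s_T) - Phi(s_0), regardless of the detour.
telescoped = gamma ** steps * negative_distance(trajectory[-1]) - negative_distance(trajectory[0])
```

Because the total shaping collected depends only on where the trajectory starts and ends, wandering back and forth earns nothing extra, which is the intuition behind the claim that this kind of shaping does not change the optimal policy.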

Overall, shaping reward is a widely used technique in designing reinforcement learning systems: it can speed up learning, improve agent performance, and make more complex tasks tractable.