AI Glossary: What Is Shaping Reward (SR)? Definition & Meaning

Shaping reward is a concept in reinforcement learning (RL) that involves modifying the reward signal given to an agent in order to encourage desired behaviors and improve learning efficiency. Rather than offering a single reward only for achieving a specific goal, shaping reward techniques break complex tasks into manageable components and reward incremental progress toward the final objective.
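One simple way to reward incremental progress is to pay a small bonus the first time each subgoal of a task is completed, on top of the final goal reward. The sketch below assumes a task described by hypothetical milestone names (`get_key`, `open_door`) and illustrative bonus values. `MilestoneShaper` is not a library class, and it does not enforce the order in which milestones are reached.

```python
# Hypothetical subgoal-based shaping: the task is split into milestones,
# and the agent earns a small bonus the first time it completes each one.
from dataclasses import dataclass, field


@dataclass
class MilestoneShaper:
    milestones: list[str]          # subgoal names (illustrative)
    bonus: float = 0.1             # reward for each newly reached milestone
    goal_reward: float = 1.0       # reward for finishing the whole task
    reached: set[str] = field(default_factory=set)

    def reward(self, completed: set[str], done: bool) -> float:
        """Return the shaped reward for one step.

        completed: milestones the agent has satisfied as of this step.
        done: True when the final objective is achieved.
        """
        r = 0.0
        for m in self.milestones:
            if m in completed and m not in self.reached:
                self.reached.add(m)    # pay each milestone bonus only once
                r += self.bonus
        if done:
            r += self.goal_reward
        return r


shaper = MilestoneShaper(["get_key", "open_door"])
print(shaper.reward({"get_key"}, done=False))               # 0.1
print(shaper.reward({"get_key"}, done=False))               # 0.0 (already paid)
print(shaper.reward({"get_key", "open_door"}, done=True))   # 1.1
```

Paying each bonus only once matters. If a milestone kept paying out, the agent could learn to linger in that state instead of finishing the task.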

In many reinforcement learning tasks, an agent receives a reward only when it reaches a goal state (a sparse reward). This can make learning slow, especially in environments where a long or complex sequence of actions is needed to reach the goal, because the agent rarely receives any feedback. Shaping reward addresses this issue by providing intermediate rewards for actions or states that bring the agent closer to the final goal.

For example, a robot being trained to navigate a maze could be rewarded not only for exiting the maze but also with small rewards for each step it takes toward the exit or for each correct turn. This helps the agent learn the task faster and more effectively, because it receives continuous feedback throughout the process.
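The maze example can be sketched as a reward function. It keeps the sparse exit reward and adds a small bonus whenever a step reduces the distance to the exit. This assumes a grid maze with integer `(row, col)` positions and uses Manhattan distance as the measure of progress. The function names and the bonus value are illustrative, and Manhattan distance ignores walls, so it is only a rough proxy for true progress.

```python
# Hypothetical grid-maze shaping: a sparse exit reward plus a small bonus
# for each step that reduces the Manhattan distance to the exit.
Pos = tuple[int, int]


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def maze_reward(prev: Pos, curr: Pos, exit_pos: Pos,
                step_bonus: float = 0.01, exit_reward: float = 1.0) -> float:
    if curr == exit_pos:
        return exit_reward                      # original sparse goal reward
    if manhattan(curr, exit_pos) < manhattan(prev, exit_pos):
        return step_bonus                       # moved toward the exit
    return 0.0                                  # no progress: no bonus


exit_pos = (3, 3)
print(maze_reward((0, 0), (0, 1), exit_pos))    # 0.01 (closer)
print(maze_reward((0, 1), (0, 0), exit_pos))    # 0.0 (farther)
print(maze_reward((3, 2), (3, 3), exit_pos))    # 1.0 (reached exit)
```

This naive version has a known weakness. Stepping away costs nothing and stepping back pays a bonus, so an agent can collect rewards by oscillating instead of exiting. The potential-based shaping described in the next paragraph avoids this problem.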

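There are several types of shaping rewards, including potential-based reward shaping. In this approach, the shaping reward is derived from a potential function Φ over states (often an estimate of how promising a state is) as the difference γΦ(s′) − Φ(s) between successive states, where γ is the discount factor. A key requirement is that shaping rewards should not alter the agent's optimal policy; they should only guide it toward learning that policy more efficiently. Potential-based shaping is the standard way to meet this requirement, because it provably leaves the optimal policy unchanged.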
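The sketch below illustrates potential-based shaping with the term F(s, s′) = γΦ(s′) − Φ(s) on a one-dimensional walk toward a goal. The potential (negative distance to the goal), γ = 0.9, and the trajectory are illustrative assumptions. The final check shows that the shaping terms telescope. The discounted shaped return equals the original return plus γ^T·Φ(s_T) − Φ(s_0), which depends only on the start and end states. When terminal states have zero potential, as the goal does here, every policy's return from a given start state is shifted by the same constant, so the best policy stays the best.

```python
# Potential-based reward shaping: F(s, s') = gamma * Phi(s') - Phi(s).
# The potential Phi (negative distance to a goal on a line) is an
# illustrative assumption.
GAMMA = 0.9
GOAL = 5


def phi(s: int) -> float:
    return -abs(GOAL - s)          # higher potential closer to the goal; 0 at GOAL


def shaped(r: float, s: int, s_next: int, gamma: float = GAMMA) -> float:
    return r + gamma * phi(s_next) - phi(s)


def discounted_return(rewards: list[float], gamma: float = GAMMA) -> float:
    return sum(gamma**t * r for t, r in enumerate(rewards))


# A 1-D trajectory with a sparse reward only on the transition into the goal.
states = [0, 1, 2, 3, 4, 5]
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]          # one reward per transition
shaped_rewards = [shaped(r, s, s2) for r, s, s2 in zip(rewards, states, states[1:])]

G = discounted_return(rewards)
G_shaped = discounted_return(shaped_rewards)
T = len(rewards)
# The shaping terms telescope: G_shaped = G + gamma**T * Phi(s_T) - Phi(s_0)
print(abs(G_shaped - (G + GAMMA**T * phi(states[-1]) - phi(states[0]))) < 1e-9)  # True
```

Unlike the naive distance bonus, moving away from the goal here yields a negative shaping term. A loop that returns to the same state therefore earns no net discounted bonus.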

Overall, shaping reward is a widely used technique in designing reinforcement learning systems: well-chosen shaping rewards can speed up learning, improve agent performance, and make more complex tasks tractable.