AI Glossary: What Is an On-Policy Method? Definition & Meaning

An on-policy method is a type of algorithm in reinforcement learning, a subset of artificial intelligence (AI). In reinforcement learning, an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. The distinguishing feature of on-policy methods is that they evaluate and improve the same policy the agent uses to select actions, that is, the policy actually being followed during learning.

In on-policy learning, the experience used for updates comes from the policy being learned. The agent selects actions, including exploratory ones, according to its current policy and uses the outcomes of those actions to update that same policy. A common example of an on-policy method is the SARSA (State-Action-Reward-State-Action) algorithm, which updates the action-value estimate for the current state-action pair using the reward received and the value of the next action actually chosen by the current policy. Because the agent learns only from the actions it has taken, any change to the policy, including how much it explores, changes the data it learns from.
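The SARSA update described above can be sketched in a few lines of Python. This is a minimal tabular illustration under stated assumptions, not a reference implementation: the function names `epsilon_greedy` and `sarsa_update`, the step size `alpha`, the discount factor `gamma`, the exploration rate `epsilon`, and the dictionary-based Q-table are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99, done=False):
    """Apply one SARSA update in place and return the new Q(state, action).

    The target uses next_action, the action the current policy actually
    chose in next_state. That is what makes the update on-policy.
    """
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else q[(next_state, next_action)]
    target = reward + gamma * bootstrap
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Tiny hand-made example: every unseen Q-value starts at 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0

# The agent took "right" in s0, received reward 1.0, landed in s1,
# and its policy chose "right" again there.
new_value = sarsa_update(q, "s0", "right", reward=1.0,
                         next_state="s1", next_action="right",
                         alpha=0.5, gamma=0.9)
```

In a full training loop, `next_action` would come from the same behavior policy (for instance `epsilon_greedy`) that is being improved, and it would then be the action executed on the following step.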

One of the main advantages of on-policy methods is that the values they learn reflect the agent's actual behavior, including its exploration (trying new actions) as well as its exploitation (choosing the best-known actions), because learning is tightly coupled with the policy being followed. However, this also means that on-policy methods tend to be less sample-efficient than off-policy methods, which can learn from experience generated by other policies, such as earlier versions of the agent. Experience collected under an old policy generally cannot be reused once the policy changes. On-policy methods are particularly useful in environments where the agent needs to continuously adapt to changes and learn from its direct experiences.
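The on-policy versus off-policy distinction above shows up most clearly in the update target. The sketch below compares an on-policy target with an off-policy one. The off-policy counterpart shown is a Q-learning-style target, which is not named in the text above and is used here only for contrast. The function names, the Q-values, and `gamma` are hypothetical values chosen for illustration.

```python
def on_policy_target(q, reward, next_state, next_action, gamma=0.5):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q[(next_state, next_action)]


def off_policy_target(q, reward, next_state, actions, gamma=0.5):
    """Off-policy (Q-learning-style) target: bootstrap from the greedy
    action in next_state, whatever the agent actually did there."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


# Hypothetical Q-values for the next state s1.
q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
actions = ["left", "right"]

# Suppose exploration made the agent pick the worse action "left" in s1.
sarsa_target = on_policy_target(q, reward=0.0, next_state="s1",
                                next_action="left")          # 0.5
q_learning_target = off_policy_target(q, reward=0.0, next_state="s1",
                                      actions=actions)       # 1.5
```

The on-policy target is lower here because it accounts for the exploratory action the agent really took, while the off-policy target assumes greedy behavior from the next state onward.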