Model-free reinforcement learning (MFRL) is a branch of machine learning that trains an agent to make decisions by learning from its own experience in an environment, rather than relying on a predefined model of that environment. In model-based approaches, the agent builds or is given a model that predicts the outcomes of its actions and plans with it. Model-free methods skip that step: the agent learns which actions are best directly from the rewards it receives.
In MFRL, the learning process generally involves two main components: exploration and exploitation. Exploration means the agent tries different actions to discover their effects, while exploitation means the agent chooses the action it currently believes will yield the highest reward. Balancing the two is crucial: an agent that explores too little can lock onto a suboptimal action, and one that explores too much never benefits from what it has learned.
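One common way to balance the two is an epsilon-greedy rule, shown here as a minimal sketch. The function name `epsilon_greedy` and its parameters are illustrative assumptions, not taken from any particular library: with probability `epsilon` the agent explores by picking a random action, and otherwise it exploits its current value estimates.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    q_values: estimated value of each action in the current state.
    epsilon:  probability of exploring (0.0 = always exploit).
    rng:      any object with random() and randrange(), e.g. random.Random(0).
    """
    if rng.random() < epsilon:
        # Explore: choose an action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate
    # (the first such index if several are tied).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 gives a purely greedy agent, and setting it to 1 gives a purely random one. In practice `epsilon` is often decayed over time so the agent explores early and exploits later.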
Two popular types of model-free reinforcement learning algorithms are:
- Value-based methods: These methods estimate the value of being in a given state or of taking a particular action in that state. A well-known example is Q-learning, where the agent learns the value of each action in each state and then selects the action with the highest estimated value.
- Policy-based methods: Instead of estimating values, these methods directly learn a policy that specifies which action to take in each state. An example is the REINFORCE algorithm, which adjusts the policy's parameters along the gradient of the expected reward.
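As an illustration of the value-based case, the following is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state corridor invented for this example (`step`, `N_STATES`, and the hyperparameter values are all assumptions): the agent starts at the left end and receives a reward of 1 only when it reaches the right end.

```python
import random

N_STATES = 5        # states 0..4; state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: a deterministic corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table from experience only; no model of step() is used."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: best known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value
            # (no bootstrap term when the episode has ended).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-terminal state according to the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

The agent never inspects how `step` works. It only observes the resulting state and reward, which is what makes the method model-free. With these settings the greedy policy should choose "move right" in every non-terminal state. A policy-based method such as REINFORCE would instead parameterize the action probabilities directly and update them from sampled episode returns.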
Model-free reinforcement learning has been successfully applied in various domains, including robotics, game playing, and autonomous systems, showing that complex tasks can be learned through trial and error.