Dueling Q-Networks (often called Dueling DQN) are an advanced architecture used in reinforcement learning to make policy learning more efficient. A traditional Deep Q-Network estimates the value of each action directly through a single output stream. A Dueling Q-Network instead uses a two-stream architecture: one stream estimates the state value V(s), and the other estimates the advantage A(s, a) of each action, that is, how much better or worse that action is than the state's baseline value. This separation lets the algorithm learn how valuable it is to be in a particular state without having to learn the effect of every action in that state.
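A minimal NumPy sketch of the two-stream forward pass is shown below. The state size, hidden width, number of actions, ReLU activation, random weights, and the names `dueling_streams`, `W_*`, and `b_*` are illustrative assumptions, not taken from any particular implementation. The value and advantage outputs are kept separate here; they still have to be combined into Q-values.

```python
import numpy as np

# Illustrative sizes (assumptions): 4-dimensional states, 16 hidden units, 3 actions.
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 3
rng = np.random.default_rng(0)

# Shared feature layer feeding both streams (hypothetical random weights).
W_shared = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
b_shared = np.zeros(HIDDEN)

# Value stream: features -> one scalar V(s) per state.
W_value = rng.normal(scale=0.1, size=(HIDDEN, 1))
b_value = np.zeros(1)

# Advantage stream: features -> one A(s, a) per action.
W_adv = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))
b_adv = np.zeros(N_ACTIONS)


def relu(x):
    return np.maximum(x, 0.0)


def dueling_streams(states):
    """Return (value, advantages) for a batch of states of shape (B, STATE_DIM)."""
    features = relu(states @ W_shared + b_shared)  # (B, HIDDEN), shared by both streams
    value = features @ W_value + b_value           # (B, 1): state value
    advantages = features @ W_adv + b_adv          # (B, N_ACTIONS): per-action advantage
    return value, advantages


states = rng.normal(size=(2, STATE_DIM))
value, advantages = dueling_streams(states)
print(value.shape, advantages.shape)  # (2, 1) (2, 3)
```

In a real agent the shared layer is usually a deeper feature extractor (for example, convolutional layers for image input), but the split into a scalar head and a per-action head is the same.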
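The dueling design can stabilize learning and improve performance in environments with large action spaces, particularly when many actions have similar values, and it may also help where reward signals are sparse. Because the value stream is updated on every transition, whichever action was taken, the network learns how good each state is without having to learn it separately for every action. Decoupling state values from action advantages in this way lets the network use its experience more efficiently, which can lead to faster convergence toward optimal policies.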
In practice, during training, the network evaluates the state through the value stream while assessing the relative advantages of actions through the advantage stream. The final Q-values are obtained by adding the state value to each action's advantage after subtracting the mean advantage over all actions. Without that subtraction, V and A would not be uniquely determined, because any constant could be moved from one stream to the other without changing Q. This architecture has shown significant improvements on a range of tasks, notably Atari games, where it learned strategies that outperformed previous methods.
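The combination step can be sketched as follows, assuming the mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a') used in the original dueling-network paper. The function name `dueling_q_values` and the sample stream outputs are hypothetical, shaped as a batch of two states with three actions.

```python
import numpy as np


def dueling_q_values(value, advantages):
    """Combine V(s) of shape (B, 1) and A(s, a) of shape (B, n_actions) into Q(s, a).

    Mean-subtracted aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    Removing the mean strips any constant offset from the advantage stream,
    so the split between V and A is unique.
    """
    return value + advantages - advantages.mean(axis=1, keepdims=True)


# Hypothetical stream outputs for two states and three actions.
value = np.array([[1.0], [-0.5]])
advantages = np.array([[0.2, 0.5, -0.1],
                       [3.0, 3.0, 3.3]])

q = dueling_q_values(value, advantages)
greedy_actions = q.argmax(axis=1)  # action with the highest Q-value per state

# Under this aggregation, averaging Q over actions recovers V(s).
recovered_value = q.mean(axis=1, keepdims=True)

# A constant offset on the advantage stream leaves Q unchanged.
q_shifted = dueling_q_values(value, advantages + 10.0)
```

Here the greedy action is 1 for the first state and 2 for the second, the row means of `q` equal `value`, and `q_shifted` matches `q`. Because subtracting the mean does not change which advantage is largest, the greedy action always matches the argmax of the advantage stream.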