AI Glossary: What Is a Multi-Armed Bandit (MAB)? Definition & Meaning

The multi-armed bandit problem is a well-known scenario in decision theory and machine learning, in which a decision-maker must choose repeatedly among multiple options (or ‘arms’) with uncertain payoffs. The name comes from a hypothetical slot machine with multiple levers, each with a different probability of winning. The challenge lies in the trade-off between exploration (trying different options to gather information) and exploitation (choosing the option that currently appears to yield the best reward).

In practical terms, the multi-armed bandit problem applies to domains such as online advertising, clinical trials, and recommendation systems. For instance, in digital marketing, a system must decide which ad to display to maximize click-through rates. Each ad represents an ‘arm’ of the bandit, and the goal is to identify the most effective ad over time while still exploring potentially better options.
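The advertising example can be made concrete with a small simulation. The sketch below is illustrative only: the class name AdBandit, the function estimate_click_rates, and the click-through rates are hypothetical values chosen for the example, not taken from any real system. It models each ad as an arm that pays 1 (a click) with a hidden probability, and uses pure exploration (showing ads uniformly at random) to estimate those probabilities.

```python
import random


class AdBandit:
    """Simulated ad server: each ad is an arm with a hidden click probability."""

    def __init__(self, click_rates, seed=0):
        # click_rates are hypothetical values, hidden from the learner.
        self.click_rates = list(click_rates)
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.click_rates)

    def show(self, ad):
        """Display one ad; return 1 if it was clicked, 0 otherwise."""
        return 1 if self.rng.random() < self.click_rates[ad] else 0


def estimate_click_rates(bandit, rounds):
    """Pure exploration: pick ads uniformly at random and track empirical rates."""
    shows = [0] * bandit.n_arms
    clicks = [0] * bandit.n_arms
    for _ in range(rounds):
        ad = bandit.rng.randrange(bandit.n_arms)
        shows[ad] += 1
        clicks[ad] += bandit.show(ad)
    # Guard against division by zero for ads that were never shown.
    return [c / s if s else 0.0 for c, s in zip(clicks, shows)]


bandit = AdBandit([0.02, 0.05, 0.11])
print(estimate_click_rates(bandit, rounds=30000))
```

Pure exploration recovers the click rates but spends most impressions on weaker ads, which is exactly the cost that bandit strategies try to reduce.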

Formally, the problem is addressed with algorithms that manage the exploration-exploitation dilemma. Popular strategies include:

ε-greedy: With a small probability ε, the algorithm explores a random option; otherwise, it exploits the best-known option.
UCB (Upper Confidence Bound): This method selects the option with the highest optimistic estimate of its payoff (its average reward so far plus an uncertainty bonus that shrinks as the option is tried more often), balancing exploration and exploitation dynamically.
Thompson sampling: A Bayesian approach that samples from the posterior distribution of each arm's expected reward and plays the arm with the highest sample.
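
The three strategies above can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not a reference implementation: rewards are assumed to be 0 or 1 (Bernoulli arms), the UCB variant shown is UCB1, Thompson sampling assumes a uniform Beta(1, 1) prior, and the function names, the default ε = 0.1, and the arm probabilities passed to run are all hypothetical choices made for the example.

```python
import math
import random


def epsilon_greedy(counts, sums, t, rng, epsilon=0.1):
    """Explore a random arm with probability epsilon; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, sums, t, rng):
    """UCB1: empirical mean plus a bonus that shrinks as an arm is pulled."""
    for arm, c in enumerate(counts):
        if c == 0:  # pull every arm once before using the bound
            return arm
    scores = [s / c + math.sqrt(2 * math.log(t) / c)
              for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson(counts, sums, t, rng):
    """Thompson sampling with a Beta(1, 1) prior on each arm's success rate."""
    samples = [rng.betavariate(1 + s, 1 + c - s)
               for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)


def run(policy, probs, rounds=5000, seed=0):
    """Play `policy` against Bernoulli arms with success probabilities `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)  # number of pulls per arm
    sums = [0] * len(probs)    # total reward per arm
    for t in range(1, rounds + 1):
        arm = policy(counts, sums, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sum(sums)


for policy in (epsilon_greedy, ucb1, thompson):
    counts, total = run(policy, probs=[0.2, 0.5, 0.8])
    print(policy.__name__, counts, total)
```

All three policies share the same signature so they can be compared on the same simulated arms; the pull counts show how quickly each one concentrates on the arm with the highest payoff.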

Overall, the multi-armed bandit problem serves as a foundational concept in reinforcement learning and adaptive systems, illustrating the difficulty of making good choices under uncertainty.