AI Glossary: What Is a Multi-Armed Bandit (MAB)? Definition & Meaning

The multi-armed bandit problem is a well-known scenario in decision theory and machine learning, in which a decision-maker must repeatedly choose among multiple options (or ‘arms’) with uncertain payoffs. The name comes from a hypothetical slot machine with multiple levers, each offering a different probability of winning. The challenge lies in the trade-off between exploration (trying different options to gather information) and exploitation (choosing the option that currently appears to yield the best reward).

In practical terms, the multi-armed bandit problem applies to domains such as online advertising, clinical trials, and recommendation systems. For instance, in digital marketing, a system must decide which ad to display to maximize click-through rates. Each ad represents an ‘arm’ of the bandit, and the goal is to identify the most effective ad over time while still exploring potentially better options.
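To make the advertising example concrete, here is a minimal sketch that models each ad as an arm with a hidden click probability and tracks its observed click-through rate. The class name `AdArm`, the ad names, and the click probabilities are all hypothetical values chosen for illustration; a real system would not know the true rates.

```python
import random


class AdArm:
    """One ad treated as a bandit arm with a hidden click probability."""

    def __init__(self, name, true_ctr):
        self.name = name
        self.true_ctr = true_ctr  # hypothetical; unknown to the system in practice
        self.shows = 0
        self.clicks = 0

    def show(self, rng):
        """Display the ad once; return 1 on a click, 0 otherwise."""
        clicked = 1 if rng.random() < self.true_ctr else 0
        self.shows += 1
        self.clicks += clicked
        return clicked

    def estimated_ctr(self):
        """Observed click-through rate so far (0.0 before any display)."""
        return self.clicks / self.shows if self.shows else 0.0


rng = random.Random(0)
# Illustrative click probabilities, not real data.
ads = [AdArm("ad_a", 0.02), AdArm("ad_b", 0.05), AdArm("ad_c", 0.03)]

# Pure exploration: show every ad equally often and compare the estimates.
for _ in range(1000):
    for ad in ads:
        ad.show(rng)

for ad in ads:
    print(ad.name, ad.shows, round(ad.estimated_ctr(), 3))
```

Showing every ad equally often, as this loop does, is pure exploration: it yields good estimates but spends most displays on ads that turn out to be worse, which is the cost a bandit algorithm tries to reduce.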

Formally, the problem can be framed using algorithms that manage the exploration-exploitation dilemma. Popular strategies include:

ε-greedy: With a small probability ε, the algorithm explores a random option; otherwise, it exploits the best-known option.
UCB (Upper Confidence Bound): This method selects the option with the highest upper confidence bound on its estimated payoff, so rarely tried options receive an optimism bonus and exploration and exploitation are balanced dynamically.
Thompson Sampling: A Bayesian approach that samples from the probability distribution of each arm's expected reward and plays the arm with the highest sample.
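The three strategies above can be sketched as arm-selection functions that share the same bookkeeping. This is an illustrative sketch, not a reference implementation, and it rests on several assumptions: rewards are 0 or 1 (Bernoulli arms), the UCB rule uses the common UCB1 bonus sqrt(2 ln t / n), and Thompson Sampling starts from a uniform Beta(1, 1) prior. The function names and the `run` helper are hypothetical.

```python
import math
import random


def epsilon_greedy(counts, sums, rng, epsilon=0.1):
    """With probability epsilon pick a random arm; otherwise the best empirical mean."""
    n = len(counts)
    if rng.random() < epsilon:
        return rng.randrange(n)
    means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
    return max(range(n), key=lambda i: means[i])


def ucb1(counts, sums, rng):
    """Pick the arm with the highest empirical mean plus confidence bonus (UCB1)."""
    for i, c in enumerate(counts):
        if c == 0:
            return i  # play every arm once before using the bound
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i]
        + math.sqrt(2 * math.log(total) / counts[i]),
    )


def thompson(counts, sums, rng):
    """Sample each arm from Beta(1 + successes, 1 + failures); pick the largest sample."""
    samples = [
        rng.betavariate(1 + sums[i], 1 + counts[i] - sums[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda i: samples[i])


def run(policy, probs, steps=5000, seed=0):
    """Simulate a policy on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)  # pulls per arm
    sums = [0] * len(probs)    # total reward per arm
    for _ in range(steps):
        arm = policy(counts, sums, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Illustrative success probabilities, not real data.
for policy in (epsilon_greedy, ucb1, thompson):
    print(policy.__name__, run(policy, [0.2, 0.5, 0.7]))
```

Each policy sees only the pull counts and reward totals, never the true probabilities, so swapping one strategy for another changes only how the next arm is chosen.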

Overall, the multi-armed bandit problem serves as a foundational concept in reinforcement learning and adaptive systems, illustrating the difficulty of making optimal choices in uncertain environments.