AI Glossary: What Is the Multi-Armed Bandit Problem (MAB)? Definition & Meaning

The Multi-Armed Bandit (MAB) problem is a classic problem in probability theory and statistics that exemplifies the trade-off between exploration and exploitation. In this scenario, a decision-maker (often referred to as an agent) is faced with multiple options (or ‘arms’), each associated with an unknown probability distribution of rewards. The objective is to maximize the total reward over time by strategically selecting which arm to pull.

The term originates from the analogy of a gambler at a row of slot machines (the ‘bandits’), where each machine has a different payout rate. The challenge lies in determining which machines to play and how often, given that the true payout rates are not known in advance.

At its core, the MAB problem encapsulates the dilemma of exploration (trying out new options to gather more information) versus exploitation (continuing to choose the best-known option based on current knowledge). Various strategies have been developed to tackle this problem, including:

ε-greedy algorithm: This method chooses the best-known arm most of the time, but with a small probability (ε) it explores by picking an arm at random.
Upper Confidence Bound (UCB): This approach balances exploration and exploitation by selecting the arm with the highest upper confidence bound on its estimated reward, which favors arms that either look good or have been tried only a few times.
Thompson Sampling: A Bayesian approach that maintains a probability distribution over each arm's reward, updates it from past results, and plays the arm whose sampled value is highest.
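
The three strategies above can be compared with a minimal Python sketch. It rests on several assumptions that are not part of the definition: rewards are Bernoulli (0 or 1), the UCB variant is the common UCB1 form, Thompson Sampling starts from a uniform Beta(1, 1) prior, and the function names (pull, epsilon_greedy, ucb1, thompson, run) and the payout rates are hypothetical choices made for illustration only.

```python
import math
import random


def pull(true_rates, arm, rng):
    """Simulate one pull: reward 1 with the arm's hidden payout rate, else 0."""
    return 1 if rng.random() < true_rates[arm] else 0


def epsilon_greedy(counts, sums, t, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, else the best empirical mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, sums, t, rng):
    """Pick the arm with the highest empirical mean plus confidence bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # play every arm once before the bound is defined
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def thompson(counts, sums, t, rng):
    """Sample each arm's rate from its Beta posterior; play the largest sample."""
    samples = [rng.betavariate(1 + s, 1 + c - s) for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)


def run(policy, true_rates, steps=2000, seed=0):
    """Play `steps` rounds with one policy; return total reward and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)  # number of pulls per arm
    sums = [0] * len(true_rates)    # accumulated reward per arm
    total = 0
    for t in range(1, steps + 1):
        arm = policy(counts, sums, t, rng)
        reward = pull(true_rates, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    rates = [0.2, 0.5, 0.8]  # hypothetical payout rates, unknown to the policies
    for policy in (epsilon_greedy, ucb1, thompson):
        total, counts = run(policy, rates)
        print(policy.__name__, total, counts)
```

In a run like this, each policy sees only its own pull counts and accumulated rewards, never the true rates, which mirrors the gambler's position at the slot machines. The pull counts per arm show how much each strategy spent on exploration before settling on an arm.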

Multi-Armed Bandit algorithms have numerous applications, particularly in fields such as online advertising, clinical trials, and adaptive website optimization, where quick decision-making is crucial. By effectively addressing the trade-off between exploration and exploitation, MAB strategies help optimize outcomes in uncertain environments.