AI Glossary: What Is a Multi-Armed Bandit (MAB)? Definition & Meaning

The Multi-Armed Bandit problem is a well-known scenario in decision theory and machine learning in which a decision-maker must repeatedly choose among multiple options (or ‘arms’) with uncertain payoffs. The name comes from a hypothetical slot machine with multiple levers, each paying out with a different, unknown probability. The challenge lies in the trade-off between exploration (trying different options to gather information about their payoffs) and exploitation (choosing the option that currently appears to yield the best reward).

In practical terms, the Multi-Armed Bandit framework applies to domains such as online advertising, clinical trials, and recommendation systems. For instance, in digital marketing, a system must decide which ad to display to maximize click-through rates. Each ad represents an ‘arm’ of the bandit, and the goal is to collect as many clicks as possible over time, which means showing the most effective ad most often while still occasionally testing ads that might turn out to be better.
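To make the ad example concrete, here is a minimal Python sketch of a simulated ad environment. The class name AdBandit, its show method, and the click-through rates are all hypothetical and invented for illustration; a real system would observe clicks from actual users rather than drawing them from known probabilities.

```python
import random


class AdBandit:
    """Simulated ad environment: each ad (arm) has a hidden click-through rate."""

    def __init__(self, click_rates, seed=0):
        self.click_rates = list(click_rates)
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def show(self, ad_index):
        """Display one ad; return 1 if the simulated user clicks, else 0."""
        return 1 if self.rng.random() < self.click_rates[ad_index] else 0


# Hypothetical click-through rates, invented for illustration only.
true_rates = [0.02, 0.05, 0.03]
bandit = AdBandit(true_rates)

# Pure exploration: show every ad the same number of times and compare.
impressions = 10_000
clicks = [sum(bandit.show(ad) for _ in range(impressions)) for ad in range(3)]
observed_rates = [c / impressions for c in clicks]
best_ad = observed_rates.index(max(observed_rates))
```

Showing every ad equally often does find the best one, but it spends two thirds of the impressions on weaker ads. Reducing that cost is what bandit strategies aim for.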

In practice, the problem is tackled with algorithms that manage the exploration-exploitation trade-off. Popular strategies include:

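ε-greedy: With a small probability ε, the algorithm explores by picking an option at random; otherwise, it exploits the option with the best estimated reward so far.
UCB (Upper Confidence Bound): This method selects the option with the highest upper confidence bound on its estimated reward. Options that have been tried only a few times receive a larger uncertainty bonus, which shrinks as they are sampled more, so exploration and exploitation are balanced dynamically.
Thompson Sampling: A Bayesian approach that draws a sample from the posterior distribution of each arm's expected reward and plays the arm whose sample is highest.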
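
The three strategies above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a reference implementation: rewards are assumed to be 0 or 1 (click or no click), the UCB variant shown is the common UCB1 rule, Thompson Sampling assumes a uniform Beta(1, 1) prior, and the function names, the run helper, and the payoff probabilities are all hypothetical.

```python
import math
import random


def epsilon_greedy(counts, successes, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, else the best estimated arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / c if c else 0.0 for s, c in zip(successes, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, successes, rng):
    """UCB1: play each arm once, then pick the highest mean plus bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    total = sum(counts)
    scores = [
        s / c + math.sqrt(2 * math.log(total) / c)
        for s, c in zip(successes, counts)
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson(counts, successes, rng):
    """Sample each arm's Beta posterior (Beta(1, 1) prior); play the largest."""
    samples = [
        rng.betavariate(1 + s, 1 + c - s) for s, c in zip(successes, counts)
    ]
    return max(range(len(counts)), key=samples.__getitem__)


def run(policy, probs, steps=5000, seed=0):
    """Simulate a policy on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    successes = [0] * len(probs)
    for _ in range(steps):
        arm = policy(counts, successes, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        successes[arm] += reward
    return counts


# Hypothetical payoff probabilities, chosen only for illustration.
for policy in (epsilon_greedy, ucb1, thompson):
    print(policy.__name__, run(policy, [0.2, 0.5, 0.8]))
```

All three functions share the same signature so they can be swapped freely. In this sketch, ε-greedy keeps exploring at a fixed rate forever, while UCB1 and Thompson Sampling explore less as their estimates become more certain.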

Overall, the Multi-Armed Bandit problem serves as a foundational concept in reinforcement learning and adaptive systems, illustrating the complexities of making optimal choices in uncertain environments.