AI Glossary: What Is a Multi-Armed Bandit (MAB)? Definition & Meaning

The Multi-Armed Bandit (MAB) problem is a classic dilemma in probability theory and decision-making. It arises whenever an agent must make a series of choices without knowing the outcomes in advance. The term comes from the analogy of a gambler facing several slot machines (or ‘one-armed bandits’) who must decide which machine to play to maximize their winnings.

In a typical MAB setup, there are several options (referred to as ‘arms’), each providing a reward drawn from a probability distribution that is unknown to the player. The player’s objective is to maximize the total reward over a series of trials by dynamically balancing the exploration of less-tried options, to discover their potential, against the exploitation of options that have previously yielded high rewards.
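The setup described above can be illustrated with a minimal sketch. It assumes Bernoulli rewards (each pull pays 1 or 0), which is only one possible reward distribution; the class name `BernoulliBandit` and the probabilities used are hypothetical choices made for illustration.

```python
import random


class BernoulliBandit:
    """Illustrative environment: each arm pays 1 with a hidden probability, else 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)        # true success rates, hidden from the player
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        # Draw a single 0/1 reward from the chosen arm's distribution.
        return 1 if self.rng.random() < self.probs[arm] else 0


# Assumed example values: three arms with success rates 0.2, 0.5, and 0.7.
bandit = BernoulliBandit([0.2, 0.5, 0.7], seed=0)
rewards = [bandit.pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # empirical mean reward of arm 2
```

The player only ever sees the values returned by `pull`, never `probs`, so the quality of each arm has to be estimated from observed rewards.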

This problem is relevant in many fields, including online advertising, recommendation systems, clinical trials, and adaptive routing. The dilemma lies in the trade-off between exploration (trying different arms to gather more information) and exploitation (choosing the arm with the highest estimated reward so far).
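One common way to quantify this trade-off is cumulative regret: the reward lost, in expectation, by not always playing the best arm. The notation here is an assumption introduced for illustration: $\mu_a$ is the mean reward of arm $a$, $\mu^* = \max_a \mu_a$ is the best mean, $a_t$ is the arm chosen in round $t$, and $T$ is the number of rounds.

```latex
R_T = T\,\mu^* - \sum_{t=1}^{T} \mathbb{E}\left[\mu_{a_t}\right]
```

Too little exploration risks locking onto a suboptimal arm, so regret keeps growing linearly in $T$. Too much exploration wastes pulls on arms already known to be poor.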

Several algorithms have been developed to address the Multi-Armed Bandit problem, including epsilon-greedy strategies, Upper Confidence Bound (UCB), and Thompson Sampling. Each method balances exploration and exploitation differently: epsilon-greedy explores at random with a small fixed probability, UCB favors arms whose estimates are still uncertain, and Thompson Sampling chooses arms by sampling from a posterior belief about each arm's reward.
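The three strategies named above can be sketched as follows. This is a simplified illustration, not a reference implementation. It assumes Bernoulli (0/1) rewards, uses the UCB1 variant of UCB, and uses a Beta(1, 1) prior for Thompson Sampling. The helper `run`, the arm probabilities in `PROBS`, and the default `epsilon=0.1` are hypothetical choices.

```python
import math
import random


def run(select, probs, steps=5000, seed=0):
    """Play `steps` rounds on Bernoulli arms; return (total reward, pull counts)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # accumulated reward per arm
    total = 0
    for t in range(1, steps + 1):
        arm = select(counts, sums, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def epsilon_greedy(counts, sums, t, rng, epsilon=0.1):
    # With probability epsilon pick a uniformly random arm, otherwise
    # pick the arm with the highest observed mean reward.
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, sums, t, rng):
    # Pull each arm once, then pick the highest mean plus confidence bonus.
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    scores = [s / c + math.sqrt(2 * math.log(t) / c)
              for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson(counts, sums, t, rng):
    # Sample from each arm's Beta(1 + successes, 1 + failures) posterior
    # and play the arm with the largest sample.
    samples = [rng.betavariate(1 + s, 1 + c - s)
               for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)


PROBS = [0.2, 0.5, 0.7]  # assumed true success rates, unknown to the strategies
results = {f.__name__: run(f, PROBS) for f in (epsilon_greedy, ucb1, thompson)}
for name, (total, counts) in results.items():
    print(name, total, counts)
```

Each strategy receives only the pull counts and reward sums, never `PROBS`, so any preference for the best arm has to emerge from observed rewards. The exact totals depend on the seed and the chosen parameters.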

Overall, the Multi-Armed Bandit is a foundational concept in reinforcement learning and is instrumental in optimizing decision-making in uncertain environments.