AI Glossary: What Is a Contextual Bandit (CB)? Definition & Meaning

Contextual Bandit

A contextual bandit is a type of machine learning algorithm for decision-making problems in which an agent must choose from a set of actions based on the context it observes. The key feature of contextual bandits is that they incorporate additional information (context) about the environment or situation into their decision-making process.
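One common way to formalize this setting is shown below. The notation ($x_t$ for the context, $\mathcal{A}$ for the action set, $r_t(a)$ for the reward of action $a$, $T$ for the number of rounds) is introduced here for illustration and is not taken from the original text. The important detail is that the agent only observes the reward of the action it actually chose, not the rewards of the alternatives. Performance is often measured by regret: the expected reward lost relative to always picking the best action for each observed context.

```latex
\text{For } t = 1, \dots, T: \quad
\text{observe context } x_t, \quad
\text{choose } a_t \in \mathcal{A}, \quad
\text{receive reward } r_t(a_t)

\operatorname{Regret}(T) = \sum_{t=1}^{T} \Big( \max_{a \in \mathcal{A}} \mathbb{E}\left[ r_t(a) \mid x_t \right] - \mathbb{E}\left[ r_t(a_t) \mid x_t \right] \Big)
```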

In a typical bandit problem, the agent faces a dilemma: it can either explore new actions to discover their potential rewards, or exploit known actions that have delivered good results in the past. Contextual bandits extend this framework by taking contextual information such as user attributes, environmental variables, or past interactions into account in order to make better-informed decisions.

For example, in an online recommendation system, a contextual bandit might recommend different products to users based on their browsing history, demographics, or preferences. The algorithm learns which recommendations yield the highest engagement or sales and adapts its strategy over time to maximize cumulative reward.
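The recommendation scenario can be sketched as a small tabular epsilon-greedy contextual bandit. Everything concrete here is hypothetical: the two user segments stand in for richer context such as browsing history, the product names are placeholders, and `TRUE_CTR` is simulated ground truth used only to generate clicks. Keeping one running average per (segment, product) pair only works for a handful of discrete contexts; a production system would more likely use a model that generalizes across feature vectors.

```python
import random
from collections import defaultdict

# Hypothetical setup: contexts are coarse user segments, actions are product IDs.
# TRUE_CTR is simulated ground truth; the bandit never reads it directly.
CONTEXTS = ["new_visitor", "returning_buyer"]
ACTIONS = ["laptop", "headphones", "book"]
TRUE_CTR = {
    "new_visitor": {"laptop": 0.02, "headphones": 0.05, "book": 0.10},
    "returning_buyer": {"laptop": 0.12, "headphones": 0.06, "book": 0.03},
}


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy: one running mean reward per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = defaultdict(int)    # (context, action) -> times chosen
        self.values = defaultdict(float)  # (context, action) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon; otherwise exploit the best estimate
        # for this specific context.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean update; only the chosen action's estimate changes,
        # because that is the only reward the agent gets to see.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=100_000, seed=0):
    rng = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(ACTIONS, epsilon=0.1, rng=rng)
    total_clicks = 0
    for _ in range(rounds):
        context = rng.choice(CONTEXTS)   # observe the user's segment
        action = bandit.select(context)  # recommend one product
        reward = 1 if rng.random() < TRUE_CTR[context][action] else 0  # simulated click
        bandit.update(context, action, reward)
        total_clicks += reward
    return bandit, total_clicks


if __name__ == "__main__":
    bandit, total_clicks = simulate()
    for ctx in CONTEXTS:
        best = max(ACTIONS, key=lambda a: bandit.values[(ctx, a)])
        print(ctx, "->", best)
    print("total clicks:", total_clicks)
```

With these simulated click rates, the estimates should settle on `book` for `new_visitor` and `laptop` for `returning_buyer`. The point of the context is visible here: a non-contextual bandit would have to pick one product for everyone.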

Learning in a contextual bandit therefore means balancing exploration (trying new actions) and exploitation (using the best-known actions). Techniques such as epsilon-greedy, UCB (Upper Confidence Bound), and Thompson Sampling are commonly used to manage this trade-off.
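Of the techniques named above, Thompson Sampling is among the shortest to sketch. The version below is a simplified, tabular variant that assumes binary rewards (for example click or no click) and a small set of discrete contexts, with an independent Beta(1, 1) prior per (context, action) pair. The class name, the `"mobile"`/`"desktop"` contexts, the variants `"A"`/`"B"`, and their success rates are all hypothetical. Widely used contextual variants instead place the prior on the parameters of a model over context features.

```python
import random
from collections import defaultdict


class BetaThompsonContextualBandit:
    """Thompson Sampling with an independent Beta(1, 1) prior per (context, action).

    Assumes binary rewards and a small, discrete set of contexts (hypothetical
    simplification; it does not share information across contexts).
    """

    def __init__(self, actions, rng=None):
        self.actions = list(actions)
        self.rng = rng or random.Random()
        self.successes = defaultdict(int)  # (context, action) -> rewards equal to 1
        self.failures = defaultdict(int)   # (context, action) -> rewards equal to 0

    def select(self, context):
        # Draw one plausible success rate per action from its Beta posterior and
        # pick the highest draw. Rarely tried actions have wide posteriors, so
        # they occasionally draw high values, which is what drives exploration.
        def sample(action):
            key = (context, action)
            return self.rng.betavariate(1 + self.successes[key], 1 + self.failures[key])

        return max(self.actions, key=sample)

    def update(self, context, action, reward):
        key = (context, action)
        if reward:
            self.successes[key] += 1
        else:
            self.failures[key] += 1


def run_demo(rounds=5000, seed=42):
    """Simulate a hypothetical two-context, two-variant test; return pull counts."""
    rng = random.Random(seed)
    true_rate = {  # simulated ground truth, hidden from the bandit
        ("mobile", "A"): 0.3, ("mobile", "B"): 0.6,
        ("desktop", "A"): 0.7, ("desktop", "B"): 0.2,
    }
    bandit = BetaThompsonContextualBandit(["A", "B"], rng=rng)
    pulls = defaultdict(int)
    for _ in range(rounds):
        ctx = rng.choice(["mobile", "desktop"])
        arm = bandit.select(ctx)
        bandit.update(ctx, arm, rng.random() < true_rate[(ctx, arm)])
        pulls[(ctx, arm)] += 1
    return pulls


if __name__ == "__main__":
    for key, count in sorted(run_demo().items()):
        print(key, count)
```

As evidence accumulates, the posteriors narrow and most pulls in each context should go to the better variant for that context (`B` on mobile, `A` on desktop in this simulation), without a hand-tuned exploration rate like epsilon.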

Contextual bandits are widely applied in fields such as online advertising, personalized content delivery, A/B testing, and healthcare, where the goal is to optimize decisions based on real-time data and feedback.