AI Glossary: What Is a Contextual Bandit (CB)? Definition & Meaning

Contextual Bandit

A contextual bandit is a type of machine learning algorithm that addresses decision-making problems in which an agent must choose from a set of actions based on the context it observes. The key feature of contextual bandits is that they incorporate additional information (context) about the environment or situation into the decision-making process.

In a typical bandit problem, the agent faces a dilemma: it can explore new actions to discover their potential rewards, or exploit known actions that have previously yielded good results. Contextual bandits extend this framework by taking contextual information into account, such as user characteristics, environmental variables, or previous interactions, to make more informed decisions.

For example, in an online recommendation system, a contextual bandit might recommend different products to users based on their browsing history, demographics, or preferences. The algorithm learns which recommendations yield the highest engagement or sales, adapting its strategy over time to maximize overall rewards.
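The recommendation scenario can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: it assumes the context is a single discrete user segment, the reward is a click (1) or no click (0), and the class name, product names, and click probabilities are all hypothetical values invented for the simulation.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular contextual bandit: one running mean per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> times chosen
        self.values = defaultdict(float)  # (context, action) -> mean reward so far

    def choose(self, context):
        # Explore: with probability epsilon pick a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploit: pick the action with the best estimate for this context.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean update for the observed (context, action) pair.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical click probabilities, used only to simulate user feedback.
CLICK_PROB = {
    ("student", "laptop"): 0.6, ("student", "headphones"): 0.3,
    ("gamer", "laptop"): 0.2, ("gamer", "headphones"): 0.7,
}


def simulate(rounds=5000, seed=0):
    env = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(["laptop", "headphones"], seed=seed)
    for _ in range(rounds):
        context = env.choice(["student", "gamer"])        # observe context
        action = bandit.choose(context)                   # recommend a product
        clicked = env.random() < CLICK_PROB[(context, action)]
        bandit.update(context, action, 1.0 if clicked else 0.0)
    return bandit
```

Calling `simulate()` returns a bandit whose `values` table holds the estimated click rate for each segment and product. A table like this only works when there are few distinct contexts; with rich features such as browsing history, the per-pair averages would typically be replaced by a model that generalizes across contexts.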

The learning process in contextual bandits often involves balancing exploration (trying new actions) and exploitation (using the best-known actions). Techniques like epsilon-greedy, UCB (Upper Confidence Bound), and Thompson Sampling are commonly used to manage this trade-off.
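To make the UCB idea concrete, here is a rough sketch of a UCB1-style selection rule. It assumes statistics are kept separately for each context, which is a simplification; contextual variants such as LinUCB instead share information across contexts through a linear model. The function name `ucb_choose` and the constant `c` are illustrative choices, not part of any library.

```python
import math


def ucb_choose(actions, counts, means, c=2.0):
    """Pick an action for one context using a UCB1-style score.

    counts[a] is how often action a was tried in this context,
    means[a] is its average observed reward there.
    """
    # Try every action at least once before comparing scores.
    for a in actions:
        if counts.get(a, 0) == 0:
            return a

    total = sum(counts[a] for a in actions)

    def score(a):
        # Mean reward plus an uncertainty bonus that shrinks as a is tried more.
        return means[a] + math.sqrt(c * math.log(total) / counts[a])

    return max(actions, key=score)
```

For instance, `ucb_choose(["a", "b"], {"a": 100, "b": 1}, {"a": 0.5, "b": 0.4})` selects `"b"`: its average is slightly lower, but it has been tried only once, so its uncertainty bonus is much larger. Epsilon-greedy explores at random, whereas this rule directs exploration toward the actions whose estimates are least certain.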

Contextual bandits are widely applied in fields such as online advertising, personalized content delivery, A/B testing, and healthcare, where the goal is to optimize decisions based on real-time data and feedback.