AI Glossary: What Is a Contextual Bandit (CB)? Definition & Meaning

Contextual Bandit

A contextual bandit is a type of machine learning algorithm that addresses decision-making problems in which an agent must choose from a set of actions based on the context it observes. The key feature of contextual bandits is that they incorporate additional information (context) about the environment or situation into their decision-making process.

In a typical bandit problem, the agent faces a dilemma: it can explore new actions to discover their potential rewards, or exploit known actions that have previously yielded good results. Contextual bandits extend this framework by taking contextual information into account, such as user characteristics, environmental variables, or previous interactions, to make better-informed decisions.
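The explore-or-exploit choice can be made concrete with a small sketch. The Python below is a minimal, illustrative tabular version that assumes the context is a discrete, hashable value (for instance a device type); the class name, method names, and the default `epsilon` are hypothetical choices for this example and do not come from any particular library.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit keyed by a discrete context (illustrative only)."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per (context, action) pair: number of updates and running mean reward.
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def select(self, context):
        # Explore: with probability epsilon, pick a uniformly random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploit: pick the action with the highest estimated reward in this
        # context (ties go to the action listed first).
        return max(self.actions, key=lambda a: self.means[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.means[key] += (reward - self.means[key]) / self.counts[key]


bandit = EpsilonGreedyContextualBandit(["article", "video"], epsilon=0.1, seed=42)
bandit.update("mobile", "video", 1.0)   # a mobile user engaged with a video
choice = bandit.select("mobile")        # usually "video", occasionally a random pick
```

Because estimates are stored per (context, action) pair, the same action can be preferred in one context and avoided in another, which is the difference from a context-free bandit.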

For example, in an online recommendation system, a contextual bandit might recommend different products to users based on their browsing history, demographics, or preferences. The algorithm learns which recommendations yield the highest engagement or sales, adapting its strategy over time to maximize overall rewards.

The learning process in contextual bandits involves balancing exploration (trying new actions) and exploitation (using the best-known actions). Techniques such as epsilon-greedy, UCB (Upper Confidence Bound), and Thompson Sampling are commonly used to manage this trade-off.
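As one illustration of these techniques, here is a minimal Thompson Sampling sketch in Python. It assumes binary rewards (such as click or no click) and a discrete context, and keeps a Beta posterior for each (context, action) pair. The class, the `simulate` helper, the context and action names, and the reward rates are all invented for this example.

```python
import random


class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson Sampling per discrete context (illustrative only)."""

    def __init__(self, actions, seed=None):
        self.actions = list(actions)
        self.rng = random.Random(seed)
        # (context, action) -> [alpha, beta] of a Beta posterior over the reward rate.
        self.posteriors = {}

    def _params(self, context, action):
        # Start every pair from a uniform Beta(1, 1) prior.
        return self.posteriors.setdefault((context, action), [1.0, 1.0])

    def select(self, context):
        # Draw one sample from each action's posterior and play the largest draw.
        def draw(action):
            alpha, beta = self._params(context, action)
            return self.rng.betavariate(alpha, beta)
        return max(self.actions, key=draw)

    def update(self, context, action, reward):
        params = self._params(context, action)
        if reward:
            params[0] += 1  # one more observed success
        else:
            params[1] += 1  # one more observed failure


def simulate(rounds=2000, seed=0):
    # Hypothetical reward rates, made up purely for this demonstration.
    true_rates = {
        ("new_user", "banner"): 0.1, ("new_user", "email"): 0.9,
        ("returning_user", "banner"): 0.9, ("returning_user", "email"): 0.1,
    }
    env = random.Random(seed)
    bandit = ThompsonSamplingBandit(["banner", "email"], seed=seed + 1)
    pulls = {key: 0 for key in true_rates}
    for _ in range(rounds):
        context = env.choice(["new_user", "returning_user"])
        action = bandit.select(context)
        reward = 1 if env.random() < true_rates[(context, action)] else 0
        bandit.update(context, action, reward)
        pulls[(context, action)] += 1
    return pulls


pulls = simulate()
```

In this setup, exploration comes from the randomness of the posterior draws rather than from a fixed epsilon: actions with few observations have wide posteriors and are still tried occasionally, while clearly worse actions are sampled less and less often.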

Contextual bandits are widely applied in fields such as online advertising, personalized content delivery, A/B testing, and healthcare, where the goal is to optimize decisions based on real-time data and feedback.