AI Glossary: What Is a Linear Bandit (LB)? Definition & Meaning

Linear Bandit

A linear bandit is a special problem in reinforcement learning and multi-armed bandits in which an agent must choose among a set of actions (or arms) to maximize its cumulative reward. In a linear bandit setting, the expected reward for each action is modeled as a linear function of features associated with that action.

More formally, each action is represented by a feature vector, and the expected reward for choosing an action is the inner product of this feature vector and a parameter vector θ, which is unknown at the outset and which the agent estimates from the rewards it observes. This relationship can be expressed as:

R(a) = θ · x(a)

where R(a) is the expected reward for action a, θ is the parameter vector, and x(a) is the feature vector associated with action a.
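As a small illustration of the formula, the following Python sketch computes θ · x(a) for three actions and picks the action with the highest expected reward. The values of theta, the feature vectors, and the names expected_reward and actions are made up for this example; in a real bandit problem θ is not known in advance.

```python
def expected_reward(theta, features):
    """Return the inner product theta . x(a) for one action."""
    return sum(t * x for t, x in zip(theta, features))

# Hypothetical parameter vector and feature vectors, chosen for illustration only.
theta = [0.5, -0.2, 1.0]
actions = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.0, 1.0, 1.0],
    "a3": [1.0, 1.0, 0.0],
}

# Expected reward R(a) for every action, then the action that maximizes it.
rewards = {name: expected_reward(theta, x) for name, x in actions.items()}
best_action = max(rewards, key=rewards.get)
```

With these assumed numbers, best_action is "a2", because its features line up with the largest positive entries of theta.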

The linear bandit model is particularly useful in scenarios where the relationship between features and rewards is approximately linear, allowing for efficient learning and decision-making. The agent learns an estimate of the parameter vector θ by balancing exploration (trying different actions) and exploitation (choosing the best-performing actions based on current knowledge).
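One simple way to make the exploration and exploitation trade-off concrete is an epsilon-greedy agent that re-estimates θ with ridge regression after every round. The Python sketch below is a minimal baseline under stated assumptions, not the standard linear bandit algorithm (methods such as LinUCB or Thompson sampling are more common in practice). The function names, the simulated true_theta, the Gaussian noise level, and all default values are assumptions made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def run_bandit(actions, true_theta, rounds=2000, epsilon=0.1,
               noise=0.1, lam=1.0, seed=0):
    """Epsilon-greedy agent with a ridge-regression estimate of theta."""
    rng = random.Random(seed)
    d = len(true_theta)
    # A accumulates lam * I + sum of x x^T; b accumulates sum of reward * x.
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    theta_hat = [0.0] * d
    for _ in range(rounds):
        if rng.random() < epsilon:
            x = rng.choice(actions)  # explore: pick a random action
        else:
            x = max(actions, key=lambda a: dot(theta_hat, a))  # exploit
        # Simulated environment: linear expected reward plus Gaussian noise.
        reward = dot(true_theta, x) + rng.gauss(0.0, noise)
        for i in range(d):
            b[i] += reward * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
        theta_hat = solve(A, b)  # ridge estimate of theta
    return theta_hat

# Hypothetical problem: three actions in two dimensions.
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_theta = [0.2, 0.8]
theta_hat = run_bandit(actions, true_theta)
greedy_action = max(actions, key=lambda a: dot(theta_hat, a))
```

In this simulated setup the estimate theta_hat should end up close to true_theta, and the greedy choice should settle on the action [1.0, 1.0], which has the highest expected reward. The fixed epsilon keeps the agent exploring forever; more refined methods shrink exploration as the estimate becomes more certain.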

Linear bandits are commonly applied in fields such as online advertising, recommendation systems, and adaptive clinical trials, where the goal is to maximize user engagement or treatment effectiveness based on historical data.

In summary, linear bandits provide a framework for making sequential decisions under uncertainty, leveraging linear relationships to optimize rewards over time.