Linear bandit

LB

A linear bandit is a type of reinforcement learning problem in which the expected reward of each action is a linear function of that action's features.

A linear bandit is a specific problem in the field of reinforcement learning and multi-armed bandits, where an agent must repeatedly choose among a set of actions (or arms) to maximize its cumulative reward. In a linear bandit setting, the expected reward for each action is modeled as a linear function of the features associated with that action.

More formally, each action is represented by a feature vector, and the expected reward for choosing an action is the inner product of this feature vector and a parameter vector θ. The parameter vector is unknown to the agent at the start and must be estimated from the rewards it observes. This relationship can be expressed as:

R(a) = θ · x(a)

where R(a) is the expected reward for action a, θ is the parameter vector, and x(a) is the feature vector associated with action a.
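As a small illustration of the formula, the sketch below computes R(a) = θ · x(a) for three hypothetical actions and picks the one with the highest expected reward. The values of `theta` and `features`, and the action names `a1` to `a3`, are assumptions made up for this example; in a real linear bandit the agent would not know θ in advance.

```python
def expected_reward(theta, x):
    """Inner product of theta and x, i.e. R(a) = theta . x(a)."""
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical parameter vector and action features, for illustration only.
theta = [0.5, -1.0, 2.0]
features = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.0, 1.0, 1.0],
    "a3": [1.0, 1.0, 0.5],
}

# Expected reward of every action under the linear model.
rewards = {a: expected_reward(theta, x) for a, x in features.items()}

# With theta known, the best action is simply the one with the largest R(a).
best_action = max(rewards, key=rewards.get)
print(rewards, best_action)
```

Here `a2` has the largest inner product (1.0, against 0.5 for the other two), so it is the best action under this assumed θ.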

The linear bandit model is particularly useful in scenarios where the relationship between features and rewards is approximately linear, because information gained from one action then carries over to other actions with similar features, allowing for efficient learning and decision-making. The agent estimates the parameter vector θ by balancing exploration (trying different actions to improve the estimate) and exploitation (choosing the action that looks best under the current estimate).
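The exploration and exploitation trade-off can be sketched with a deliberately simple strategy: epsilon-greedy action selection combined with a stochastic-gradient least-squares update of the estimate. This is only one possible approach, chosen for brevity, and is not the more commonly cited LinUCB or Thompson sampling. The function name `run_epsilon_greedy`, the arm features, the true θ, and all hyperparameters below are assumptions for illustration.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def run_epsilon_greedy(arms, true_theta, rounds=3000, epsilon=0.1,
                       step=0.1, noise_sd=0.1, seed=0):
    """Estimate theta from noisy rewards with epsilon-greedy selection."""
    rng = random.Random(seed)
    theta_hat = [0.0] * len(true_theta)  # initial estimate
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            k = rng.randrange(len(arms))
        else:
            # Exploit: pick the arm with the highest estimated reward.
            k = max(range(len(arms)), key=lambda i: dot(theta_hat, arms[i]))
        x = arms[k]
        # Simulated environment: linear expected reward plus Gaussian noise.
        reward = dot(true_theta, x) + rng.gauss(0.0, noise_sd)
        # Gradient step on the squared prediction error for this round.
        error = reward - dot(theta_hat, x)
        theta_hat = [t + step * error * xi for t, xi in zip(theta_hat, x)]
    return theta_hat

# Hypothetical arms and true parameter vector (hidden from the learner).
arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
true_theta = [1.0, 0.2]

theta_hat = run_epsilon_greedy(arms, true_theta)
greedy_arm = max(range(len(arms)), key=lambda i: dot(theta_hat, arms[i]))
print(theta_hat, greedy_arm)
```

Because the arms share features, pulling the third arm also refines the estimates that determine the value of the first two, which is the efficiency gain the linear model provides over treating each arm independently.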

Linear bandits are commonly applied in fields such as online advertising, recommender systems, and adaptive clinical trials, where the goal is to maximize user engagement or treatment effectiveness based on historical data.

In summary, linear bandits provide a framework for making sequential decisions under uncertainty, leveraging linear relationships to optimize rewards over time.
