
Linear Bandits


A linear bandit is a type of reinforcement learning problem in which actions yield rewards according to a linear relationship with their features.


Linear bandits are a class of problems within reinforcement learning and multi-armed bandits, in which an agent must repeatedly choose among a set of actions (or arms) to maximize its cumulative reward. In a linear bandit setting, the expected reward of each action is modeled as a linear function of the features associated with that action.

More formally, each action is represented by a feature vector, and the expected reward for choosing an action is the inner product of this feature vector with a parameter vector. The parameter vector is unknown to the agent, which must estimate it from the rewards it observes. This relationship can be expressed as:

R(a) = θ · x(a)

Here, R(a) is the expected reward of action a, θ is the parameter vector, and x(a) is the feature vector associated with action a.
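As a minimal sketch of the reward model R(a) = θ · x(a), the following Python snippet computes the expected reward of each arm as an inner product and picks the best one. The values of `theta`, the arm names, and the feature vectors are hypothetical and chosen only for illustration; in a real linear bandit problem θ would not be known to the agent.

```python
# Hypothetical parameter vector (unknown to the agent in a real problem).
theta = [0.5, -0.2, 1.0]

# Hypothetical feature vectors x(a), one per arm.
features = {
    "arm_a": [1.0, 0.0, 0.5],
    "arm_b": [0.0, 1.0, 1.0],
    "arm_c": [1.0, 1.0, 0.0],
}


def expected_reward(theta, x):
    """Return the inner product theta . x, i.e. R(a) = theta . x(a)."""
    return sum(t * xi for t, xi in zip(theta, x))


# Expected reward of every arm under the linear model.
rewards = {arm: expected_reward(theta, x) for arm, x in features.items()}

# The optimal arm is the one with the largest expected reward.
best_arm = max(rewards, key=rewards.get)
print(rewards, best_arm)
```

With these made-up numbers, `arm_a` has the largest inner product with `theta`, so it is the arm an agent with perfect knowledge of θ would always choose.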

The linear bandit model is particularly useful when the relationship between features and rewards is approximately linear, because information gained from one action then carries over to other actions with similar features, which makes learning efficient. The agent estimates the parameter vector θ by balancing exploration (trying different actions to gather information) and exploitation (choosing the action that looks best under its current estimate).
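The interplay of exploration and exploitation can be sketched with one simple strategy: epsilon-greedy action selection combined with a ridge-regression estimate of θ. This is only an illustrative choice, not the only or the standard method (LinUCB and Thompson sampling are common alternatives). The class name `EpsilonGreedyLinearBandit`, the helper `run`, the simulated `true_theta`, the arm features, and the noise level are all assumptions made for this example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class EpsilonGreedyLinearBandit:
    """Epsilon-greedy agent with a ridge-regression estimate of theta."""

    def __init__(self, dim, epsilon=0.1, ridge=1.0):
        self.epsilon = epsilon
        # Inverse of A = ridge * I + sum of x x^T over all observed actions.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim          # running sum of reward * x
        self.theta_hat = [0.0] * dim  # current estimate of theta

    def select(self, arms, rng):
        # Explore: with probability epsilon pick a uniformly random arm.
        if rng.random() < self.epsilon:
            return rng.randrange(len(arms))
        # Exploit: pick the arm with the highest estimated reward.
        scores = [dot(self.theta_hat, x) for x in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
        self.theta_hat = mat_vec(self.A_inv, self.b)


def run(steps=2000, seed=0):
    """Simulate the agent on a toy problem with Gaussian reward noise."""
    rng = random.Random(seed)
    true_theta = [0.5, -0.2, 1.0]  # hypothetical; hidden from the agent
    arms = [[1.0, 0.0, 0.5], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
    agent = EpsilonGreedyLinearBandit(dim=3)
    for _ in range(steps):
        k = agent.select(arms, rng)
        reward = dot(true_theta, arms[k]) + rng.gauss(0.0, 0.1)
        agent.update(arms[k], reward)
    return agent.theta_hat, arms


theta_hat, arms = run()
best = max(range(len(arms)), key=lambda k: dot(theta_hat, arms[k]))
print(best, [round(t, 2) for t in theta_hat])
```

The agent never sees `true_theta`; it only observes noisy rewards for the arms it pulls. Because all arms share the same θ, every observation refines the estimate used to score every arm, which is the efficiency gain over treating each arm independently.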

Linear bandits are commonly applied in fields such as online advertising, recommendation systems, and adaptive clinical trials, where the goal is to maximize user engagement or treatment effectiveness based on the outcomes observed so far.

In summary, linear bandits provide a framework for making sequential decisions under uncertainty, leveraging linear relationships to optimize rewards over time.
