AI Glossary: Reinforcement Learning Terms & Definitions

Action model learning

AML

Action model learning is a method in AI that focuses on predicting the outcomes of actions within a given environment.

Action selection

AS

Action selection is the process by which an AI determines the best action to take in a given situation.

Action Value Function

Q-function

The Action Value Function, or Q-function, gives the expected cumulative reward (the return) for taking a specific action in a given state and then following a policy in reinforcement learning.
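
As an illustration of how such a function can be estimated, here is a minimal sketch of one tabular Q-learning update. The function name `q_learning_update`, the dictionary-based table, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are assumptions made for this example, not part of the glossary.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q is a mapping from (state, action) pairs to value estimates.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Bootstrapped target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions, used only to show the call.
q_table = defaultdict(float)
new_value = q_learning_update(q_table, "s0", "right", 1.0, "s1",
                              ["left", "right"], alpha=0.5, gamma=0.9)
```

With an all-zero table, the estimate for ("s0", "right") moves halfway from 0 toward the target of 1.0.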

Actor-Critic

AC

Actor-Critic is a reinforcement learning approach that combines a learned policy (the actor) with a learned value function (the critic) that evaluates the actor's choices.

Agent Environment Interaction

AEI

Agent Environment Interaction is the loop in which an AI agent acts on its environment and receives observations and rewards in return, which drives its decision-making and learning.

AlphaStar

AS

AlphaStar is an AI developed by DeepMind to play StarCraft II at a professional level, showcasing advanced reinforcement learning techniques.

Batch RL

Batch Reinforcement Learning (Batch RL) is a method where an agent learns from a fixed dataset of experiences.

Boltzmann Exploration

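Boltzmann Exploration (also called softmax exploration) balances exploration and exploitation in reinforcement learning by sampling each action with a probability that grows exponentially with its estimated value, scaled by a temperature parameter.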
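
A minimal sketch of Boltzmann (softmax) action selection follows, assuming the agent already holds a list of value estimates for the available actions. The names `boltzmann_probs`, `boltzmann_select`, and `temperature` are chosen for this example.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Return softmax probabilities over actions; temperature must be > 0."""
    # Subtracting the maximum keeps exp() from overflowing.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Example value estimates, made up for illustration.
action = boltzmann_select([0.2, 1.5, 0.7], temperature=0.5)
```

A high temperature makes the choice close to uniform (more exploration), while a temperature near zero makes it close to greedy (more exploitation).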

Combinatorial Bandit

CB

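A combinatorial bandit is a multi-armed bandit problem in which the learner chooses a combination (subset) of the available options in each round, rather than a single option, and is rewarded for the combination as a whole.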

Contextual Bandit

CB

A contextual bandit is a bandit problem in which the learner observes contextual information before each decision and chooses an action to maximize rewards, receiving feedback only for the action it took.

Continuous Action Space

A continuous action space allows AI to select from an infinite range of possible actions in decision-making tasks.

Credit Assignment Problem

CAP

The Credit Assignment Problem in AI refers to the challenge of determining which actions are responsible for an outcome.

Critic Agent

CA

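A Critic Agent evaluates the decisions of an AI model, typically the actor in an actor-critic method, by estimating how valuable those decisions are and feeding that estimate back as a learning signal.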

Cumulative Reward

Cumulative reward is the total reward an agent receives over time in reinforcement learning.
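
The sum can be sketched in a few lines. The optional discount factor `gamma` is an assumption added for this example: with `gamma=1.0` the function returns the plain total, and with a smaller value later rewards count for less, as is common in practice.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum a sequence of rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future sum.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


plain_total = discounted_return([1.0, 1.0, 1.0])            # 3.0
discounted = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```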

Deep Deterministic Policy Gradient

DDPG

Deep Deterministic Policy Gradient is an off-policy actor-critic algorithm used in reinforcement learning for continuous action spaces.

Deep Q-Learning

DQL

Deep Q-Learning is a reinforcement learning algorithm that combines Q-learning with deep neural networks to optimize decision-making.

Deep Q-Network

DQN

A Deep Q-Network is a deep neural network that approximates the Q-function, learning to make decisions by combining deep learning with Q-learning.

Dense Reward

DR

A dense reward provides frequent feedback in reinforcement learning, aiding faster learning and improved performance.

Deterministic Policy

A deterministic policy in AI defines a specific action for each state in a given environment.

Deterministic Policy Gradient

DPG

Deterministic Policy Gradient is a reinforcement learning method that optimizes a deterministic policy by following the gradient of the action-value function, typically in continuous action spaces.

Discrete Action Space

A discrete action space restricts an AI to a finite set of actions.

Distributional Reinforcement Learning

DRL

Distributional Reinforcement Learning focuses on learning the distribution of future rewards rather than just expected values.

Distributional RL

DRL

Distributional Reinforcement Learning focuses on predicting the full distribution of possible future rewards, rather than just their expected value.

Domain Randomization

DR

Domain Randomization is a technique used in AI to improve the robustness of models by varying training environments.

Double Deep Q-Network

DDQN

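A Double Deep Q-Network (DDQN) is a variant of the Deep Q-Network that selects the next action with the online network and evaluates it with the target network, which reduces overestimation of action values and improves stability in decision-making tasks.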

Double Q-Learning

DQL

Double Q-Learning is an enhancement of Q-Learning that reduces overestimation bias in value function estimates.
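
One way to picture the idea is a tabular sketch with two value tables: one table picks the best next action and the other evaluates it, so a single table's optimistic errors are not both selected and trusted. The function name `double_q_update` and its parameters are assumptions made for this example.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular Double Q-learning step to one of the two tables."""
    # Pick at random which table is updated; the other one evaluates.
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    # The updated table chooses the greedy next action...
    best = max(actions, key=lambda a: select[(next_state, a)])
    # ...and the other table supplies that action's value for the target.
    target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])


# Hypothetical states and actions, used only to show the call.
table_a, table_b = defaultdict(float), defaultdict(float)
double_q_update(table_a, table_b, "s0", "go", 1.0, "s1", ["x", "y"])
```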

DQN Replay Buffer

Replay Buffer

A DQN Replay Buffer stores experiences to improve learning efficiency in deep reinforcement learning.
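
A minimal sketch of such a buffer is shown below, assuming experiences are stored as (state, action, reward, next_state, done) tuples and sampled uniformly at random. The class name `ReplayBuffer` and its methods are chosen for this example and do not come from a specific library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of experiences; the oldest entries are dropped first."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up the correlation
        # between consecutive experiences.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Made-up integer states, used only to show the calls.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(step, 0, 1.0, step + 1, False)
batch = buffer.sample(2)
```

After five additions to a buffer of capacity three, only the three most recent experiences remain available for sampling.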

Dueling Q-Network

DQN

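Dueling Q-Networks improve reinforcement learning by splitting the network into two parallel streams, one estimating the value of the state and one estimating the advantage of each action, and combining them into action-value estimates.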
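
The final combination step of a dueling architecture can be sketched without any neural network code, assuming one stream has already produced a state value and the other a list of per-action advantages. The mean-subtraction form used here is the commonly described aggregation; the function name `dueling_q_values` is an assumption made for this example.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    # Subtracting the mean advantage keeps the value/advantage split
    # well-defined: the Q-values then average to the state value.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]


# Made-up numbers: a state worth 2.0 and three actions.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])  # [3.0, 2.0, 1.0]
```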