AI Glossary: What Is Finite Markov Decision Process (MDP)? Definition & Meaning

Ein Endlicher Markov-Entscheidungsprozess (MDP) is a structured representation used in Entscheidungstheorie and Verstärkungslernen to model scenarios where outcomes depend on both the decisions made by an agent and the stochastic nature of the environment. An MDP is defined by the following components:

Zustände (S): A finite set of states that represent all possible situations the agent can encounter.
Aktionen (A): A finite set of actions available to the agent, which can influence the transition from one state zu einem anderen.
Übergang Wahrscheinlichkeit (P): A function that defines the probability of moving from one state to another given a specific action. This is often denoted as P(s’|s,a), the probability of reaching state s’ from state s by taking action a.
Belohnungen (R): A Belohnungsfunktion that assigns a numerical reward to each state or state-action pair, guiding the agent towards desirable outcomes.
Diskontierungsfaktor (γ): A factor between 0 and 1 that determines the present value of future rewards, allowing the agent to weigh immediate rewards more heavily than those received later.

MDPs werden in verschiedenen Bereichen häufig eingesetzt, darunter künstliche Intelligenz, robotics, economics, and operations research, as they provide a formal way to model sequential decision-making problems. The goal in an MDP is to find a policy—a mapping from states to actions—that maximizes the expected cumulative reward over time. Solving an MDP typically involves algorithms such as Value Iteration or Policy Iteration, which help identify the optimal policy for the agent.