F

Endlicher Markov-Entscheidungsprozess

MDP

Ein Finite Markov Decision Process ist ein mathematischer Rahmen zur Modellierung von Entscheidungsprozessen, bei denen die Ergebnisse teilweise zufällig sind.

Ein Endlicher Markov-Entscheidungsprozess (MDP) is a structured representation used in Entscheidungstheorie and Verstärkungslernen to model scenarios where outcomes depend on both the decisions made by an agent and the stochastic nature of the environment. An MDP is defined by the following components:

  • Zustände (S): A finite set of states that represent all possible situations the agent can encounter.
  • Aktionen (A): A finite set of actions available to the agent, which can influence the transition from one state to another.
  • Transition Wahrscheinlichkeit (P): A function that defines the probability of moving from one state to another given a specific action. This is often denoted as P(s’|s,a), the probability of reaching state s’ from state s by taking action a.
  • Belohnungen (R): A Belohnungsfunktion that assigns a numerical reward to each state or state-action pair, guiding the agent towards desirable outcomes.
  • Diskontierungsfaktor (γ): A factor between 0 and 1 that determines the present value of future rewards, allowing the agent to weigh immediate rewards more heavily than those received later.

MDPs are widely used in various fields, including künstliche Intelligenz, robotics, economics, and operations research, as they provide a formal way to model sequential decision-making problems. The goal in an MDP is to find a policy—a mapping from states to actions—that maximizes the expected cumulative reward over time. Solving an MDP typically involves algorithms such as Value Iteration or Policy Iteration, which help identify the optimal policy for the agent.

Strg + /