State–action–reward–state–action
ID: state-action-reward-state-action
State–action–reward–state–action (SARSA) is an algorithm used in reinforcement learning for training agents to make decisions in environments modeled as Markov Decision Processes (MDPs). SARSA is an on-policy method, meaning that it learns the value of the policy being followed by the agent. The components of SARSA can be broken down as follows: 1. **State (S)**: This represents the current state of the environment in which the agent operates.
New to topics? Read the docs here!