{wiki=State–action–reward–state–action}

State–action–reward–state–action (SARSA) is an algorithm used in reinforcement learning for training agents to make decisions in environments modeled as Markov Decision Processes (MDPs). SARSA is an on-policy method, meaning that it learns the value of the policy being followed by the agent. The components of SARSA can be broken down as follows: 1. **State (S)**: This represents the current state of the environment in which the agent operates.


 State–action–reward–state–action

ID: state-action-reward-state-action