{wiki=Thompson_sampling}

Thompson sampling is a probabilistic method from machine learning and statistics, used chiefly for multi-armed bandit problems. In a multi-armed bandit problem, a decision-maker repeatedly chooses among several options (or "arms") whose rewards are uncertain. The goal is to maximize the total reward over time by balancing exploration (trying different arms to learn about them) and exploitation (choosing the arm that appears to give the highest reward based on past experience). Thompson sampling strikes this balance by maintaining a probability distribution over each arm's expected reward, drawing one sample from each distribution, and playing the arm whose sample is highest.
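As a rough illustration of the exploration/exploitation trade-off, here is a minimal sketch of Thompson sampling for one common special case: arms that pay a reward of 0 or 1 (Bernoulli rewards), with a uniform Beta(1, 1) prior on each arm's success probability. The reward model, the prior, and the names `thompson_sampling`, `true_probs`, and `n_rounds` are assumptions made for this example, not something specified above.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hypothetical success probability of each arm, used only
                to simulate rewards; the algorithm never reads it directly.
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        # Assumed prior: Beta(1, 1), i.e. uniform on [0, 1].
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the environment's 0/1 reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the chosen arm's posterior counts.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward

    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Arms with few observations have wide posteriors, so they occasionally produce the highest sample and get explored; as evidence accumulates, the posteriors narrow and the arm with the best observed record is chosen most of the time.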


 Thompson sampling

ID: thompson-sampling