
= Reward modeling

See e.g.: <Human Compatible>
* https://deepmindsafetyresearch.medium.com/scalable-agent-alignment-via-reward-modeling-bf4ab06dfd84
