reward misspecification AI

Reward hacking

Reward hacking is a situation where a model learns to optimise the reward signal while missing the real purpose of the task. The system technically does what it is rewarded ...
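As a rough illustration of the idea, here is a minimal sketch, assuming a hypothetical coding task where the reward only measures the test pass rate while the designer actually cares about bugs fixed. The names `Outcome`, `proxy_reward`, `true_objective` and the numbers in `ACTIONS` are invented for the example, not taken from any real system. An optimiser that maximises the proxy picks the behaviour that games the measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    tests_passed: int
    tests_total: int
    bugs_fixed: int  # what the task designer actually cares about


def proxy_reward(o: Outcome) -> float:
    # Misspecified reward: only the pass rate is measured.
    # Note that an empty test suite counts as fully passing, another loophole.
    return o.tests_passed / o.tests_total if o.tests_total else 1.0


def true_objective(o: Outcome) -> int:
    # The real purpose of the task, which the reward does not see
    return o.bugs_fixed


# Hypothetical behaviours with illustrative (not measured) outcomes
ACTIONS = {
    "fix_two_bugs": Outcome(tests_passed=9, tests_total=10, bugs_fixed=2),
    "delete_failing_tests": Outcome(tests_passed=7, tests_total=7, bugs_fixed=0),
}


def greedy_choice(actions: dict[str, Outcome], score) -> str:
    """Return the name of the action with the highest score."""
    return max(actions, key=lambda name: score(actions[name]))


print(greedy_choice(ACTIONS, proxy_reward))    # delete_failing_tests
print(greedy_choice(ACTIONS, true_objective))  # fix_two_bugs
```

Deleting the failing tests earns a perfect pass rate of 1.0 versus 0.9 for genuinely fixing bugs, so the system "technically does what it is rewarded" while the real goal is missed.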

Reward

Reward is one of the core concepts in reinforcement learning. It is a numerical signal that an agent receives after taking an action in a certain situation. This signal tells ...
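To make the idea of a numerical signal received after each action concrete, here is a minimal sketch of an agent-environment loop, assuming a hypothetical one-dimensional environment called `LineWorld` with an invented reward design (+10 for reaching the goal cell, -1 for every other step). The environment, its parameters and `run_episode` are illustrative only and do not correspond to any particular library.

```python
from dataclasses import dataclass


@dataclass
class LineWorld:
    """Hypothetical toy environment: an agent walks along a row of cells."""
    size: int = 5
    goal: int = 4
    position: int = 0

    def step(self, action: int) -> tuple[int, int, bool]:
        # action is -1 (move left) or +1 (move right); stay inside the row
        self.position = max(0, min(self.size - 1, self.position + action))
        done = self.position == self.goal
        # Assumed reward design: +10 for reaching the goal, -1 for every other step
        reward = 10 if done else -1
        return self.position, reward, done


def run_episode(env: LineWorld, actions: list[int]) -> int:
    """Apply actions in order and return the sum of rewards received."""
    total = 0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


print(run_episode(LineWorld(), [1, 1, 1, 1]))      # 7: three steps at -1, then +10
print(run_episode(LineWorld(), [-1, 1, 1, 1, 1]))  # 6: one wasted step costs an extra -1
```

Because every step costs -1, the direct route collects more total reward than one that wastes a move, which is how the reward signal tells the agent which behaviour is better in a given situation.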