reward model exploitation

Exploitation (in reinforcement learning)

Exploitation is the use of the best-known action based on current knowledge. In reinforcement learning, it means that an agent chooses the option that currently seems most rewarding, instead of ...

Reward hacking

Reward hacking is a situation where a model learns to optimise the reward signal while missing the real purpose of the task. The system technically does what it is rewarded ...
