Categories

reward overfitting

Reward hacking

Reward hacking is a situation where a model learns to optimise the reward signal while missing the real purpose of the task. The system technically does what it is rewarded ...

Overfitting

Overfitting is a situation where a machine learning model learns the training data too closely and performs poorly on new data. The model may look very accurate during training, but ...
