RLHF reward hacking

Reward hacking

Reward hacking is a situation where a model learns to optimise the reward signal while missing the real purpose of the task. The system technically does what it is rewarded ...
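A toy sketch in Python makes the failure mode concrete. Everything in it is hypothetical and not taken from the article: a proxy reward that counts confident-sounding words, a hidden `is_correct` label that the reward never sees, and a simple argmax over three fixed answers standing in for training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    is_correct: bool  # ground truth that the proxy reward never looks at


# Hypothetical proxy: the reward designer hoped confident answers were correct ones.
CONFIDENT_WORDS = {"definitely", "certainly", "clearly", "guaranteed"}


def proxy_reward(c: Candidate) -> int:
    """+1 for every confident-sounding word; correctness is ignored."""
    return sum(w.strip(".,!").lower() in CONFIDENT_WORDS for w in c.text.split())


def true_objective(c: Candidate) -> int:
    """What we actually wanted: 1 for a correct answer, 0 otherwise."""
    return int(c.is_correct)


candidates = [
    Candidate("The capital of Australia is Canberra.", True),
    Candidate("It is definitely, certainly, clearly Sydney. Guaranteed!", False),
    Candidate("Probably Melbourne.", False),
]

# "Optimisation" here is just picking the highest-scoring candidate.
best_by_proxy = max(candidates, key=proxy_reward)
best_by_truth = max(candidates, key=true_objective)

print("proxy reward picks:", best_by_proxy.text, "| correct:", best_by_proxy.is_correct)
print("true objective picks:", best_by_truth.text)
```

The proxy gives the wrong "Sydney" answer a score of 4 and the correct answer a score of 0, so optimising the reward selects the wrong answer. Replacing the argmax with a real training loop does not change the lesson: whatever the reward fails to measure, the optimiser is free to ignore.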

RLHF

RLHF, short for reinforcement learning from human feedback, is a training approach in which human preferences are used to help shape model behaviour. Instead of telling the model only what ...
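Many RLHF pipelines begin by fitting a reward model to pairwise human comparisons with a Bradley-Terry style loss, and then optimise the policy against that learned reward. The sketch below covers only the first step, under loud assumptions: a linear reward model, two made-up features (helpfulness and rudeness) and four invented preference pairs. It is an illustration, not the article's method.

```python
import math

# Hypothetical data: each response is described by two hand-made features,
# [helpfulness, rudeness]. Each pair is (preferred response, rejected response),
# standing in for a human labeller's choice.
preference_pairs = [
    ([0.9, 0.1], [0.4, 0.2]),
    ([0.7, 0.0], [0.8, 0.9]),
    ([0.6, 0.1], [0.2, 0.1]),
    ([0.5, 0.0], [0.9, 0.7]),
]


def reward(w, x):
    """Linear reward model: r(x) = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_loss(w, pairs):
    """Mean Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected)),
    computed as log1p(exp(-margin)), which is the same quantity."""
    total = 0.0
    for chosen, rejected in pairs:
        margin = reward(w, chosen) - reward(w, rejected)
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


def train(pairs, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss, starting from zero weights."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/dw of -log sigmoid(margin) is -(1 - sigmoid(margin)) * (chosen - rejected)
            coeff = -(1.0 - sigmoid(margin))
            for i in range(len(w)):
                grad[i] += coeff * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


w = train(preference_pairs)
print("learned weights [helpfulness, rudeness]:", w)
print("loss before:", pairwise_loss([0.0, 0.0], preference_pairs))
print("loss after: ", pairwise_loss(w, preference_pairs))
```

In this toy data every rejected response is at least as rude as the preferred one, so the learned rudeness weight ends up negative. A real reward model replaces the hand-made features with a neural network over the full response, and the learned reward remains only a proxy for what the labellers actually wanted.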

Who am I?

Hi, my name is Michal Krčmář. I am a digital entrepreneur currently based in Prague (Czechia). Since 2006 I’ve been building websites and delivering online marketing campaigns for big, prestigious brands as a commercially focused freelancer. I have also worked in-house for many years as a consultant, manager or director, and I have run several startups of my own. I have founded or co-founded several companies with an annual turnover of 10+ million USD. When I am not working on my business, I try to pass on my broad experience through seminars and online training; I also organize company trainings and workshops.