Reward Hacking

A new paper proves that reward hacking isn’t a bug but a structural equilibrium, one that gets worse as agents gain more tools. Beyond a capability threshold, agents may stop gaming the metric and start degrading the metric itself.