They could become better than humans because there's an RL training loop? I don't understand how this isn't immediately clear. Even training purely on human data, it's possible to be mildly superhuman (see the experiments with chess AIs trained only on human data), but once you have verifiable tasks and an RL loop, the human data is just a kickstarter.