The reward model could be used as a validation/reward for the client. Give the s...

bastawhiz · 2025-05-12T14:21:08 1747059668

Arguably that's worse than crypto proof of work: inference is extremely expensive and you're multiplying every operation by N. Which means the cost is multiplied by N.

And like, what are you doing? You've managed to find a use case where you don't care that you're doing compute on some untrusted servers online (and no, there's no magic AI homomorphic encryption) but at the same time you're willing to accept the latency of doing the work multiple times AND it's probably all low end 4090s doing the work AND you're willing to pay for the wasted compute? I'm here shuddering at the thought of model setup times when one node in a cluster goes down and you're facing that on... well, probably most inferences? If you're not administering the infra, you get the lowest common denominator of performance.

Philpax · 2025-05-12T11:36:34 1747049794

That sounds like it'll lead to human-driven reward hacking [0]?

[0]: https://en.wikipedia.org/wiki/Reward_hacking