
> if you could do this automatically, it would be a game changer, as you could run the top 5 models in parallel and select the best answer every time

Remember, they have access to the RLHF reward model, so they can score all N outputs against it and send back whichever answer gets the highest reward.
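
Picking the top-scoring output out of N is usually called best-of-N sampling. Here is a rough sketch of just the selection step. `best_of_n` and `toy_reward` are names made up for illustration, and `toy_reward` is a crude word-overlap stand-in, not a real RLHF reward model; in practice the scorer would be a learned model that nobody outside the lab can call.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate that reward_fn scores highest for this prompt."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() keeps the first candidate on ties, so the result is deterministic.
    return max(candidates, key=lambda answer: reward_fn(prompt, answer))


def toy_reward(prompt: str, answer: str) -> float:
    """Hypothetical scorer: counts answer words that also occur in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(1 for word in answer.lower().split() if word in prompt_words))


if __name__ == "__main__":
    prompt = "capital of France"
    candidates = [
        "I do not know",
        "The capital of France is Paris",
        "Paris",
    ]
    print(best_of_n(prompt, candidates, toy_reward))
```

The candidates could come from N samples of one model or from several different models; the selection step is the same either way, and the result is only as good as the scorer.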


