imtringued on Feb 2, 2025 | on: Recent results show that LLMs struggle with compos...

Coming up with a reward model seems to be really easy though. Every decidable problem can be used as a reward model. The only downside to this is that the LLM community has developed a severe disdain for making LLMs perform anything that can be verified by a classical algorithm. Only the most random data from the internet will do!
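To make the "decidable problem as reward model" idea concrete, here is a minimal sketch in which a classical checker scores a candidate answer. The choice of problem (Boolean satisfiability), the function name `sat_reward`, and the binary 0/1 reward are illustrative assumptions, not anything specified in the comment above.

```python
from typing import Dict, List

# A clause is a list of DIMACS-style literals: 3 means x3, -3 means NOT x3.
Clause = List[int]


def sat_reward(clauses: List[Clause], assignment: Dict[int, bool]) -> float:
    """Hypothetical reward: 1.0 if the assignment satisfies every clause, else 0.0.

    Variables missing from the assignment are treated as False (an assumption
    made here to keep the sketch short).
    """
    for clause in clauses:
        # A clause is satisfied when at least one literal agrees with the assignment.
        if not any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause):
            return 0.0
    return 1.0


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3], [-1, -3]]

good = {1: True, 2: True, 3: False}   # satisfies all three clauses
bad = {1: True, 2: False, 3: True}    # violates the third clause

print(sat_reward(formula, good))
print(sat_reward(formula, bad))
```

In a training loop, the assignment would be parsed from the model's output; the checker itself needs no learned component, which is the appeal of verifiable rewards.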

marxplank on Feb 2, 2025

That would help with decidable problems, but it still would not generalise to problems with non-trivial rewards, or to ones with none.

astrange on Feb 3, 2025

Reasoning seems to generalize, insofar as o1 and DeepSeek-R1 are better at answering questions than their base models.