Hacker News
AlexCoventry | 4 days ago | on: DeepSeekMath-V2: Towards Self-Verifiable Mathemati...
"Process-oriented" verification has been a thing for a while in mathematical reasoning CoT. Google had a paper about it last year [1]. The key term to look for is "process reward model" (PRM). I particularly like RL Tango [2].
[1] https://arxiv.org/abs/2406.06592
[2] https://arxiv.org/abs/2505.15034