How much of that is because the models are optimizing specifically for SWE-bench?



not that much, because it's getting better at all benchmarks
