Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A bit of a tangent, but aren’t CNNs still dominating over ViTs among computer vision competition winners?



I haven't watched that space very closely but IMO ViTs have a great potential to extract from since in comparison to CNNs they allow the model to learn and understand complex relations in the data. Where this matters, I expect it to matter a lot. OCR I think is not the greatest such example - while it matters to understand the surrounding context, I think it's not that critical for performance.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: