A bit of a tangent, but aren’t CNNs still dominating over ViTs among computer vi...

menaerus · 2025-03-07T07:15:27

I haven't watched that space very closely, but IMO ViTs still have a lot of untapped potential. Compared to CNNs, whose convolutions only mix information from a small local neighborhood at each layer, self-attention lets the model learn relations between any two parts of the image, including complex, long-range ones. Where that matters, I expect it to matter a lot. OCR is probably not the best example of such a task, though: understanding the surrounding context does help, but I don't think it's critical for performance.
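To make the local-vs-global point concrete, here's a toy sketch (my own illustration, not a real ViT). It assumes a 1-D sequence of scalar "patch" features and compares one 3-tap convolution layer with one single-head self-attention layer that uses identity query/key/value projections. Real ViTs use learned Q/K/V matrices, multiple heads and vector embeddings. `conv1d`, `self_attention` and `influenced_outputs` are hypothetical helpers. Nudging one input patch and checking which outputs move shows how far a single layer "sees".

```python
import math


def conv1d(x, kernel):
    """'Same'-padded 1-D convolution over a sequence of scalar patch features."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + t - pad  # input position covered by kernel tap t
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    out = []
    for i in range(len(x)):
        scores = [x[i] * x[j] for j in range(len(x))]  # dot-product scores vs every patch
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w / z * x[j] for j, w in enumerate(weights)))
    return out


def influenced_outputs(layer, x, j, eps=1e-3):
    """Indices of outputs that change when input patch j is nudged by eps."""
    base = layer(x)
    nudged = list(x)
    nudged[j] += eps
    moved = layer(nudged)
    return [i for i, (a, b) in enumerate(zip(base, moved)) if abs(a - b) > 1e-12]


if __name__ == "__main__":
    x = [0.5, -1.0, 0.3, 0.8, -0.2, 1.1, 0.0, -0.7]
    conv = lambda v: conv1d(v, [0.25, 0.5, 0.25])
    print(influenced_outputs(conv, x, 0))            # -> [0, 1]
    print(influenced_outputs(conv, x, 3))            # -> [2, 3, 4]
    print(influenced_outputs(self_attention, x, 0))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

A CNN can still reach the whole image by stacking layers. The difference is that attention gets global interactions in a single layer, and the interaction weights depend on the content of the input itself.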