Hacker News

Relatedly, we find LLM vision models absolutely atrocious at counting things. We build school curricula, and one basic task for our activities is counting: blocks, pictures of ducks, segments in a chart, whatever. Current models can't reliably count four or five squares in an image.
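
For contrast, the ground truth for this kind of activity is cheap to compute with a classic algorithm, so a model's answer can be checked automatically. A minimal sketch, assuming the image has already been reduced to a binary grid (1 = shape pixel, 0 = background); the grid format and the name count_blobs are made up for illustration and are not part of any real pipeline:

```python
from collections import deque


def count_blobs(grid):
    """Count 4-connected regions of 1s in a binary grid (list of lists).

    Hypothetical helper: assumes shapes do not touch each other.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited shape pixel: a new region starts here.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill marks the rest of this region.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Four 2x2 squares separated by a row and a column of background.
grid = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
print(count_blobs(grid))  # 4
```

Real activity images would need thresholding or segmentation first, and touching shapes would merge into one region, so treat this as a baseline check only.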


IMHO, that is expected, at least for the general case.

That is one of the implications of transformers being in DLOGTIME-uniform TC0: they don't have access to counter analogs.

You would need to move to log-depth circuits, add mod-p_n gates, etc., unless someone finds some new mathematics.

Proposition 6.14 in Immerman is where this is lost if you want a cite.

It may seem counterintuitive that division is in TC0 but (general) counting is not.



