Relatedly, we find LLM vision models absolutely atrocious at counting things. We build school curricula, and one basic task for our activities is counting – blocks, pictures of ducks, segments in a chart, whatever. Current LLM models can't reliably count four or five squares in an image.