LLM-based OCR is a disaster: high potential for hallucinations and no confidence estimate. Results might seem promising, but you'll always be left wondering.
CNN-based OCR also has "hallucinations", and Transformers aren't that much different in that respect. This is a problem solved with domain-specific post-processing.
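For anyone wondering what "domain-specific post-processing" might look like in practice, here's a minimal sketch (the field names, regexes, and confusion map are all made up for illustration, not tied to any particular OCR engine): validate each OCR'd field against its expected format, try character-level corrections for classic confusions like O/0 and S/5, and flag anything that still fails for human review.

```python
import re

# Hypothetical domain rules for two OCR'd invoice fields.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
AMOUNT_RE = re.compile(r"^\d+\.\d{2}$")

# Common OCR character confusions used to propose corrections.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1",
                            "I": "1", "S": "5", "B": "8"})

def validate_field(name: str, value: str) -> tuple[str, bool]:
    """Return (possibly corrected value, passed_validation)."""
    pattern = DATE_RE if name == "date" else AMOUNT_RE
    if pattern.match(value):
        return value, True
    corrected = value.translate(CONFUSIONS)  # character-level fix attempt
    if pattern.match(corrected):
        return corrected, True
    return value, False  # still invalid: flag for human review

fields = {"date": "2024-O1-15", "total": "123.4S"}
for name, raw in fields.items():
    value, ok = validate_field(name, raw)
    print(f"{name}: {raw!r} -> {value!r} ({'ok' if ok else 'needs review'})")
```

The point isn't the specific rules; it's that when the output space is constrained (dates, amounts, part numbers, checksummed IDs), you can catch or correct most recognition errors downstream instead of trusting the raw model output.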