For those interested, try LLMWhisperer (https://unstract.com/llmwhisperer/) for OCR. It does not use LLMs for parsing, so the parsing step itself does not hallucinate, and it preserves the input document's layout for better context.
The tool doesn't use any LLMs to process or parse the data. It parses the document and converts it into raw text.
That raw text is then fed to an LLM for data extraction,
e.g. extracting data from insurance, banking, and invoice documents.
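To make the two stages concrete, here is a rough Python sketch of the second stage only: taking layout-preserved raw text and handing it to an LLM for field extraction. This is not LLMWhisperer's API. The function names, the prompt wording, and the `call_llm` callable are all hypothetical, and the sample text is made up for illustration.

```python
from typing import Callable


def build_extraction_prompt(layout_text: str, fields: list[str]) -> str:
    """Wrap layout-preserved raw text in a prompt asking for specific fields.

    The text is inserted unchanged so column alignment and spacing survive.
    """
    field_list = ", ".join(fields)
    return (
        f"Extract the following fields as JSON: {field_list}.\n"
        "Use only values that appear in the document text below.\n"
        "--- DOCUMENT ---\n"
        f"{layout_text}\n"
        "--- END ---"
    )


def extract_fields(
    layout_text: str,
    fields: list[str],
    call_llm: Callable[[str], str],
) -> str:
    """Send the prompt to whatever LLM client you use (hypothetical callable)."""
    return call_llm(build_extraction_prompt(layout_text, fields))


if __name__ == "__main__":
    # Made-up stand-in for the raw text a layout-preserving parser would return.
    sample_text = (
        "Invoice No: 1042        Date: 2024-01-15\n"
        "Total Due:  $310.00"
    )
    # Print the prompt instead of calling a real model.
    print(build_extraction_prompt(sample_text, ["invoice_no", "total_due"]))
```

The point of the split is that the LLM only ever sees text the parser produced, so any extraction mistake can be checked against that raw text.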
By contrast, LlamaParse (https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse...) uses LLMs for PDF text extraction and runs into hallucination problems. See this issue for more details: https://github.com/run-llama/llama_parse/issues/420
Examples of extracting complex layouts:
https://imgur.com/a/YQMkLpA
https://imgur.com/a/NlZOrtX
https://imgur.com/a/htIm6cf