I feel this is created for RAG. I tried a document [0] that I tested with OCR; it got all the table values correctly, but the page's footer was missing.
Headers and footers are a real pain in RAG applications: they are not needed, yet most OCR tools and PDF parsers return them, so there is extra work to do to remove them.
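To give an idea of that extra work, here is a rough sketch of one common heuristic: drop lines near the top or bottom of a page that repeat on most pages. It assumes the parser returns one text string per page. The function names and the `edge_lines` and `min_share` parameters are made up for illustration and do not come from any particular OCR or PDF library.

```python
import re
from collections import Counter


def _normalize(line):
    # Replace digit runs so "Page 1" and "Page 2" compare as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_edges(pages, edge_lines=2, min_share=0.6):
    """Remove lines near page edges that recur on most pages.

    pages: list of per-page text strings (assumed parser output).
    edge_lines: how many lines at the top and bottom count as "edge".
    min_share: fraction of pages a line must appear on to be dropped.
    """
    counts = Counter()
    for page in pages:
        lines = page.splitlines()
        edge = lines[:edge_lines] + lines[-edge_lines:]
        # Use a set so a line is counted at most once per page.
        counts.update({_normalize(ln) for ln in edge if ln.strip()})

    # Require at least two pages so single-page documents are left alone.
    threshold = max(2, int(min_share * len(pages)))
    repeated = {ln for ln, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        lines = page.splitlines()
        kept = []
        for i, ln in enumerate(lines):
            near_edge = i < edge_lines or i >= len(lines) - edge_lines
            if near_edge and _normalize(ln) in repeated:
                continue  # likely a header or footer
            kept.append(ln)
        cleaned.append("\n".join(kept))
    return cleaned
```

This only catches text that is nearly identical from page to page, so it will miss headers that change per section, and it can wrongly drop a legitimate line that happens to repeat at page edges.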
[0] https://github.com/orasik/parsevision/blob/main/example/Mult...