Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I feel this is created for RAG. I tried a document [0] that I tested with OCR; it got all the table values correctly, but the page's footer was missing.

Headers and footers are a real pain with RAG applications, as they are not required, and most OCR or PDF parsers will return them, and there is extract work to do to remove them.

[0] https://github.com/orasik/parsevision/blob/main/example/Mult...




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: