Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

https://pdftables.com failed the test file, pretty good but inconsistent interpretation across rows, sometimes it split the cell, sometimes it did not. Tabula failed to detect multi-line rows, after manually changing the table it did do better than pdftables.com on splitting cells. Both failed the non-printable whitespace characters that created garbled outputs in the excel. The other one would take some time to rig up.


You can also try https://docparser.com/.

If nothing works for you and you're comfortable with sharing an example file, you can send it to me and I could take a look.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: