I would recommend giving LLMWhisperer a try with the documents pertaining to your ...

ipsum2 · on July 30, 2024

Not open source, and OP seems to be the owner.

lumos_maxima93 · on July 30, 2024

It is open source; the main platform is Unstract: https://github.com/Zipstack/unstract

ipsum2 · on July 30, 2024

Nope. LLMWhisperer, the part that parses PDFs, is called through a paid API.

constantinum · on July 31, 2024

I'm not sure why the comment is downvoted! The OP did not specifically try or ask for open-source solutions; at least, that is how I read it.

Let me break it down!

One of the commenters mentioned using four different tools to parse PDFs and handle the common parsing cases: tables, tables with images, OCR, layouts, handwriting, etc.

With LLMWhisperer, you don't need that.

Parsing is just one part of the problem. Engineers still need to figure out which LLMs work or are sufficient, reduce costs (tokens), handle performance (parsing a million pages), and make the AI stack production-ready.

LLMWhisperer at least handles most use cases and gets out of your way fast.

Also, LLMWhisperer is not open source; its API is charged based on pages parsed.