
Dang. Super fast and significantly more accurate than Google, Claude and others.

Pricing: $1 per 1,000 pages, or $1 per 2,000 pages if “batched”. I’m not sure what batching means in this case: multiple PDFs? Why not split them to halve the cost?
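For a sense of scale, here is a rough sketch of that arithmetic. It assumes the cost is prorated linearly per page, which the pricing line does not actually say, and `ocr_cost_usd` is a made-up helper name, not part of any SDK.

```python
def ocr_cost_usd(pages: int, batched: bool = False) -> float:
    """Estimate OCR cost from the quoted rates.

    Assumption: $1 buys 1,000 pages normally and 2,000 pages when batched,
    and partial thousands are billed proportionally.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    pages_per_dollar = 2000 if batched else 1000
    return pages / pages_per_dollar


# A 900-page document at each rate.
print(ocr_cost_usd(900))                # 0.9
print(ocr_cost_usd(900, batched=True))  # 0.45
```

So under these assumptions the batched rate is simply half the regular one for the same number of pages.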

Anyway this looks great at pdf to markdown.



Batched often means a higher latency option (minutes/hours instead of seconds), which providers can schedule more efficiently on their GPUs.


Batching likely means the response is not real-time. You set up a batch job and they send you the results later.


If only the business people I work with understood that even a 100GB transfer over the network is not going to return results immediately ;)


That makes sense. Idle time is nearly free after all.


From my testing so far, it seems it's super fast and responded synchronously. But it decided that the entire page is an image and returned `![img-0.jpeg](img-0.jpeg)` with coordinates in the metadata for the image, which is the entire page.

Our tool, doctly.ai, is much slower and async, but much more accurate, and it gets you the content itself as markdown.


I thought we stopped -ly company names ~8 years ago?


Haha for sure. Naming isn't just the hardest problem in computer science, it's always hard. But at some point you just have to pick something and move forward.


if you talk to people gen-x and older, you still need .com domains

for all those people that aren't just clicking on a link on their social media feed, chat group, or targeted ad


But doctr.ai was taken.


Usually (with OpenAI; I haven't checked Mistral yet) it means an async API rather than a sync API.

e.g. you submit multiple requests (pdfs) in one call, and get back an id for the batch. You then can check on the status of that batch and get the results for everything when done.

It lets them use their available hardware to its full capacity much better.
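A minimal sketch of that submit-then-poll flow, in case it helps. `FakeBatchService` is a hypothetical in-memory stand-in that I made up for illustration; it is not the OpenAI or Mistral API, and real providers will have different method names, status values, and result formats.

```python
import itertools
import time


class FakeBatchService:
    """Hypothetical in-memory stand-in for an async batch API (not a real SDK)."""

    def __init__(self, polls_until_done: int = 2):
        self._polls_until_done = polls_until_done
        self._jobs = {}
        self._ids = itertools.count()

    def submit(self, documents):
        """Accept many documents in one call and return an id for the batch."""
        batch_id = f"batch-{next(self._ids)}"
        self._jobs[batch_id] = {
            "docs": list(documents),
            "polls_left": self._polls_until_done,
        }
        return batch_id

    def status(self, batch_id):
        """Report 'running' for the first few polls, then 'done'."""
        job = self._jobs[batch_id]
        if job["polls_left"] > 0:
            job["polls_left"] -= 1
            return "running"
        return "done"

    def results(self, batch_id):
        """Return one result per submitted document once the batch is done."""
        job = self._jobs[batch_id]
        if job["polls_left"] > 0:
            raise RuntimeError("batch not finished yet")
        return {doc: f"markdown for {doc}" for doc in job["docs"]}


def run_batch(service, documents, poll_interval=0.0):
    """Submit everything at once, poll the batch id, then fetch all results."""
    batch_id = service.submit(documents)
    while service.status(batch_id) != "done":
        time.sleep(poll_interval)  # a real job might need minutes or hours
    return service.results(batch_id)


print(run_batch(FakeBatchService(), ["a.pdf", "b.pdf"]))
```

The point is only the shape of the interaction: one submit call, an id, and results fetched later, so the provider is free to schedule the work whenever hardware is idle.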


May I ask, as a layperson, how would you go about using this to OCR multiple hundreds of pages? I tried the chat but it pretty much stops after the 2nd page.


You can check the example code in the Mistral documentation. You would _only_ have to change the value of the variable `document_url` to the URL of your uploaded PDF, and change `MISTRAL_API_KEY` to the value of your own key, which you can get from the La Plateforme webpage.

https://docs.mistral.ai/capabilities/document/#ocr-with-pdf


Thanks!


Submit the pages via the API.


This worked indeed. Although I had to cut my document into smaller chunks. 900 pages at once ended with a timeout.
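For anyone hitting the same timeout, a rough sketch of how the page ranges could be planned before splitting. The chunk size of 200 is an arbitrary assumption for illustration, not a documented limit, and `page_ranges` is a hypothetical helper; the actual PDF splitting would still need a PDF tool of your choice.

```python
def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges covering the document."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


# 900 pages in chunks of at most 200 pages each.
for first, last in page_ranges(900, 200):
    print(f"pages {first}-{last}")
```

Each range can then be sent as its own request, and the results concatenated in order.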


I would assume this is 1 request containing 2k pages vs N requests whose total pages add up to 1000.



