Dang. Super fast and significantly more accurate than Google, Claude and others...

sophiebits · 2025-03-06T18:06:04 1741284364

Batched often means a higher latency option (minutes/hours instead of seconds), which providers can schedule more efficiently on their GPUs.

abiraja · 2025-03-06T18:05:30 1741284330

Batching likely means the response is not real-time. You set up a batch job and they send you the results later.

ozim · 2025-03-06T19:00:37 1741287637

If only the business people I work with understood that even transferring 100GB over the network is not going to return results immediately ;)

vessenes · 2025-03-06T18:06:31 1741284391

That makes sense. Idle time is nearly free after all.

kapitalx · 2025-03-06T18:58:48 1741287528

From my testing so far, it seems super fast and responds synchronously. But it decided that the entire page is an image and returned `![img-0.jpeg](img-0.jpeg)` with coordinates in the metadata for the image, which is the entire page.

Our tool, doctly.ai, is much slower and async, but much more accurate, and gets you the content itself as Markdown.

ralusek · 2025-03-06T19:12:45 1741288365

I thought we stopped -ly company names ~8 years ago?

kapitalx · 2025-03-06T19:27:19 1741289239

Haha for sure. Naming isn't just the hardest problem in computer science, it's always hard. But at some point you just have to pick something and move forward.

yieldcrv · 2025-03-06T19:21:35 1741288895

if you talk to people gen-x and older, you still need .com domains

for all those people that aren't just clicking on a link on their social media feed, chat group, or targeted ad

DonHopkins · 2025-03-07T05:07:31 1741324051

But doctr.ai was taken.

Tostino · 2025-03-06T18:08:45 1741284525

Usually (with OpenAI; I haven't checked Mistral yet) it means an async API rather than a sync API.

e.g. you submit multiple requests (pdfs) in one call, and get back an id for the batch. You then can check on the status of that batch and get the results for everything when done.

It lets them use their available hardware to its full capacity much better.
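
To make the submit / poll / fetch flow concrete, here is a minimal sketch in Python. It does not call any real provider: `FakeBatchService`, its `submit`, `status` and `results` methods, and the `"completed"` status string are all made up for illustration, so check the provider's docs for the actual endpoints and field names.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class FakeBatchService:
    """In-memory stand-in for a provider's batch endpoint (hypothetical)."""

    polls_until_done: int = 2
    _jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def submit(self, documents):
        # One call carries many documents and returns a single batch id.
        batch_id = f"batch-{next(self._ids)}"
        self._jobs[batch_id] = {"docs": list(documents), "polls": 0}
        return batch_id

    def status(self, batch_id):
        # Pretend the job finishes after a fixed number of status checks.
        job = self._jobs[batch_id]
        job["polls"] += 1
        if job["polls"] >= self.polls_until_done:
            return "completed"
        return "in_progress"

    def results(self, batch_id):
        # Results for every document in the batch come back together.
        return [f"ocr text for {doc}" for doc in self._jobs[batch_id]["docs"]]


def run_batch(service, documents, poll_interval=0.0, max_polls=100):
    """Submit once, poll the batch id until it completes, then fetch results."""
    batch_id = service.submit(documents)
    for _ in range(max_polls):
        if service.status(batch_id) == "completed":
            return service.results(batch_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"{batch_id} did not finish after {max_polls} polls")
```

With a real batch API the poll interval would be seconds or minutes rather than zero, which is where the higher latency comes from.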

odiroot · 2025-03-06T18:52:39 1741287159

May I ask, as a layperson, how would you go about using this to OCR multiple hundreds of pages? I tried the chat but it pretty much stops after the 2nd page.

beklein · 2025-03-06T19:58:15 1741291095

You can check the example code in the Mistral documentation. You would _only_ have to change the value of the variable `document_url` to the URL of your uploaded PDF, and change `MISTRAL_API_KEY` to your own key, which you can get from the La Plateforme webpage.

https://docs.mistral.ai/capabilities/document/#ocr-with-pdf

odiroot · 2025-03-06T20:17:39 1741292259

Thanks!

sneak · 2025-03-06T18:56:15 1741287375

Submit the pages via the API.

odiroot · 2025-03-06T21:04:47 1741295087

This worked indeed, although I had to cut my document into smaller chunks. 900 pages at once ended with a timeout.
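
For anyone hitting the same timeout, a rough sketch of how the chunk boundaries could be computed in Python. The helper name `chunk_page_ranges` and the chunk size of 100 are assumptions for illustration, not a documented limit; actually cutting the PDF at those boundaries would still need a PDF library of your choice.

```python
def chunk_page_ranges(total_pages, chunk_size):
    """Return inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


# A 900-page document split into requests of at most 100 pages each.
ranges = chunk_page_ranges(900, 100)
```

Each range would then go out as its own request, and the outputs get concatenated in order afterwards.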

jacksnipe · 2025-03-06T18:05:25 1741284325

I would assume this is 1 request containing 2k pages vs N requests whose total pages add up to 1000.