
I built a CLI tool for experimenting with Mistral OCR here: https://simonwillison.net/2025/Mar/7/mistral-ocr/

Honestly, the vibes aren't great. Gemini is a lot more flexible for handling PDFs - you can prompt it to do a bunch of other things - and Mistral OCR appears to hallucinate if it can't correctly read handwriting, a common problem with vision LLM based OCR tools.

The way Mistral OCR handles images within the text is disappointing - it doesn't attempt to interpret them, just extracts them out as binary blobs. A vision LLM can usually do a great job of describing an image, but with Mistral OCR you have to manually run that as a separate step.



Knowing that you have to do that as a separate step adds a whole additional level of complexity too.

For example, if some documents contain images and others don't, you need to add extra branching steps to your processing, and the image-description step is another place where hallucinations can creep in.
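A rough sketch of what that extra step could look like, assuming the OCR output is markdown containing image references such as `![img-0.jpeg](img-0.jpeg)` plus a mapping from image id to raw bytes. The function name `inline_image_descriptions` and the `describe_image` callback are hypothetical; the callback stands in for whatever vision model you would call, and nothing here is a real Mistral or Gemini API.

```python
import re
from typing import Callable, Dict

# Assumption: images appear in the OCR markdown as ![alt](image-id).
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def inline_image_descriptions(
    markdown: str,
    images: Dict[str, bytes],
    describe_image: Callable[[bytes], str],
) -> str:
    """Replace image references with text descriptions from a vision model.

    `describe_image` is a hypothetical callback wrapping a vision LLM call.
    """
    if not images:
        # Text-only document: skip the second step entirely.
        return markdown

    def replace(match: "re.Match[str]") -> str:
        data = images.get(match.group(2))
        if data is None:
            # Unknown reference: leave it untouched rather than guess.
            return match.group(0)
        return f"[Image: {describe_image(data)}]"

    return IMAGE_REF.sub(replace, markdown)
```

Every call to `describe_image` is a separate model invocation, so each one is another chance for a made-up description to end up in the final text.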

What are you using for document extraction lately, Simon?


I'm really impressed with Gemini - Gemini 2.0 Pro Exp seems remarkably good at even really complex scrappy documents.


Agreed - I am surprised they are not using Pixtral to read images as well.



