
Yup, surprising results! We were able to dig in a bit more. The main culprit is the overzealous "image extraction": if Mistral classifies something as an image, it will replace the entire section with (image)[image_002).

And it happened with a lot of full documents as well. For example, most receipts got classified as images, so no text was extracted from them.
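
If you want to catch this failure mode in your own pipeline, here is a rough sketch of a check that flags OCR output made up of image placeholders with almost no text. It assumes the placeholder is a Markdown-style image reference like `![name](name)`; the exact format Mistral emits isn't confirmed here, and the function names and the `min_chars` threshold are made up for illustration.

```python
import re

# Assumption: image placeholders are Markdown image references, e.g. ![img](img).
# Adjust the pattern to whatever your OCR provider actually returns.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def text_outside_images(markdown: str) -> str:
    """Return the OCR output with image references removed and whitespace collapsed."""
    return " ".join(IMAGE_REF.sub(" ", markdown).split())


def looks_image_only(markdown: str, min_chars: int = 20) -> bool:
    """True when the output contains an image reference but almost no extracted text.

    min_chars is an arbitrary, hypothetical threshold; tune it on your own documents.
    """
    has_image = IMAGE_REF.search(markdown) is not None
    return has_image and len(text_outside_images(markdown)) < min_chars
```

Pages flagged this way could be re-run through a different OCR path instead of being silently accepted as empty.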



This sounds like a real problem and hurdle for North American (US/CAN in particular) invoice and receipt processing?


Where did you find the claim that "if Mistral classifies something as an image, it will replace the entire section with (image)[image_002)"?


themanmaran works at Omni so presumably they have access to the actual resulting data from this study



