Yup, surprising results! We were able to dig in a bit more. Main culprit is the overzealous "image extraction". Where if Mistral classifies something as an image, it will replace the entire section with (image)[image_002).
And it happened with a lot of full documents as well. Ex: most receipts got classified as images, and so it didn't extract any text.
And it happened with a lot of full documents as well. Ex: most receipts got classified as images, and so it didn't extract any text.