

serjester 4 months ago | on: Mistral OCR

This is cool! That said, for anyone looking to use this in RAG, the downside of specialized models compared with general VLMs is that you can't easily tune them to your specific use case. For example, we use Gemini to add very specific alt text to images in the extracted Markdown. It's also 2-3x the cost of Gemini Flash; hopefully the increased performance is significant.

Regardless, I'm excited to see more and more competition in the space.

Wrote an article on it: https://www.sergey.fyi/articles/gemini-flash-2-tips
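
For anyone curious what the alt-text step above might look like, here is a rough sketch. It only handles the Markdown rewriting; `describe_image` is a hypothetical callback (not a real Gemini API call) that you would back with whatever VLM you use, and the regex is a simplification that assumes plain `![alt](url)` images without nested brackets.

```python
import re
from typing import Callable

# Matches Markdown images: ![alt](url) or ![alt](url "title").
# Simplified on purpose: no nested brackets, no spaces inside the URL.
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)([^)]*)\)')


def rewrite_alt_text(markdown: str, describe_image: Callable[[str], str]) -> str:
    """Replace the alt text of every image with describe_image(url).

    describe_image is a hypothetical hook; in practice it would send the
    image to a VLM with a use-case-specific prompt and return the text.
    """
    def replace(match: re.Match) -> str:
        url = match.group(2)
        rest = match.group(3)  # optional title, kept unchanged
        # Square brackets in the description would break the image syntax.
        alt = describe_image(url).replace("[", "(").replace("]", ")")
        return f"![{alt}]({url}{rest})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

The point of passing the describer in as a function is that the prompt stays yours to tune, which is the flexibility you give up with a fixed OCR model.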

hyuuu 4 months ago

Gemini Flash is notorious for hallucinating OCR output, so be careful with it. For straightforward, semi-structured, low-page-count (under 5) documents it should perform well, but the more the context window is stretched, the more unreliable the output gets.
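
One way to act on the page-count point is to split a long document into small batches and send each batch as its own request. A minimal sketch, assuming you already have the pages as a list; `batch_pages` and the default of 4 pages per batch (to stay under 5) are illustrative choices, not something benchmarked here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch_pages(pages: Sequence[T], max_pages: int = 4) -> List[List[T]]:
    """Split pages into consecutive batches of at most max_pages each.

    The default of 4 keeps every request under 5 pages; treat it as a
    starting point to tune, not a measured threshold.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [list(pages[i:i + max_pages]) for i in range(0, len(pages), max_pages)]


# Example: a 10-page document becomes three requests.
batches = batch_pages(list(range(1, 11)))
```

The trade-off is that content spanning a batch boundary (a table continuing onto the next page, say) loses its context, so some overlap or post-processing may be needed.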
