
Check out https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical.
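For anyone who wants to try the docTR route, here is a rough sketch rather than anything official. It assumes docTR is installed with a backend, and that `export()` returns the nested pages/blocks/lines/words dict as I understand it. `ocr_image` and `lines_from_export` are names I made up for illustration.

```python
from typing import Any


def lines_from_export(export: dict[str, Any]) -> list[str]:
    """Flatten a docTR-style export dict into one string per text line.

    Assumed shape: pages -> blocks -> lines -> words, each word with a "value".
    """
    lines = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [w["value"] for w in line.get("words", [])]
                if words:  # skip lines with no recognized words
                    lines.append(" ".join(words))
    return lines


def ocr_image(path: str) -> list[str]:
    """Run docTR's pretrained OCR pipeline on one image file."""
    # Imported lazily so lines_from_export works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches weights on first use
    doc = DocumentFile.from_images(path)
    return lines_from_export(model(doc).export())
```

Calling `ocr_image("scan.jpg")` (hypothetical file name) should give you the recognized text line by line; if you just want a quick dump, `result.render()` on the model output is simpler.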

A multimodal LLM would of course blow it all out of the water, so some Llama 3-like model is probably SOTA in terms of what you can run yourself. Something like https://huggingface.co/blog/idefics2


