
> CLIP embeds the entire image as a single vector, not 170 of them.

Single token?

> GPT-4o must be using a different, more advanced strategy internally

Why?



The embeddings don't offer enough fidelity to recognize fine details in an image, such as handwriting.



