

tantalor on June 7, 2024 | on: How Does GPT-4o Encode Images?

> CLIP embeds the entire image as a single vector, not 170 of them.

Single token?

> GPT-4o must be using a different, more advanced strategy internally

Why?

freediver on June 7, 2024

The embeddings do not offer the level of fidelity needed to recognize fine details in an image, such as handwriting.
