Hacker News

I would say it's more like a core feature of the latest generation of LLMs, which can be prompted with images and text, and can output images and text, along with possibly audio and video.



I mean, that's multimodality, but fine-grained editing of a previously generated text-to-image result is an entirely distinct thing, no?


I'm pretty sure that's still the same multimodal LLM, and considered a form of prompting?
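To make the "it's still prompting" point concrete: from the model's perspective, a follow-up edit request is just another turn in a multimodal conversation that interleaves the earlier generated image with new text. Here is a minimal Python sketch of such a payload; the field names (`role`, `content`, `type`, `image_ref`) are hypothetical and loosely modeled on common chat-completion message formats, not any specific vendor's API.

```python
# Hypothetical interleaved image+text conversation payload.
# All field names are illustrative, not a real vendor API.

def build_edit_turn(history, instruction, image_ref):
    """Append a follow-up user turn that references a previously
    generated image and asks for a fine-grained edit."""
    turn = {
        "role": "user",
        "content": [
            # Reference back to the model's earlier image output.
            {"type": "image", "image_ref": image_ref},
            # The new edit instruction, as plain text.
            {"type": "text", "text": instruction},
        ],
    }
    return history + [turn]

# Original text->image request and the model's image response.
conversation = [
    {"role": "user",
     "content": [{"type": "text", "text": "Draw a red bicycle."}]},
    {"role": "assistant",
     "content": [{"type": "image", "image_ref": "gen_001"}]},
]

# The "edit" is just one more prompt turn in the same conversation.
conversation = build_edit_turn(
    conversation, "Make only the handlebars blue.", "gen_001")
```

The design point is that nothing structurally new is needed for editing: the same message format that carried the first prompt carries the edit, with the prior image supplied as context.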





