Hacker News

I would say it's more like a core feature of the latest generation of LLMs, which can be prompted with images and text, and can output images and text, along with possibly audio and video.



I mean, that's multimodality, but fine-grained editing of a previously generated text-to-image result is an entirely distinct thing, no?


I'm pretty sure that's still the same multimodal LLM, and considered a form of prompting?
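To make the "it's still prompting" point concrete: from the model's perspective, a follow-up edit request is just another turn in a multimodal conversation that interleaves the earlier generated image with new text. Here is a minimal Python sketch of such a payload; the field names (`role`, `content`, `type`, `image_ref`) are hypothetical and loosely modeled on common chat-completion message formats, not any specific vendor's API.

```python
# Hypothetical interleaved image+text conversation payload.
# All field names are illustrative, not a real vendor API.

def build_edit_turn(history, instruction, image_ref):
    """Append a follow-up user turn that references a previously
    generated image and asks for a fine-grained edit."""
    turn = {
        "role": "user",
        "content": [
            # Reference back to the model's earlier image output.
            {"type": "image", "image_ref": image_ref},
            # The new edit instruction, as plain text.
            {"type": "text", "text": instruction},
        ],
    }
    return history + [turn]

# Original text->image request and the model's image response.
conversation = [
    {"role": "user",
     "content": [{"type": "text", "text": "Draw a red bicycle."}]},
    {"role": "assistant",
     "content": [{"type": "image", "image_ref": "gen_001"}]},
]

# The "edit" is just one more prompt turn in the same conversation.
conversation = build_edit_turn(
    conversation, "Make only the handlebars blue.", "gen_001")
```

The design point is that nothing structurally new is needed for editing: the same message format that carried the first prompt carries the edit, with the prior image supplied as context.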





