
Yes, although so far it seems the main advantage of text diffusion models is that they're really, really fast. Their iterations reach a quality asymptote very quickly.



I don’t know which text diffusion models you’re talking about. The latest and greatest is this one: https://arxiv.org/abs/2502.09992, and it’s extremely slow – a couple of orders of magnitude slower than a regular LLM – mainly because it does not support KV caching and requires many full-sequence processing steps per token.
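To make the cost argument concrete, here's a toy back-of-envelope FLOP count (my own sketch, not from the paper): an autoregressive decoder with a KV cache only attends over previously cached positions when emitting each token, while a diffusion model without a cache re-runs attention over the whole sequence at every refinement step. All parameter values below are made up for illustration.

```python
def autoregressive_flops(seq_len, dim):
    """With a KV cache, generating token t attends over t cached positions,
    so total attention work grows roughly like sum(t * dim)."""
    return sum(t * dim for t in range(1, seq_len + 1))

def diffusion_flops(seq_len, dim, refinement_steps):
    """Without a KV cache, every refinement step re-runs attention over the
    full sequence: roughly seq_len^2 * dim work per step."""
    return refinement_steps * seq_len ** 2 * dim

n, d, steps = 1024, 64, 256  # hypothetical sequence length, head dim, steps
ar = autoregressive_flops(n, d)
df = diffusion_flops(n, d, steps)
print(f"autoregressive (cached): {ar:.3e} attention FLOPs")
print(f"diffusion (no cache):    {df:.3e} attention FLOPs, {df / ar:.1f}x more")
```

With these toy numbers the gap is a few hundred times, which lines up with "a couple of orders of magnitude" once the number of refinement steps approaches the sequence length.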


I’m not familiar with that paper, but it would probably be best to compare speeds against an unoptimized transformer decoder: the Vaswani paper came out eight years ago, so implementations are pretty highly optimized at this point.

On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.


There’s not enough improvement over regular LLMs to motivate optimization effort. Recall that the original transformer was well received because it was fast and scalable compared to RNNs.


Yeah, I guess progressive refinement is limited in quality by how good the first N iterations are, since those establish the broad outlines.


FWIW, I don’t think we’ve seen nearly all the ideas for text diffusion yet. Why not ‘jiggle the text around a bit’ once things have stabilized, insert space to fill, or have a separate judging module identify spans that need more tokens? Lots of super interesting possibilities.
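The ‘jiggle’ idea could be as simple as re-masking the positions the model is least confident about and letting a later refinement pass revisit them. A minimal sketch, assuming we have per-token confidence scores; the function name, mask token, and re-mask fraction are all my own invention, not anything from an actual system:

```python
MASK = "<mask>"

def remask_low_confidence(tokens, confidences, fraction=0.2):
    """Replace the least-confident `fraction` of tokens with MASK so a
    subsequent refinement pass can regenerate them."""
    k = max(1, int(len(tokens) * fraction))
    # indices of the k lowest-confidence positions
    worst = set(sorted(range(len(tokens)), key=lambda i: confidences[i])[:k])
    return [MASK if i in worst else t for i, t in enumerate(tokens)]

tokens = ["the", "cat", "sat", "on", "teh", "mat"]
confs  = [0.99, 0.95, 0.97, 0.98, 0.31, 0.96]
print(remask_low_confidence(tokens, confs))
```

A judging module for “space that needs more tokens” could work the same way, except it would insert fresh MASK positions instead of overwriting existing ones.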





