
Yes, although so far it seems the main advantage of text diffusion models is that they're really, really fast. Their iterations reach a quality asymptote very quickly.



I don’t know which text diffusion models you’re talking about. The latest and greatest is this one: https://arxiv.org/abs/2502.09992, and it’s extremely slow – a couple of orders of magnitude slower than a regular LLM – mainly because it does not support KV caching and requires many full-sequence processing steps per token.
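To make the cost argument concrete, here's a toy back-of-envelope FLOP count (my own sketch, not from the paper): an autoregressive decoder with a KV cache only attends over previously cached positions when emitting each token, while a diffusion model without a cache re-runs attention over the whole sequence at every refinement step. All parameter values below are made up for illustration.

```python
def autoregressive_flops(seq_len, dim):
    """With a KV cache, generating token t attends over t cached positions,
    so total attention work grows roughly like sum(t * dim)."""
    return sum(t * dim for t in range(1, seq_len + 1))

def diffusion_flops(seq_len, dim, refinement_steps):
    """Without a KV cache, every refinement step re-runs attention over the
    full sequence: roughly seq_len^2 * dim work per step."""
    return refinement_steps * seq_len ** 2 * dim

n, d, steps = 1024, 64, 256  # hypothetical sequence length, head dim, steps
ar = autoregressive_flops(n, d)
df = diffusion_flops(n, d, steps)
print(f"autoregressive (cached): {ar:.3e} attention FLOPs")
print(f"diffusion (no cache):    {df:.3e} attention FLOPs, {df / ar:.1f}x more")
```

With these toy numbers the gap is a few hundred times, which lines up with "a couple of orders of magnitude" once the number of refinement steps approaches the sequence length.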


I’m not familiar with that paper, but it would probably be best to compare speeds against an unoptimized transformer decoder: the Vaswani paper came out eight years ago, so implementations are pretty highly optimized at this point.

On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.


There’s not enough improvement over regular LLMs to motivate optimization effort. Recall that the original transformer was well received because it was fast and scalable compared to RNNs.


Yeah, I guess progressive refinement is limited in quality by how good the first N iterations are, since those establish the broad outlines.


FWIW, I don’t think we’ve seen nearly all the ideas for text diffusion yet. Why not ‘jiggle the text around a bit’ once things have stabilized, insert space to fill, or have a separate judging module identify spans that need more tokens? Lots of super interesting possibilities.
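The ‘jiggle’ idea could be as simple as re-masking the positions the model is least confident about and letting a later refinement pass revisit them. A minimal sketch, assuming we have per-token confidence scores; the function name, mask token, and re-mask fraction are all my own invention, not anything from an actual system:

```python
MASK = "<mask>"

def remask_low_confidence(tokens, confidences, fraction=0.2):
    """Replace the least-confident `fraction` of tokens with MASK so a
    subsequent refinement pass can regenerate them."""
    k = max(1, int(len(tokens) * fraction))
    # indices of the k lowest-confidence positions
    worst = set(sorted(range(len(tokens)), key=lambda i: confidences[i])[:k])
    return [MASK if i in worst else t for i, t in enumerate(tokens)]

tokens = ["the", "cat", "sat", "on", "teh", "mat"]
confs  = [0.99, 0.95, 0.97, 0.98, 0.31, 0.96]
print(remask_low_confidence(tokens, confs))
```

A judging module for “space that needs more tokens” could work the same way, except it would insert fresh MASK positions instead of overwriting existing ones.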





