I don’t know which text diffusion models you’re talking about; the latest and greatest is this one: https://arxiv.org/abs/2502.09992, and it’s extremely slow: a couple of orders of magnitude slower than a regular LLM, mainly because it does not support KV caching and requires many full-sequence processing steps per generated token.
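To make the caching point concrete, here’s a rough back-of-the-envelope sketch (not a benchmark) of the attention work involved: an autoregressive decoder with a KV cache processes one new token per step, while a diffusion-style model re-runs full bidirectional attention over the whole sequence at every denoising step. The step count of 256 is just an illustrative assumption, and per-layer constants are ignored.

```python
def autoregressive_cost(seq_len: int) -> int:
    # With a KV cache, step i processes 1 new token attending to i cached ones.
    return sum(i + 1 for i in range(seq_len))

def diffusion_cost(seq_len: int, denoise_steps: int) -> int:
    # Each denoising step runs full attention over all seq_len positions.
    return denoise_steps * seq_len * seq_len

n = 1024
steps = 256  # hypothetical number of denoising steps
ar = autoregressive_cost(n)
diff = diffusion_cost(n, steps)
print(f"autoregressive: {ar:,} position-pairs")
print(f"diffusion:      {diff:,} position-pairs (~{diff / ar:.0f}x more)")
```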
I’m not familiar with that paper, but it would probably be fairer to compare speeds against an unoptimized transformer decoder. The Vaswani paper came out 8 years ago, so transformer implementations are highly optimized at this point.
On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.
There isn’t enough improvement over regular LLMs to motivate that optimization effort. Recall that the original transformer was well received because it was fast and scalable compared to RNNs.