It feels to me like properly instrumented, these diffusion models are going to be really powerful coding tools. Imagine a “smart” model carving out a certain number of tokens in a response for each category of response output, then diffusing the categories.