This is a pretty interesting idea for diffusion models, but much less so for LLMs, where (after prompt ingestion) the entire set of weights has to be cycled through for basically every token.
This works very well on my PC! I've got an i3-12100F CPU, 16GB RAM, and an RTX 2060. Running it with `llm --cuda 4`, it answers very fast, though with a lot of hallucination :|