Here are two papers linked from Inception's site: 1. Discrete Diffusion Modeling...

AlexCoventry · 2025-05-01T00:27:33

Thanks, yes, I was thinking specifically of "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution". They actually consider two noise distributions: one that resamples each noised token position uniformly (Q^{uniform}), and one with a terminal masking state (Q^{absorb}). However, the terminal-masking system is clearly superior in their benchmarks.

https://arxiv.org/pdf/2310.16834#page=6
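
For anyone who wants to see the difference concretely, here is a rough NumPy sketch of the two rate matrices as I understand them from the linked page. It assumes the column convention dp/dt = Q p (so columns sum to zero) and puts the mask token at the last index. The function names and the closed-form transition helpers are mine, not the paper's code.

```python
import numpy as np

def q_uniform(n):
    # Every token jumps to every other token at rate 1 (assumed convention:
    # columns sum to zero, so the diagonal is 1 - n).
    return np.ones((n, n)) - n * np.eye(n)

def q_absorb(n):
    # n ordinary tokens plus one mask state at index n (my indexing choice).
    # Each ordinary token leaves at rate 1 and can only go to the mask;
    # the mask column is all zeros, so the mask never leaves.
    q = np.zeros((n + 1, n + 1))
    idx = np.arange(n)
    q[idx, idx] = -1.0
    q[n, :n] = 1.0
    return q

def transition_uniform(n, sigma):
    # Closed form of exp(sigma * Q_uniform): keep the token with weight
    # exp(-n * sigma), otherwise resample uniformly over all n tokens.
    keep = np.exp(-n * sigma)
    return keep * np.eye(n) + (1.0 - keep) / n * np.ones((n, n))

def transition_absorb(n, sigma):
    # Closed form of exp(sigma * Q_absorb): keep the token with probability
    # exp(-sigma), otherwise replace it with the mask, which is terminal.
    keep = np.exp(-sigma)
    p = np.zeros((n + 1, n + 1))
    idx = np.arange(n)
    p[idx, idx] = keep
    p[n, :n] = 1.0 - keep
    p[n, n] = 1.0
    return p

if __name__ == "__main__":
    np.set_printoptions(precision=3, suppress=True)
    print(q_uniform(3))
    print(q_absorb(3))
    print(transition_absorb(3, sigma=0.5))
```

With Q^{uniform} a corrupted position still looks like an ordinary token, whereas with Q^{absorb} a corrupted position is always visibly a mask, which is the structural difference between the two setups.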