> it can adjust a whole block of tokens when it encounters some kind of disjunct...

freeqaz · 2025-05-01T00:00:57 1746057657

Here are two papers linked from Inception's site:

1. Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution - https://arxiv.org/abs/2310.16834

2. Simple and Effective Masked Diffusion Language Models - https://arxiv.org/abs/2406.07524

AlexCoventry · 2025-05-01T00:27:33 1746059253

Thanks, yes, I was thinking specifically of "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution". They actually consider two noise distributions: one that samples uniformly at each noised token position, and one with a terminal mask state (Q^{uniform} and Q^{absorb}, respectively). However, the terminal-masking system is clearly superior in their benchmarks.

https://arxiv.org/pdf/2310.16834#page=6
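
To make the difference between the two noise models concrete, here is a rough sketch of forward noising at a single noise level. It is an illustration under simplifying assumptions, not the paper's parametrization: the corruption probability `p`, the `MASK` id, and the function names are all made up for this example, and the continuous-time rate matrices are collapsed into one per-token coin flip.

```python
import random

MASK = -1  # hypothetical id for the absorbing mask state (not from the paper)


def noise_uniform(tokens, p, vocab_size, rng):
    """Uniform-style noise: with probability p, replace a token with one
    drawn uniformly from the vocabulary (it may redraw the same token)."""
    return [rng.randrange(vocab_size) if rng.random() < p else t for t in tokens]


def noise_absorb(tokens, p, rng):
    """Absorbing-style noise: with probability p, replace a token with MASK.
    Once masked, a position carries no information about its original value."""
    return [MASK if rng.random() < p else t for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [3, 1, 4, 1, 5, 9, 2, 6]
    print("uniform:", noise_uniform(tokens, 0.5, vocab_size=10, rng=rng))
    print("absorb: ", noise_absorb(tokens, 0.5, rng=rng))
```

One way to read the difference: under the absorbing model, a denoiser can always tell which positions were corrupted, because they are exactly the masked ones. Under the uniform model, any token might be noise.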

macleginn · 2025-04-30T23:39:10 1746056350

Characterizing the exact types of path dependencies that arise during inference in text-diffusion models looks like an interesting research project.

AlexCoventry · 2025-05-01T04:45:49 1746074749

Yes, the problem is coming up with a noise model where reverse diffusion is tractable.

XenophileJKO · 2025-04-30T23:39:35 1746056375

Thank you, I'll have to read the papers. I don't think I have read theirs.