
> it can adjust a whole block of tokens when it encounters some kind of disjunction.

This is true in principle for general diffusion models, but I don't think it's true for the noise model they use in Mercury (at least, going by a couple of academic papers authored by the Inception co-founders). Their model generates noise by masking a token, and once a token is masked, it stays masked. So the reverse diffusion process gets to decide the contents of a masked token exactly once, and after that it's fixed.
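
A minimal sketch of that absorbing corruption process (all names here are illustrative, not Mercury's actual code; MASK is a sentinel id standing in for the mask token):

    import random

    MASK = -1  # illustrative sentinel for the mask token

    def corrupt(tokens, mask_prob):
        # Absorbing forward noise: each still-visible token is masked
        # independently with probability mask_prob; an already-masked
        # token stays masked, so the masked set only grows as the
        # noise level increases.
        return [MASK if t == MASK or random.random() < mask_prob else t
                for t in tokens]

Running corrupt repeatedly only ever enlarges the masked set, which is why the reverse process gets exactly one shot at each position.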

Here are two papers linked from Inception's site:

1. Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution - https://arxiv.org/abs/2310.16834

2. Simple and Effective Masked Diffusion Language Models - https://arxiv.org/abs/2406.07524


Thanks, yes, I was thinking specifically of "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution". They actually consider two noise distributions: one that resamples each noised token position uniformly, and one with terminal masking (the Q^{uniform} and Q^{absorb} processes). However, the terminal-masking system is clearly superior in their benchmarks:

https://arxiv.org/pdf/2310.16834#page=6
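
For concreteness, here is roughly what the uniform kernel does at each step, in contrast to the absorbing one sketched above: a hit position is resampled from the whole vocabulary, so it remains an ordinary token that can be corrupted again later (an illustrative sketch using my names, not the paper's notation):

    import random

    VOCAB_SIZE = 1000  # illustrative vocabulary size

    def step_uniform(tokens, corrupt_prob):
        # Q^{uniform}-style step: a hit position is resampled
        # uniformly over the vocabulary; unlike the absorbing MASK
        # state, it can be hit again on any later step, so nothing
        # it holds is ever final.
        return [random.randrange(VOCAB_SIZE)
                if random.random() < corrupt_prob else t
                for t in tokens]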


Characterizing the exact path dependencies that arise during inference on text-diffusion models looks like an interesting research project.


Yes, the problem is coming up with a noise model where reverse diffusion is tractable.
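
For the absorbing model that tractability is fairly concrete: the reverse posterior factorizes across masked positions, so a single denoiser call yields a categorical distribution per masked slot. A minimal sketch, assuming a hypothetical denoiser that returns one probability vector per position (not the paper's actual interface):

    import random

    MASK = -1  # illustrative sentinel for the mask token

    def reverse_step(tokens, denoiser, unmask_prob):
        # One reverse step under the absorbing noise model: sample
        # contents for some masked positions from the denoiser's
        # per-position distributions. A filled position is final
        # and is never revisited on later steps.
        probs = denoiser(tokens)  # hypothetical: one vocab distribution per position
        out = list(tokens)
        for i, t in enumerate(out):
            if t == MASK and random.random() < unmask_prob:
                out[i] = random.choices(range(len(probs[i])),
                                        weights=probs[i], k=1)[0]
        return out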


Thank you, I'll have to read those papers; I don't think I've come across them before.
