
What are the barriers to mixed-architecture models, i.e. models that could seamlessly pass from autoregressive generation to diffusion and so on?

Humans can integrate multiple sensory processing centers and multiple modes of thought all at once. It's baked into our training process (life).




Human processing is still autoregressive, but it uses multiple parallel, synchronized streams. There is no problem with such an approach, and my best guess is that within the next year we will see many teams training models that use such tricks to generate reasoning traces in parallel.

The main concern is taking a single probabilistic stream (e.g. a book) and comparing autoregressive modelling of it with diffusion modelling of it.
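
To make that comparison concrete, here is a rough sketch of the two standard factorizations of the same sequence. The notation is my own assumption (x_{1:N} for the tokens of the stream, x^{(t)} for the whole sequence at noise level t, T noise levels), not something taken from the thread or the linked paper:

```latex
% Autoregressive: one left-to-right pass, one token per step
p_\theta(x_{1:N}) = \prod_{n=1}^{N} p_\theta\left(x_n \mid x_{<n}\right)

% Diffusion: start from noise x^{(T)} and denoise the whole sequence T times
p_\theta\left(x^{(0)}\right) = \int p\left(x^{(T)}\right)
    \prod_{t=1}^{T} p_\theta\left(x^{(t-1)} \mid x^{(t)}\right) \, dx^{(1:T)}
```

In the first case the product runs over token positions; in the second it runs over noise levels, and every step updates all positions at once. For discrete text the integral would be a sum over intermediate sequences.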

Regarding mixing diffusion and autoregressive models, I was at ICLR last week and this work is probably relevant: https://openreview.net/forum?id=tyEyYT267x


Maybe diffusion for "thoughts" and autoregressive for output :S



