
This is a very bad idea for image models. They pick up and amplify imperceptible distortions in images that no human reviewer would catch... to say nothing of the big ones, when the output is straight-up erroneous.

This may apply to text too.

Partially or fully synthetic data is OK when finetuning existing LLMs. I personally discovered it's not OK for finetuning ESRGAN. Not sure about diffusion models.
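
To make the amplification point concrete, here's a toy sketch (my own, purely illustrative: a Gaussian stands in for the data distribution, not any actual image model). Refitting to samples from the previous generation's fit compounds estimation errors that would be invisible in any single generation:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 1.0   # stand-in for the real data distribution
    n = 100                # finite training sample per generation

    for gen in range(1, 301):
        # each generation trains only on the previous generation's output
        samples = rng.normal(mu, sigma, n)
        # refitting these two summary stats plays the role of "training"
        mu, sigma = samples.mean(), samples.std()
        if gen % 50 == 0:
            print(f"gen {gen:3d}: mu={mu:+.3f}  sigma={sigma:.3f}")

    # sigma tends to shrink generation over generation and mu takes a random
    # walk away from 0; no single generation's error would look wrong on its own.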



> Not sure about diffusion models.

Diffusion models are still approximate density estimators, not explicit ones. They lose information because you don't have a unique mapping to the subsequent step. You have to think about the relationship between your image and its preimage.

So while they have better distribution coverage than GANs, they still aren't reliable for dataset synthesis. They are better than GANs for it, though (GANs are very mean-focused, which is why we got such high-quality images out of them, but also why we see huge diversity issues and amplification of biases).
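
A minimal numerical sketch of the non-unique-preimage point (my own illustration, assuming a standard DDPM-style forward step x_t = sqrt(a_bar)*x_0 + sqrt(1-a_bar)*eps; the numbers are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 1024        # pretend this is a flattened image
    a_bar = 0.5     # cumulative noise-schedule term at some step t

    x0_a = rng.normal(0, 1, d)   # "image" A
    x0_b = rng.normal(0, 1, d)   # a different "image" B

    eps_a = rng.normal(0, 1, d)
    x_t = np.sqrt(a_bar) * x0_a + np.sqrt(1 - a_bar) * eps_a  # noised version of A

    # the very same x_t is reachable from B with this (legitimate) noise draw:
    eps_b = (x_t - np.sqrt(a_bar) * x0_b) / np.sqrt(1 - a_bar)

    print(np.allclose(np.sqrt(a_bar) * x0_b + np.sqrt(1 - a_bar) * eps_b, x_t))  # True
    print(eps_b.std())  # ~1.7: a less typical draw than eps_a, but not impossible

    # many preimages are consistent with x_t, so the reverse model can only learn
    # an approximate density over them, not an exact inverse mapping.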


> Not sure about diffusion models.

Human-curated synthetic data is commonly used in finetuning (or LoRA training) for SD. I doubt that uncurated synthetic data would be very usable. There might be use cases where curating synthetic data with some kind of vision model would be valuable, but my intuition is that it would be largely hit-or-miss and hard to predict.
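
For what it's worth, a crude version of that vision-model curation could look like the sketch below (my own, assuming the Hugging Face transformers CLIP implementation; the model name and keep_fraction are arbitrary choices): score each synthetic image against its own prompt and keep only the best-aligned slice before finetuning on it.

    from PIL import Image
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_score(image: Image.Image, prompt: str) -> float:
        # higher logit = better image/text alignment according to CLIP
        inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        return out.logits_per_image.item()

    def curate(pairs, keep_fraction=0.5):
        # pairs: list of (PIL image, generation prompt); keep the best-scoring slice
        scored = sorted(pairs, key=lambda p: clip_score(*p), reverse=True)
        return scored[: max(1, int(len(scored) * keep_fraction))]

A filter like this catches obvious prompt/image mismatches, but not the subtle artifacts discussed upthread, which is roughly where the hit-or-miss caveat bites.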



