Hacker News

Fine-tuning is pretty cheap compared to the original training run - perhaps just 1% of the cost.

Totally within reach of a consortium of... "entertainment specialists".



I know a person who fine-tuned Stable Diffusion, and he said it took 2 weeks of 8x A100 80 GB training time, costing him somewhere between $500 and $700 (he got a pretty big discount, too; at today's prices for peer GPU rental it would be over $1,000).
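That figure roughly checks out. A quick sketch of the GPU-hour arithmetic, assuming a peer-rental rate of about $0.40 per A100-hour (the rate is my assumption, not from the comment):

```python
# Sanity-check the rental cost of 2 weeks on 8x A100s.
# The $0.40/GPU-hour rate is an assumed peer-rental price, not a quote.
hours = 14 * 24            # 2 weeks of wall-clock time
gpus = 8
rate_per_gpu_hour = 0.40   # assumed USD per A100-hour
gpu_hours = hours * gpus
cost = gpu_hours * rate_per_gpu_hour
print(gpu_hours, round(cost))  # 2688 GPU-hours, ~$1075
```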

Sure, it's peanuts compared to what it must have cost to train stable diffusion from scratch. However, I think most normal people would not consider spending $500 to fine-tune one of these.

Edit: Though I do agree that once this kind of filtering is in place during training, NSFW models will begin to pop up all over the place.


You can fine-tune Stable Diffusion for $10 using this service: https://www.strmr.com/

It works super well for putting yourself in the images; the likeness is fantastic.

It's obviously a small training process (they only take 20 images), but it works.


For spot-finetuning with Dreambooth (not as good as full-finetuning but can get a specific subject/style much faster), it can be done with about $0.08 of GPU compute, although optimizing it is harder.

https://huggingface.co/docs/diffusers/training/dreambooth
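For scale, $0.08 of compute buys only a few minutes of GPU time. A sketch of the arithmetic, assuming a spot rate of roughly $0.60/hour for a single A100 (the rate is my assumption, not from the comment):

```python
# How much GPU time $0.08 buys at an assumed spot rate.
budget = 0.08              # USD, figure from the comment above
rate_per_hour = 0.60       # assumed spot price for one A100, USD/hour
minutes = budget / rate_per_hour * 60
print(round(minutes))      # about 8 minutes of training time
```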


Are these services using textual inversion? If so, I have to wonder how well they would work on a Stable Diffusion model that was trained with the filter in place from the start, so that it couldn't generate anything close to the filtered content.

As it is right now, Stable Diffusion can generate adult imagery by itself; however, it seems like it's been fine-tuned after the fact to 'cover up' that fact as much as possible before the model was released publicly.


I believe the safety filter is trivial to disable, since it was added in one of the last commits prior to Stable Diffusion's public release and is not baked into the model; most forks simply remove the safety-checker code [1].

As for textual inversion, JoePenna's Dreambooth [2] implementation uses textual inversion.

[1] https://github.com/CompVis/stable-diffusion/commit/a6e2f3b12... [2] https://github.com/JoePenna/Dreambooth-Stable-Diffusion
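For reference, a minimal sketch of how forks typically do this with the diffusers pipeline. The checker is just an attribute on the pipeline object, so it can be swapped for a pass-through (the function name is mine; the commented-out pipeline lines assume diffusers is installed and weights are downloaded):

```python
# Hypothetical sketch: disabling the safety checker on a diffusers
# StableDiffusionPipeline by replacing it with a pass-through.

def passthrough_safety_checker(images, clip_input=None, **kwargs):
    # Return the images unmodified and flag nothing as NSFW.
    return images, [False] * len(images)

# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# pipe.safety_checker = passthrough_safety_checker
```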


As far as I understand, textual inversion != Dreambooth != actual fine-tuning: textual inversion only learns a new token embedding and leaves the model weights untouched, Dreambooth fine-tunes the U-Net on a handful of subject images, and full fine-tuning retrains the model on a large dataset.



