Hacker News

Fine-tuning is pretty cheap compared to the original training run - perhaps just 1% of the cost.

Totally within reach of a consortium of... "entertainment specialists".



I know a person who fine-tuned Stable Diffusion, and he said it took 2 weeks of 8x A100 80 GB training time, costing him somewhere between $500 and $700 (he got a pretty big discount, too; at today's prices for peer GPU rental it would be over $1,000).
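That figure roughly checks out. A quick sketch of the GPU-hour arithmetic, assuming a peer-rental rate of about $0.40 per A100-hour (the rate is my assumption, not from the comment):

```python
# Sanity-check the rental cost of 2 weeks on 8x A100s.
# The $0.40/GPU-hour rate is an assumed peer-rental price, not a quote.
hours = 14 * 24            # 2 weeks of wall-clock time
gpus = 8
rate_per_gpu_hour = 0.40   # assumed USD per A100-hour
gpu_hours = hours * gpus
cost = gpu_hours * rate_per_gpu_hour
print(gpu_hours, round(cost))  # 2688 GPU-hours, ~$1075
```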

Sure, it's peanuts compared to what it must have cost to train stable diffusion from scratch. However, I think most normal people would not consider spending $500 to fine-tune one of these.

Edit: Though I do agree that once this kind of filtering is in place during training, NSFW models will begin to pop up all over the place.


You can fine-tune Stable Diffusion for $10 using this service: https://www.strmr.com/

It works super well for putting yourself in the images; the likeness is fantastic.

It's obviously a small training process (they only take 20 images), but it works.


For spot-finetuning with Dreambooth (not as good as full-finetuning but can get a specific subject/style much faster), it can be done with about $0.08 of GPU compute, although optimizing it is harder.

https://huggingface.co/docs/diffusers/training/dreambooth
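For scale, $0.08 of compute buys only a few minutes of GPU time. A sketch of the arithmetic, assuming a spot rate of roughly $0.60/hour for a single A100 (the rate is my assumption, not from the comment):

```python
# How much GPU time $0.08 buys at an assumed spot rate.
budget = 0.08              # USD, figure from the comment above
rate_per_hour = 0.60       # assumed spot price for one A100, USD/hour
minutes = budget / rate_per_hour * 60
print(round(minutes))      # about 8 minutes of training time
```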


Are these services using textual inversion? If so, I have to wonder how well they would work on a Stable Diffusion model that was trained with the filter in place from the start, so that it couldn't generate anything close to the filtered content.

As it is right now, Stable Diffusion can generate adult imagery by itself; however, it seems like it's been fine-tuned after the fact to 'cover up' that fact as much as possible before the model was released publicly.


I believe the safety filter is trivial to disable, since it was added in one of the last commits prior to Stable Diffusion's public release and is not baked into the model; most forks simply remove the safety-checker code [1].

As for textual inversion, JoePenna's Dreambooth [2] implementation uses textual inversion.

[1] https://github.com/CompVis/stable-diffusion/commit/a6e2f3b12... [2] https://github.com/JoePenna/Dreambooth-Stable-Diffusion
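For reference, a minimal sketch of how forks typically do this with the diffusers pipeline. The checker is just an attribute on the pipeline object, so it can be swapped for a pass-through (the function name is mine; the commented-out pipeline lines assume diffusers is installed and weights are downloaded):

```python
# Hypothetical sketch: disabling the safety checker on a diffusers
# StableDiffusionPipeline by replacing it with a pass-through.

def passthrough_safety_checker(images, clip_input=None, **kwargs):
    # Return the images unmodified and flag nothing as NSFW.
    return images, [False] * len(images)

# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# pipe.safety_checker = passthrough_safety_checker
```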


As far as I understand, textual inversion != Dreambooth != actual fine-tuning: textual inversion only learns a new token embedding and leaves the model weights untouched, Dreambooth fine-tunes the U-Net on a handful of subject images, and full fine-tuning retrains the model on a large dataset.



