Your concern about catastrophic forgetting is mostly unfounded in the regime of fine-tuning large diffusion models. Fine-tuning may cost some accuracy on certain downstream tasks, but the damage is generally not “catastrophic”. I believe this robustness comes from the attention mechanism, though I’m happy to be corrected.
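
If you want to put a number on it rather than eyeball samples, comparing the base and fine-tuned checkpoints on the same held-out batches is cheap. A minimal sketch, assuming a PyTorch model and using MSE as a stand-in metric (substitute whatever your downstream task actually cares about):

    import torch

    @torch.no_grad()
    def heldout_loss(model, batches):
        # average loss over held-out (input, target) pairs; MSE is a placeholder metric
        model.eval()
        losses = [torch.nn.functional.mse_loss(model(x), y) for x, y in batches]
        return torch.stack(losses).mean().item()

    # drift = heldout_loss(finetuned, val_batches) - heldout_loss(base, val_batches)
    # a small positive drift is "some damage"; a large jump is the catastrophic case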


I see; it was probably my high learning rate that caused the problems. To be honest, I was too lazy to retry full fine-tuning since LoRA worked so well, but I may revisit it in the future, perhaps with Qwen Image.
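
For anyone curious, the LoRA route is only a handful of lines with Hugging Face peft. A sketch, where the checkpoint id, rank, and target module names are illustrative for an SD-1.5-style UNet, not a recommendation:

    import torch
    from diffusers import UNet2DConditionModel
    from peft import LoraConfig, get_peft_model

    # load just the UNet; the checkpoint id here is illustrative
    unet = UNet2DConditionModel.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
    )

    lora_config = LoraConfig(
        r=16,                 # adapter rank, illustrative
        lora_alpha=16,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
    )
    unet = get_peft_model(unet, lora_config)  # base weights stay frozen
    unet.print_trainable_parameters()

    # only the adapters train, so a higher LR than full fine-tuning is usually safe
    optimizer = torch.optim.AdamW(
        (p for p in unet.parameters() if p.requires_grad), lr=1e-4
    )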


Perhaps what you were dealing with was actually exploding gradients from fp16 training, which _are_ prone to corrupting a model and are sensitive to the learning rate.
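
The usual guard rails for that are loss scaling plus gradient clipping. A minimal sketch of one fp16 training step, assuming a recent PyTorch with torch.amp; the model, data, and max_norm are placeholders:

    import torch

    model = torch.nn.Linear(512, 512).cuda()  # stand-in for the real model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    scaler = torch.amp.GradScaler("cuda")     # scales the loss so fp16 grads don't underflow

    def train_step(x, y):
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast("cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)            # so clipping sees true-scale gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(optimizer)                # skips the update if grads went inf/nan
        scaler.update()
        return loss.item()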



