
I'm pretty sure anyone finetuning Llama on a regular basis now is using https://github.com/unslothai/unsloth, so comparisons should be against that. The open-source version is ~2x faster than default implementations. NVIDIA-only for now, although the kernels are written in Triton, so they might be portable.



I remember seeing them on HN when they first started! I never understood what price you pay. How did they get such a big speedup with less memory usage?


There are previous comments on this; apparently the founder did a lot of math, re-deriving things from scratch :)

https://news.ycombinator.com/item?id=39672070

https://unsloth.ai/blog/gemma-bugs


Nice work in the gemma-bugs post -- compared to plenty of research work that is a km deep in real math, this tech note is just a few Python tweaks. But finding those issues and fixing them? Apparently it's useful, and they did it. Easy-to-read (almost child-like) writeup. Thanks for pointing to this.


The main author used to work at Nvidia. There's a free plan, and you can pay to get multi-GPU support.


Indeed, a LoRA finetune of Llama 3.1 8B works on a single 24GB GPU and takes from a few hours to a few days depending on the dataset size.
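For context on why a single 24GB card suffices: a LoRA finetune freezes the base weights (often quantized to 4 bits, QLoRA-style, so ~8B params is roughly 4-5 GB) and only trains small low-rank adapter matrices on top. A minimal sketch of the idea in numpy, with illustrative sizes (the dimensions and rank here are assumptions for the example, not Llama's exact architecture):

```python
import numpy as np

# Hypothetical layer: frozen weight W (d_out x d_in) plus LoRA
# adapters B (d_out x r) and A (r x d_in). Sizes are illustrative;
# r=16 is a commonly used LoRA rank.
d_in, d_out, r = 4096, 4096, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d_in)).astype(np.float32) * 0.01
B = np.zeros((d_out, r), dtype=np.float32)  # B starts at zero, so the adapter is a no-op at init

x = rng.standard_normal(d_in).astype(np.float32)

# LoRA forward pass: y = W x + B (A x). Only A and B receive gradients,
# so optimizer state is tiny compared to full finetuning.
y = W @ x + B @ (A @ x)

full_params = d_out * d_in          # params a full finetune would train for this layer
lora_params = r * (d_in + d_out)    # params LoRA actually trains
print(f"trainable fraction: {lora_params / full_params:.4%}")  # ~0.78% for these sizes
```

The trainable-parameter fraction is what keeps optimizer state and gradients small enough that, combined with quantized frozen weights, everything fits in 24GB alongside activations.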




