
For those interested, I made some 1-bit dynamic quants of DeepSeek-R1-0528 at https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF

They're 74% smaller: 713GB down to 185GB.

Use the magic incantation -ot ".ffn_.*_exps.=CPU" (llama.cpp's --override-tensor flag) to offload the MoE expert layers to RAM, so the non-MoE layers fit in < 24GB of VRAM at 16K context! The rest sits in RAM & disk.
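
To see what that pattern actually catches, here is a rough Python sketch that applies the regex half of the override (the part before "=CPU") to a few tensor names. The names are illustrative, assumed from the usual GGUF naming scheme and not read from these files, and treating everything unmatched as "GPU" is a simplification, since real placement also depends on how many layers you offload.

```python
import re

# Regex half of the override string from the comment (the part before "=CPU").
PATTERN = re.compile(r".ffn_.*_exps.")

# Illustrative tensor names only (assumed, not read from an actual GGUF file).
TENSOR_NAMES = [
    "blk.3.ffn_gate_exps.weight",   # routed expert weights
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_shexp.weight",  # shared expert: "_shexp" does not contain "_exps"
    "blk.3.attn_q_a.weight",        # attention weights
    "token_embd.weight",
]


def placement(name: str) -> str:
    """Return "CPU" if the override pattern matches anywhere in the name, else "GPU"."""
    # Assumption: the pattern is applied as a search (substring match), not a full match.
    return "CPU" if PATTERN.search(name) else "GPU"


if __name__ == "__main__":
    for name in TENSOR_NAMES:
        print(f"{name:32s} -> {placement(name)}")
```

Only the routed expert tensors match, and those hold most of the weights in a MoE model, which is why pinning them to RAM frees up so much VRAM.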


