
I'm guessing it's a bit different, since MLX/MPS doesn't have native 4-bit support (or even 8-bit, if I remember correctly?). It didn't even launch with bf16 support. So I think the lowest you could go with the old type_k/v solution on Apple GPUs was 16-bit (f16/bf16), but I'm not a llama.cpp internals expert, so I may be wrong.

