I'm guessing it's a bit different since MLX/MPS doesn't have native 4-bit support (or even 8-bit, if I remember correctly?). It didn't even launch with bf16 support. So I think the lowest you could go with the old type_k/v solution on Apple GPUs was 16-bit (f16/bf16), but I'm not a llama.cpp internals expert, so I may be wrong.