
I did my own experiments, and surprisingly it looks like Q4_K_M models often outperform Q6 and Q8 quantized models.

For bigger models (in the 8B to 70B range), Q4_K_M is very good: there is no noticeable degradation compared to full FP16 models.
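To make the comparison concrete, here is a minimal sketch of block-wise round-to-nearest quantization at 4, 6, and 8 bits, measuring reconstruction error on random Gaussian weights. This is a simplified stand-in for llama.cpp's k-quant schemes (Q4_K_M uses per-block scales plus super-block structure, not plain round-to-nearest), and raw RMSE on synthetic weights will not reproduce the output-quality effect described above; it only illustrates the mechanism and the error scale involved.

```python
import math
import random

def quantize_block(block, bits):
    # Symmetric round-to-nearest quantization of one block:
    # pick a scale from the block's max magnitude, snap each
    # weight to the nearest point on the integer grid.
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / qmax
    return [round(x / scale) * scale for x in block]

def rmse_at_bits(weights, bits, block_size=32):
    # Quantize in independent blocks (as llama.cpp does) and
    # report the root-mean-square reconstruction error.
    err = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        for x, q in zip(block, quantize_block(block, bits)):
            err += (x - q) ** 2
    return math.sqrt(err / len(weights))

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]
for bits in (4, 6, 8):
    print(f"{bits}-bit RMSE: {rmse_at_bits(weights, bits):.5f}")
```

Each extra bit roughly halves the quantization step, so pure reconstruction error always favors Q8; any quality win for Q4_K_M would have to come from how the k-quant scale/min structure interacts with the actual weight distribution, not from raw precision.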





