therealsmith's comments

Yes. I didn't even mention how suspicious the rest of it is, because maybe I really was missing something and the author would point it out here.

There are numerous red flags. At best it is someone trying to game their GitHub profile with LLM-generated code; just look at the May 12 activity on that profile.


After looking more closely, this is definitely an AI-driven attempt to game the system.

It’s too bad the comments calling it out earlier were downvoted away. The first one was downvoted until it was flagged.

It’s amazing that this person was able to collect 250 GitHub stars and make bold claims about enhancing llama.cpp when it wasn’t anything new and it didn’t work anyway.


Am I missing something? As far as I can see, this patch does nothing except add new options that replicate the functionality of the existing --cache-type-k and --cache-type-v options.

Using `--flash-attn --cache-type-k q8_0 --cache-type-v q8_0` is a very well known optimization to save VRAM.

And it's also very well known that the keys are more sensitive to quantization than values. E.g. https://arxiv.org/abs/2502.15075
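For reference, the optimization mentioned above looks something like this in practice. This is a hedged sketch, not the patch under discussion: the flags are real llama.cpp server options, but the binary location and model path are placeholders.

```shell
# Well-known VRAM-saving setup for llama.cpp: quantize the KV cache to 8-bit.
# --flash-attn is required by llama.cpp before the V cache can be quantized.
# Keys are kept at q8_0 because they are more quantization-sensitive than
# values (values are sometimes dropped further, e.g. to q4_0).
# "./models/model.gguf" is a placeholder path, not from the thread.
./llama-server -m ./models/model.gguf \
    --flash-attn \
    --cache-type-k q8_0 \
    --cache-type-v q8_0
```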


> Using `--flash-attn --cache-type-k q8_0 --cache-type-v q8_0`

I think you meant ‘--cache-type-v q4_0’

I would also like an explanation of what's different in this patch compared to the standard command-line arguments.

