therealsmith's comments

Yes. I didn't even mention how suspicious the rest of it is, because maybe I really was missing something and the author would point it out here.

There are numerous red flags. At best it is someone trying to game their GitHub profile with LLM-generated code; just look at the May 12 activity on that profile.


After looking more closely, this is definitely an AI-driven attempt to game the system.

It’s too bad the comments calling it out earlier were downvoted away. The first one was downvoted until it was flagged.

It’s amazing that this person was able to collect 250 GitHub stars and make bold claims about enhancing llama.cpp when it wasn’t anything new and it didn’t work anyway.


Am I missing something? As far as I can see, this patch does nothing except add new options that replicate the functionality of the existing --cache-type-k and --cache-type-v options.

Using `--flash-attn --cache-type-k q8_0 --cache-type-v q8_0` is a very well known optimization to save VRAM.

And it's also very well known that the keys are more sensitive to quantization than values. E.g. https://arxiv.org/abs/2502.15075
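For reference, the optimization mentioned above looks something like this in practice. This is a hedged sketch, not the patch under discussion: the flags are real llama.cpp server options, but the binary location and model path are placeholders.

```shell
# Well-known VRAM-saving setup for llama.cpp: quantize the KV cache to 8-bit.
# --flash-attn is required by llama.cpp before the V cache can be quantized.
# Keys are kept at q8_0 because they are more quantization-sensitive than
# values (values are sometimes dropped further, e.g. to q4_0).
# "./models/model.gguf" is a placeholder path, not from the thread.
./llama-server -m ./models/model.gguf \
    --flash-attn \
    --cache-type-k q8_0 \
    --cache-type-v q8_0
```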


> Using `--flash-attn --cache-type-k q8_0 --cache-type-v q8_0`

I think you meant ‘--cache-type-v q4_0’

I would also like an explanation of what's different in this patch compared to the standard command-line arguments.

