Author here. I wanted to understand what vLLM and llama.cpp are actually doing under the hood, but the codebases are massive. So I wrote a stripped down version from scratch to see the core ideas without the production complexity.
A bit more context: by day I'm a systems engineer building AI networking infrastructure, so I kept ending up in conversations where I couldn't quite wrap my head around the latest inference magic trick.
Like when someone mentioned vLLM's paged attention: I knew about virtual memory paging, but had no idea the same idea had been applied to KV cache allocation on GPUs.
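To make that concrete, here's a minimal sketch of the idea as I understand it (the names and structure here are mine, not vLLM's actual API): the KV cache is carved into fixed-size blocks, and each sequence keeps a block table mapping its logical blocks to whatever physical blocks happened to be free, just like a page table maps virtual pages to physical frames.

    BLOCK_SIZE = 16  # tokens per KV block

    class BlockAllocator:
        def __init__(self, num_blocks):
            self.free = list(range(num_blocks))  # pool of free physical block ids

        def alloc(self):
            return self.free.pop()               # hand out any free physical block

    class Sequence:
        def __init__(self, allocator):
            self.allocator = allocator
            self.block_table = []                # logical block index -> physical block id
            self.num_tokens = 0

        def append_token(self):
            # only grab a new physical block when the current one is full, so KV memory
            # grows in small chunks instead of one big contiguous preallocation per sequence
            if self.num_tokens % BLOCK_SIZE == 0:
                self.block_table.append(self.allocator.alloc())
            self.num_tokens += 1
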
The blog walks through why your first token is always the slowest, why output tokens cost 5x more, and how stuff like speculative decoding and chunked prefill actually work, from the perspective of a systems engineer!
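As a taste of the speculative decoding part, it boils down to something like this toy greedy version (my own simplification, not the code from the post): a small draft model proposes a few tokens cheaply, the big model checks them, and you keep the longest prefix the big model agrees with.

    def speculative_step(draft_next, target_next, prompt, k=4):
        # draft_next / target_next: callables mapping a token list to the next token (greedy)
        ctx = list(prompt)
        drafted = []
        for _ in range(k):                       # cheap draft model proposes k tokens
            tok = draft_next(ctx)
            drafted.append(tok)
            ctx.append(tok)

        # verify with the big model; in a real engine all k checks happen in a single
        # batched forward pass, which is where the speedup over plain decoding comes from
        accepted = []
        ctx = list(prompt)
        for tok in drafted:
            target_tok = target_next(ctx)
            if target_tok != tok:                # first disagreement: keep the target's token, drop the rest
                accepted.append(target_tok)
                break
            accepted.append(tok)
            ctx.append(tok)
        return accepted
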
Code: https://github.com/Anirudh171202/WhiteLotus