This prompts an "old guy anecdote"; forgive me. When I was much younger, I got t...

worthless-trash · 2025-03-06T02:09:48

That is a great story. Please never hesitate to drop these in.

Do you have a blog?

musicale · 2025-03-06T05:49:47

> so you had to have lots of operand re-use to not be memory-bound

Looking at Nvidia's spec sheet, an H100 SXM can do 989 TF32 teraflops (tensor core, with sparsity; or 67 non-tensor-core FP32 teraflops?) and has 3.35 TB/s of memory (HBM) bandwidth, so... a similar problem?
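A rough roofline-style way to see the "similar problem": divide peak FLOP/s by memory bandwidth to get the arithmetic intensity (FLOPs per byte of HBM traffic) a kernel needs before it stops being memory-bound. This is a minimal sketch using the spec-sheet figures quoted above; the helper names are hypothetical, and the matmul intensity assumes ideal operand reuse (each matrix crosses the memory bus exactly once), which real kernels only approach through tiling and caching.

```python
# Back-of-envelope roofline check using the H100 SXM figures quoted above.
PEAK_TF32_TENSOR_FLOPS = 989e12  # TF32 tensor core, with sparsity (spec-sheet footnote)
PEAK_FP32_FLOPS = 67e12          # non-tensor-core FP32
HBM_BYTES_PER_S = 3.35e12        # HBM bandwidth


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte a kernel must perform to be compute-bound, not memory-bound."""
    return peak_flops / bandwidth


def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """Arithmetic intensity of C = A @ B with A (m x k) and B (k x n).

    Assumes ideal reuse: A, B and C each move through memory exactly once.
    """
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"{ridge_point(PEAK_TF32_TENSOR_FLOPS, HBM_BYTES_PER_S):.0f}")  # → 295
    print(f"{ridge_point(PEAK_FP32_FLOPS, HBM_BYTES_PER_S):.0f}")         # → 20
    # Matrix-vector product (n = 1): no reuse of the big matrix, ~0.5 FLOP/byte.
    print(f"{matmul_intensity(4096, 1, 4096):.2f}")                       # → 0.50
    # Square matmul: each element is reused ~n times, intensity grows with size.
    print(f"{matmul_intensity(4096, 4096, 4096):.0f}")                    # → 683
```

The gap between roughly 0.5 FLOP/byte for a matrix-vector product and the ~295 FLOP/byte ridge is the same operand-reuse requirement as in the anecdote, just at a larger scale.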

pklausler · 2025-03-06T06:29:40 1741242580

There is caching today.

ryao · 2025-03-06T14:52:09

The cache hit rate is effectively 0 for LLMs since the datasets are so huge.
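Why "effectively 0" rather than merely "low": if every token reads the whole working set in the same order, and that working set is even slightly larger than the cache, an LRU cache evicts each line just before it is needed again. This is a toy sketch under stated assumptions: a pure LRU policy and strictly sequential, repeated passes over the data (a hypothetical stand-in for streaming all weights once per generated token). Real GPU caches are not pure LRU, and the function names here are illustrative only.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Simulate an LRU cache holding `capacity` lines; return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)


def streaming_passes(working_set, passes):
    """Touch lines 0..working_set-1 in order, `passes` times over."""
    return [addr for _ in range(passes) for addr in range(working_set)]


if __name__ == "__main__":
    cap = 1000
    # Working set fits: only the first (cold) pass misses.
    print(lru_hit_rate(streaming_passes(800, 10), cap))   # → 0.9
    # Working set 10% larger than the cache: every access misses.
    print(lru_hit_rate(streaming_passes(1100, 10), cap))  # → 0.0
```

The cliff is the point: caching helps a lot right up until the working set outgrows the cache, after which a sequential sweep gets no benefit at all.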