My understanding is that a larger cache can make SMT more effective, but as usual, only in certain cases.
Let’s imagine we have 8 cores with SMT, and we’re running a task that (in theory) scales roughly linearly up to 16 threads. If each thread’s working set is around half the cache available to that thread, but each working set is only used briefly, then SMT is going to be hugely beneficial: while one hyperthread is writing back results and fetching new data, the other’s share of the cache is already filled with a fresh working set, so it can begin computing immediately. Increasing cache increases the allowable working set size without causing cache contention between hyperthreads.
Alternatively, if the per-thread working set is sufficiently large (probably more than 2/3 of the available cache), SMT becomes substantially less useful. When the first hyperthread finishes its work, the second still has to wait for some (or all) of its working set to be fetched from main memory (or from higher cache levels, if lucky). That may take just as long as simply keeping hyperthread #1 fed with new working sets. Increasing cache in this scenario improves SMT performance almost linearly, until each hyperthread’s working set can be prefetched into the cache levels closest to the core while the other hyperthread is busy working.
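A toy sketch of the two regimes above (mine, not anything measured): each thread loops over its own private working set, and WS_BYTES is a knob you sweep against your per-thread share of cache. WS_BYTES and PASSES are made-up numbers to tune for your machine.

    /* Working-set sweep sketch. Build: cc -O2 -fopenmp ws_sweep.c */
    #include <omp.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WS_BYTES (512 * 1024)   /* per-thread working set: tune me */
    #define PASSES   2000           /* passes over each working set    */

    int main(void) {
        int nthreads = omp_get_max_threads();  /* set via OMP_NUM_THREADS */
        size_t n = WS_BYTES / sizeof(uint64_t);
        uint64_t sum = 0;
        double t0 = omp_get_wtime();
        #pragma omp parallel reduction(+:sum)
        {
            /* Private working set per thread, as in the scenarios above. */
            uint64_t *ws = malloc(WS_BYTES);
            for (size_t i = 0; i < n; i++) ws[i] = i;
            for (int p = 0; p < PASSES; p++)
                for (size_t i = 0; i < n; i++)
                    sum += ws[i] * 3 + p;      /* cheap work per element */
            free(ws);
        }
        printf("threads=%d ws=%dKiB time=%.3fs (sum=%llu)\n", nthreads,
               WS_BYTES / 1024, omp_get_wtime() - t0, (unsigned long long)sum);
        return 0;
    }

Run it as OMP_NUM_THREADS=8 ./a.out versus OMP_NUM_THREADS=16 ./a.out while sweeping WS_BYTES from well under to well over half the per-thread cache share; the SMT speedup should shrink as the two siblings start evicting each other’s lines.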
Also consider the situation where the working set is much, much smaller than the available cache, but lots of computation must be done on it. In this case a single hyperthread can be fed with new data continuously, since the old set can be written back to main memory and the next set loaded into cache long before the current set is finished. SMT provides no benefit here no matter how large you grow the cache (unless the two threads use wildly different execution units of the core and their instructions can be overlapped - but that’s tricky to get right, and you may hit thermal or power throttling before you actually gain enough performance to make it worthwhile).
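A sketch of that last case, assuming -O2 keeps the accumulators in registers: the data is only ~8 KiB, but several independent accumulator chains keep the FP units saturated, so a second hyperthread finds little idle hardware to borrow.

    /* Compute-bound toy: tiny working set, lots of math per element.
     * Independent accumulators give high ILP, so the core's FP units
     * stay busy and SMT has little to add. Build: cc -O2 ilp.c */
    #include <stdio.h>

    #define N    1024       /* 8 KiB of doubles: far smaller than any cache */
    #define REPS 1000000

    int main(void) {
        static double ws[N];
        for (int i = 0; i < N; i++) ws[i] = 1.0 + i * 1e-9;
        double a0 = 0, a1 = 0, a2 = 0, a3 = 0;  /* independent chains */
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < N; i += 4) {
                a0 += ws[i + 0] * 1.000001;
                a1 += ws[i + 1] * 1.000002;
                a2 += ws[i + 2] * 1.000003;
                a3 += ws[i + 3] * 1.000004;
            }
        printf("%f\n", a0 + a1 + a2 + a3);      /* keep the math observable */
        return 0;
    }

Time two copies pinned to the two hyperthreads of one core versus two separate cores (on Linux the sibling numbering is in /sys/devices/system/cpu/cpu0/topology/thread_siblings_list); if the loop really saturates the FP ports, the SMT pair should show little combined gain.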
Of course the real world is way more complicated than that. Many tasks do not scale linearly with more threads. Sometimes going from 6 “real” cores to 12 SMT threads on those same cores yields no gain at all, while 8 “real” cores are a third faster. And sometimes SMT will give you a non-linear speedup, but a few more (non-SMT) cores will give you a better (though still non-linear) speedup. So short answer: yes, sometimes more cache makes SMT more viable - if your task can be parallelized 2x further, has working sets around the size of the cache, and works on each set for a notable chunk of the time it takes to store the old set and fetch the next one.
And of course all of this requires the processor and/or compiler to be smart enough to ensure the cache is properly fed new data from main memory. This is frequently the case these days, but not always.
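One small tool for the cases where the hardware prefetcher can’t guess the next access (indirect indexing, pointer-heavy structures) is an explicit hint. A sketch using GCC/Clang’s __builtin_prefetch; the lookahead of 16 elements is a made-up distance you’d have to tune:

    /* Gather through an index array: the hardware prefetcher can't
     * predict src[idx[i]], so hint the load a few iterations ahead. */
    #include <stddef.h>

    void gather(double *dst, const double *src, const int *idx, size_t n) {
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)   /* prefetch for read, low temporal locality */
                __builtin_prefetch(&src[idx[i + 16]], 0, 1);
            dst[i] = src[idx[i]];
        }
    }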
Let's say your workload consists solely of traversing a single linked list, and the list fits entirely in L1.
Since an L1 load takes about 4 cycles and you can't start the next load until the previous one has completed, the CPU will stall, doing nothing, for 3/4 of its cycles. A 4-way SMT core could in principle make use of all those wasted cycles.
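A quick single-threaded sketch of that effect (the 4-cycle figure and sizes are assumptions that vary by microarchitecture): chase one dependent chain versus several independent ones through a permutation that fits in L1. Interleaving independent chains overlaps the load latencies, which is roughly what 4-way SMT would do across threads.

    /* Pointer-chase demo: next_idx[] is a single random cycle in a
     * 16 KiB array, so every access hits L1 but each load depends on
     * the previous one. More lanes = more independent chains in flight.
     * Build: cc -O2 chase.c */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     4096          /* 16 KiB of uint32_t: fits a typical L1d */
    #define STEPS 400000000L    /* total chase steps per measurement */

    static uint32_t next_idx[N];

    static uint32_t chase(int lanes) {
        uint32_t p[4] = {0, N / 4, N / 2, 3 * N / 4};  /* chain heads */
        for (long k = 0; k < STEPS / lanes; k++)
            for (int l = 0; l < lanes; l++)
                p[l] = next_idx[p[l]];   /* serialized within each lane */
        return p[0] ^ p[1] ^ p[2] ^ p[3];
    }

    int main(void) {
        for (uint32_t i = 0; i < N; i++) next_idx[i] = i;
        for (uint32_t i = N - 1; i > 0; i--) {  /* Sattolo's shuffle: one cycle */
            uint32_t j = rand() % i, t = next_idx[i];
            next_idx[i] = next_idx[j];
            next_idx[j] = t;
        }
        for (int lanes = 1; lanes <= 4; lanes++) {
            clock_t t0 = clock();
            uint32_t x = chase(lanes);
            printf("lanes=%d time=%.2fs (x=%u)\n", lanes,
                   (double)(clock() - t0) / CLOCKS_PER_SEC, x);
        }
        return 0;
    }

With one lane, the time per step is roughly the load-use latency; with four lanes it should approach a quarter of that, which is exactly the headroom a 4-way SMT core would be harvesting.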
Of course no real workload comes even close to purely traversing a linked list, but a lot of non-HPC real-world workloads do spend significant time in latency-limited sections that can benefit from SMT, so it's not just about cache misses.
Agreed 100%. SMT is waaaay more complex than just cache. I was just trying to illustrate, with simple scenarios, where increasing cache would and would not benefit SMT.