Yes, there's a lot of cache. But rather than trying to have a bunch of cores all read one big shared cache (the 96MB of shared L3 on AMD's consumer parts), here there are a lot of separate 36MB L2 caches.
(And yes, then again, there are some fancy protocols to create a virtual L3 cache out of those L2 caches. But it's less a cache hierarchy and more like networking. It still seems beautifully simpler in many ways to me!)
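To make the idea concrete, here's a minimal toy sketch of a "virtual L3" built from private L2s, assuming a simple LRU L2 per core and a naive "spill evicted lines into the emptiest neighbour" policy. The names (L2Cache, VirtualL3) and the spill policy are made up for illustration; the real protocol (directory state, latency tiers, and so on) is far more involved:

```python
from collections import OrderedDict

class L2Cache:
    """Toy private L2: a tiny LRU set of cache-line addresses."""
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()          # addr -> True, ordered by recency

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # refresh LRU position
            return True
        return False

    def insert(self, addr):
        """Insert a line; return the address it evicted, if any."""
        evicted = None
        if addr not in self.lines and len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)   # evict LRU
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        return evicted

class VirtualL3:
    """Toy 'virtual L3': when a line falls out of one core's L2, try to
    park it in a neighbour's L2 instead of letting it leave the chip."""
    def __init__(self, n_cores, l2_lines):
        self.l2s = [L2Cache(l2_lines) for _ in range(n_cores)]

    def access(self, core, addr):
        if self.l2s[core].lookup(addr):                   # local L2 hit
            return "local L2 hit"
        for i, l2 in enumerate(self.l2s):                 # remote L2s act as the virtual L3
            if i != core and l2.lookup(addr):
                return f"virtual-L3 hit in core {i}'s L2"
        victim = self.l2s[core].insert(addr)              # miss: fill from memory
        if victim is not None:
            # Lateral cast-out: spill the victim into the emptiest other L2.
            target = min((l2 for i, l2 in enumerate(self.l2s) if i != core),
                         key=lambda l2: len(l2.lines))
            if len(target.lines) < target.capacity:
                target.insert(victim)
        return "miss (filled from memory)"

caches = VirtualL3(n_cores=4, l2_lines=4)
print(caches.access(0, 0x100))   # miss (filled from memory)
print(caches.access(1, 0x100))   # virtual-L3 hit in core 0's L2
```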
L3 caches are already basically distributed and networked over a ring bus or other NoC on many x86 chips. For example, Sapphire Rapids has 1.875MB of L3 per core, pooled into a single coherent L3 (56 cores × 1.875MB = 105MB). Fun fact: that per-core slice is smaller than each core's L2 (2MB).
From
https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rap...
“the chip appears to be set up to expose all four chiplets as a monolithic entity, with a single large L3 instance. Interconnect optimization gets harder when you have to connect more nodes, and SPR is a showcase of this. Intel’s mesh has to connect 56 cores with 56 L3 slices.”
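For contrast, here's a rough sketch of how such a sliced, distributed L3 behaves from software's point of view: every core applies the same hash from physical address to slice, so any given line lives in exactly one slice and the per-core slices pool into one coherent L3. The XOR-fold hash below is a made-up stand-in; the real slice hash on these chips is undocumented:

```python
def l3_slice_for_address(phys_addr: int, n_slices: int, line_bytes: int = 64) -> int:
    """Toy stand-in for the slice hash on a distributed L3.

    Real chips use an undocumented hash over physical address bits; here we
    just XOR-fold the cache-line address so consecutive lines spread across
    slices."""
    line_addr = phys_addr // line_bytes    # drop the offset within the 64B line
    h = 0
    while line_addr:
        h ^= line_addr & 0xFF              # fold 8 bits at a time
        line_addr >>= 8
    return h % n_slices

# Every core uses the same hash, so a given line maps to exactly one of the
# 56 slices and the distributed slices behave like one shared L3.
for addr in (0x0000, 0x0040, 0x0080, 0x1000):
    print(hex(addr), "-> slice", l3_slice_for_address(addr, n_slices=56))
```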