
Seems very complex therefore very expensive (and possibly slow where it matters, at L2). Or it might just work.



On the contrary!

Yes, there's a lot of cache. But rather than having a bunch of cores all reading one shared cache (AMD's consumer parts share a 96MB L3), there are now many separate 36MB L2 caches.

(And yes, then again, there are some fancy protocols to create a virtual L3 cache from these L2 caches. But it's less cache hierarchy & more like networking. It still seems beautifully simpler in many ways to me!)
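The "virtual L3 from L2 caches" idea can be sketched as a toy model: when a line is evicted from one core's private L2, it gets re-homed in a peer core's L2 instead of falling all the way back to DRAM. Everything here (class names, the LRU policy, the peer-selection rule) is illustrative, not the real hardware protocol:

```python
# Toy model of a "virtual L3" built from private L2 caches: an evicted
# L2 line is re-homed in a peer core's L2 rather than dropped to DRAM.
# All names and policies here are illustrative assumptions.
from collections import OrderedDict

class Core:
    def __init__(self, cid, l2_lines):
        self.cid = cid
        self.l2 = OrderedDict()   # address -> data, oldest entry first (LRU)
        self.capacity = l2_lines

    def has_room(self):
        return len(self.l2) < self.capacity

def install(cores, core, addr, data):
    """Install a line in `core`'s L2, spilling the LRU victim to a peer."""
    if not core.has_room():
        victim_addr, victim_data = core.l2.popitem(last=False)  # evict LRU
        peer = next((c for c in cores if c is not core and c.has_room()), None)
        if peer is not None:            # re-home the line in the "virtual L3"
            peer.l2[victim_addr] = victim_data
    core.l2[addr] = data

def lookup(cores, core, addr):
    """Check local L2 first, then peers' L2s (the virtual L3), else miss."""
    if addr in core.l2:
        return "L2 hit"
    for peer in cores:
        if peer is not core and addr in peer.l2:
            return f"virtual-L3 hit in core {peer.cid}"
    return "miss"
```

Filling one core past its L2 capacity then looking the spilled line up again shows the point: the second core's otherwise-idle L2 capacity serves as extra shared cache.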


L3 caches are already basically distributed and networked through a ring bus or other NoC on many x86 chips. For example, Sapphire Rapids has 1.875MB of L3 per core, which is pooled into a single coherent L3. Fun fact: this is smaller than each core's L2 (2MB).

From https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rap... “the chip appears to be set up to expose all four chiplets as a monolithic entity, with a single large L3 instance. Interconnect optimization gets harder when you have to connect more nodes, and SPR is a showcase of this. Intel’s mesh has to connect 56 cores with 56 L3 slices.”
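The way slices pool into "a single large L3 instance" can be sketched as address hashing: each cache-line address maps to exactly one slice, so every core looks in the same place for a given line and the slices behave as one big cache. The slice count matches the quote's 56, but the modulo hash is a stand-in; real chips use undocumented hash functions over many address bits:

```python
# Toy sketch of a sliced L3: the L3 is split into per-core slices and
# each cache-line address is hashed to exactly one "home" slice. The
# simple modulo hash below is an illustrative assumption.
CACHE_LINE = 64    # bytes per cache line
NUM_SLICES = 56    # one L3 slice per core, as on Sapphire Rapids

def home_slice(addr: int) -> int:
    """Map a physical address to the L3 slice that owns its line."""
    line = addr // CACHE_LINE        # all bytes in a line share a slice
    return line % NUM_SLICES         # real hashes mix far more address bits

# Because every core computes the same slice for a given address, a
# lookup goes to exactly one node on the mesh: no broadcast needed.
```

The design trade-off the quote points at: the hash gives a single coherent L3, but every slice lookup is now a trip across a 56-node mesh, which is where the interconnect pain comes from.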



