Yes, there's a lot of cache. But rather than trying to have a bunch of cores all read one big shared cache (the 96MB of shared L3 on AMD's consumer parts), here there are a lot of separate 36MB L2 caches.
(And yes, then again, there are some fancy protocols to create a virtual L3 cache out of those L2 caches. But it's less a cache hierarchy and more like networking. It still seems beautifully simpler in many ways to me!)
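To make the idea concrete, here's a minimal toy sketch of a "virtual L3" built from private L2s, assuming a simple LRU L2 per core and a naive "spill evicted lines into the emptiest neighbour" policy. The names (L2Cache, VirtualL3) and the spill policy are made up for illustration; the real protocol (directory state, latency tiers, and so on) is far more involved:

```python
from collections import OrderedDict

class L2Cache:
    """Toy private L2: a tiny LRU set of cache-line addresses."""
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()          # addr -> True, ordered by recency

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # refresh LRU position
            return True
        return False

    def insert(self, addr):
        """Insert a line; return the address it evicted, if any."""
        evicted = None
        if addr not in self.lines and len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)   # evict LRU
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        return evicted

class VirtualL3:
    """Toy 'virtual L3': when a line falls out of one core's L2, try to
    park it in a neighbour's L2 instead of letting it leave the chip."""
    def __init__(self, n_cores, l2_lines):
        self.l2s = [L2Cache(l2_lines) for _ in range(n_cores)]

    def access(self, core, addr):
        if self.l2s[core].lookup(addr):                   # local L2 hit
            return "local L2 hit"
        for i, l2 in enumerate(self.l2s):                 # remote L2s act as the virtual L3
            if i != core and l2.lookup(addr):
                return f"virtual-L3 hit in core {i}'s L2"
        victim = self.l2s[core].insert(addr)              # miss: fill from memory
        if victim is not None:
            # Lateral cast-out: spill the victim into the emptiest other L2.
            target = min((l2 for i, l2 in enumerate(self.l2s) if i != core),
                         key=lambda l2: len(l2.lines))
            if len(target.lines) < target.capacity:
                target.insert(victim)
        return "miss (filled from memory)"

caches = VirtualL3(n_cores=4, l2_lines=4)
print(caches.access(0, 0x100))   # miss (filled from memory)
print(caches.access(1, 0x100))   # virtual-L3 hit in core 0's L2
```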
L3 caches are already basically distributed and networked over a ring bus or other NoC on many x86 chips. For example, Sapphire Rapids has 1.875MB of L3 per core, pooled into a single coherent L3 (56 cores × 1.875MB = 105MB). Fun fact: that per-core slice is smaller than each core's L2 (2MB).
From
https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rap...
“the chip appears to be set up to expose all four chiplets as a monolithic entity, with a single large L3 instance. Interconnect optimization gets harder when you have to connect more nodes, and SPR is a showcase of this. Intel’s mesh has to connect 56 cores with 56 L3 slices.”
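For contrast, here's a rough sketch of how such a sliced, distributed L3 behaves from software's point of view: every core applies the same hash from physical address to slice, so any given line lives in exactly one slice and the per-core slices pool into one coherent L3. The XOR-fold hash below is a made-up stand-in; the real slice hash on these chips is undocumented:

```python
def l3_slice_for_address(phys_addr: int, n_slices: int, line_bytes: int = 64) -> int:
    """Toy stand-in for the slice hash on a distributed L3.

    Real chips use an undocumented hash over physical address bits; here we
    just XOR-fold the cache-line address so consecutive lines spread across
    slices."""
    line_addr = phys_addr // line_bytes    # drop the offset within the 64B line
    h = 0
    while line_addr:
        h ^= line_addr & 0xFF              # fold 8 bits at a time
        line_addr >>= 8
    return h % n_slices

# Every core uses the same hash, so a given line maps to exactly one of the
# 56 slices and the distributed slices behave like one shared L3.
for addr in (0x0000, 0x0040, 0x0080, 0x1000):
    print(hex(addr), "-> slice", l3_slice_for_address(addr, n_slices=56))
```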