A normal SRAM cell takes 6 transistors. 6 * 8 * 1024 * 1024 * <total_mb> is a big number.
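Spelled out as a quick sketch (6T bitcell, 8 bits per byte; `total_mb` is a stand-in for whatever total cache size you plug in):

```python
# Back-of-the-envelope transistor count for a 6T SRAM cache:
# 6 transistors per bit * 8 bits per byte * 1024 * 1024 bytes per MB.
def sram_transistors(total_mb):
    return 6 * 8 * 1024 * 1024 * total_mb

print(sram_transistors(1))  # 50331648 -- roughly 50 million transistors per MB
```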
Next, SRAM doesn't scale like normal logic transistors. TSMC N7 SRAM cells are 0.027 µm² while N5 cells are 0.021 µm² (about a 1.3x shrink). Meanwhile, logic transistors got a 1.85x density improvement.

I-cache behavior also differs across architectures. x86 uses 15-20% less instruction memory for the same program (on average). This means that for the same cache size, x86 can store more code and achieve a higher hit rate.

The next issue is latency. Larger caches mean longer latencies. AMD and Intel have both shipped 64 KB L1 caches and then moved back to 32 KB because of latency. The fact that x86 chips get such good performance with a fraction of the L1 cache points to some kind of inefficiency in Apple's design. I'd guess AMD/Intel have much better prefetcher designs.
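Taking those cell areas at face value, the gap between SRAM and logic scaling is easy to check (note the raw area ratio works out to roughly 1.3x):

```python
# SRAM bitcell area shrinks far less than logic density improves
# between nodes (areas in um^2, from the figures quoted above).
n7_sram_cell = 0.027
n5_sram_cell = 0.021
sram_shrink = n7_sram_cell / n5_sram_cell  # ~1.29x
logic_shrink = 1.85                        # quoted N7 -> N5 logic density gain

print(f"SRAM shrink: {sram_shrink:.2f}x vs logic shrink: {logic_shrink:.2f}x")
```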
> 6 * 8 * 1024 * 1024 * <total_mb> is a big number.
No, when your chip has many billions of transistors, that’s not a big number. For 1 MB that’s about 0.2%, a tiny number even when multiplied by 1.85.

Next, the argument is that x64 chips are better because they have less cache, while before, the Apple chips couldn’t compete because Intel had more. That doesn’t make sense. And how are you drawing conclusions about the design and performance of a chip that isn’t even on the market yet?
With 16MB of cache, that's nearly 1 billion transistors out of 16 billion -- and that's without including all the cache control circuitry.
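That figure checks out as rough arithmetic (6T cells, data array only, ignoring tags and control logic):

```python
# 16 MB of 6T SRAM as a share of a 16-billion-transistor chip
# (data array only; tags, ECC, and control logic come on top).
cache_mb = 16
cache_transistors = 6 * 8 * 1024 * 1024 * cache_mb
chip_transistors = 16_000_000_000
share = cache_transistors / chip_transistors

print(f"{cache_transistors:,} transistors, {share:.1%} of the chip")
```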
Maybe I'm misunderstanding, but the 1.85x number does not apply to SRAM.
I've said for a long time that the x86 ISA has a bigger impact on chip design and performance (esp per watt) than Intel or AMD would like to admit. You'll not find an over-the-top fan here.
My point is that x86 can do more with less cache than aarch64. If you're interested: RISC-V with the compressed (C) extension enabled (100% of production implementations, to my knowledge) is around 15% more dense than x86 and around 30-35% more dense than aarch64.
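One way to see what a 15-20% code-size saving buys: the same I-cache holds proportionally more of the program. A hypothetical sketch using the averages claimed above (not measurements):

```python
# If x86 code for a program is 15-20% smaller than the aarch64 version,
# a fixed-size x86 I-cache holds the equivalent of a larger aarch64 cache.
def aarch64_equivalent_kb(x86_cache_kb, x86_size_saving):
    # The x86 image is (1 - saving) the size, so effective
    # capacity scales by 1 / (1 - saving).
    return x86_cache_kb / (1 - x86_size_saving)

for saving in (0.15, 0.20):
    print(f"32 KB x86 I-cache ~ "
          f"{aarch64_equivalent_kb(32, saving):.1f} KB of aarch64 code")
```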
This cache usage matters because of all the downsides of needing a larger cache, and because SRAM density is scaling at a fraction of the rate of logic transistors.
Anandtech puts the A14 between Intel and AMD in integer performance and below both in floating-point performance. The fact that Intel and AMD fare so well while Apple has over 6x the cache means they're doing something very fancy and efficient to make up the difference (though I'd still hypothesize that if Apple applied similar optimizations, it would still wind up more efficient due to using a better ISA).
I’m sure you are well versed in this matter, certainly better than I am. But you don’t do a great job of explaining it; the story is all over the place.