
I think you misunderstood the comment. Intel CPUs already perform well, and it is Apple that is using the same strategy to reach Intel-level performance (or even go slightly above it). What you missed is that other SoC vendors like Qualcomm don't follow this strategy and therefore end up with cheaper but lower-performance SoCs. Since Intel doesn't manufacture ARM chips, you are now forced to go with Apple if you want good performance from an ARM chip.



The comment was very literal in comparing the cache on the Apple chip to the Intel chip.

And just to be clear, cache is "expensive" in die size. They aren't placing an order with Samsung for L2 cache or something.

That Apple chip has a die less than half the size of the Intel chips that it outperforms. So the whole "expensive" claim is debunked before it even gets started.

Further, we are very explicitly comparing Apple silicon to Intel because that is exactly the transition that's happening here.


The Apple chip doesn't have all big cores, and the A12 has only 6 cores total; Intel and AMD (a CCX) have 8 big cores with SMT. Apple's multithreaded performance is accordingly slower. Once you account for these two big omissions, you'll likely find Apple takes as much or more die area than AMD's Zen 2 CCX for similar MT performance.

It'll probably be more die area for equivalent performance, which for Apple might not be an issue given its margins. Of all the ARM designs we've seen, cache is by far the most distinctive factor in Apple's design, so comparing die size with equivalent cores and features makes complete sense.

Like others have mentioned, maybe Apple will just focus on implementing new instructions, but at that point, they will likely diverge enough from the ARM ecosystem that developers and users should be somewhat worried.


Amazing how quickly all of the goalposts are moving so people can desperately try to diminish whatever Apple does. Now it's die size? Or, odder still, die percentage.

Firstly, the A12Z is 8 full cores. The "small" cores aren't limited to a subset of instructions or something; they're a more efficient, lower-headroom design. That is a 120mm2 die, versus 197mm2 for the Ryzen 7 3700X (8 cores).

Oh, but wait: the 3700X has no integrated graphics, no video encoder/decoder, no 5 TFLOPS neural engine, no secure enclave... It's absolutely huge comparatively, and has a tiny fraction of the features.

This whole die size nonsense really isn't panning out, is it?

The 3700X is of course a faster chip (not single-threaded, but when all cores are engaged), but that's with active cooling and a 65W+ TDP, versus about 6W for the A12Z. Oh, and it's even a year newer than the Apple chip, which is only here in a development kit anyway.

Maybe we can rank chips based upon how many "Zen" codenames exist in the product. There the A12Z clearly falters!


The 3700X compute (CCX) die has 8 cores plus 36MB of L2+L3 cache and is just 74mm2; the IO die, with PCIe 4.0, DDR4, and other IO, is 125mm2 on 12nm. That's a total of 199mm2.

If you want to compare CPU, graphics, video, and NN, then the AMD 4800U die is 156mm2. That chip has 4MB more L2+L3 cache (12MB total), a much better GPU (whose FP16 is good for 4 TFLOPS of NN work), and full AVX2+SMT cores, all beyond the A12Z. The little A12 cores might be full-ISA, but they're 1/3 the die area and lower performance. NEON is half the width of AVX2, and the GPU difference alone would likely push the A12Z past 156mm2. And there are 15W/45W versions of this chip going as low as 10W. The A12Z is likely around 10W+ too in the iPad Pro and the devkit, but I can't find sources on this.

Looking a lot more competitive now, isn't it?

The Qualcomm 855 is 73mm2 and the A12 is 83mm2, and the performance gains there are impressive. Beyond that, it's the A12Z at 120mm2 vs the AMD APU at 156mm2, and it starts to look like a much closer fight -- and by no means a perf/watt or perf/$ advantage for Apple until we see real systems.

Die size is _the_ trade-off Apple is making with their ARM/RISC plus loads-of-L2-cache design. It's a trade-off every chip makes, but it's especially important here with large cache sizes. I don't doubt that in a couple of generations Apple can compete with an AMD 4800U CPU+GPU on real-world multithreaded tasks at 10W (assuming 15% increases/gen), but the 4800U is already a few months old now. Apple fanboys never learn. Sigh. Also, Apple fanboys are the new Intel fanboys when stressing single thread performance.

Sources: http://www.hw-museum.cz/cpu/414/amd-ryzen-7-3700x https://www.techpowerup.com/264801/amd-renoir-die-shot-pictu... https://www.cpu-monkey.com/en/igpu-amd_radeon_8_graphics_ren... https://en.wikichip.org/wiki/qualcomm/snapdragon_800/855 https://en.wikipedia.org/wiki/Apple_A12


> Apple fanboys never learn. Sigh. Also, Apple fanboys are the new Intel fanboys

Please edit flamebait out of your posts here. It's against the rules for good reason, and it evokes worse from others.

https://news.ycombinator.com/newsguidelines.html


"Apple fanboys never learn. Sigh."

Just to be clear, you (and several others running the same playbook) are attacking Apple's entrant from every possible dimension, cherry-picking specific micro-traits from various other systems (even ones that aren't SoCs and have a tiny fraction of the functionality -- hey, if you can tease a dumb argument out of it...) and turning them into some sort of Voltron-style combined creation to claim... "victory"? And people impressed with Apple's progress based upon actual reality are the "fanboys"?

Again about cache: to repeat what has already been said, the A12Z doesn't have an L3 cache. Its L2 effectively serves as an L3, given that it isn't per-core.

The A12Z has 8MB of this L2+L3 cache. The 855 has 7.8MB of L2+L3 cache. The 4800U has 12MB of L2+L3 cache. The 3700X has 36MB of L2+L3 cache. So tell me again how the A12Z is somehow hacking the system or cheating? This is an outrageously dumb argument that the, I guess, "AMD fanboys" have all fed each other to run around trying to shit on Apple, and it betrays a complete lack of knowledge -- just copy/pasting some bullshit.
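To make the comparison concrete, here's a trivial tally in Python of the shared-cache figures quoted above (these are the numbers claimed in this thread, not independently verified):

    # L2+L3 ("shared") cache totals as quoted in this thread, in MB.
    shared_cache_mb = {
        "Apple A12Z": 8.0,
        "Snapdragon 855": 7.8,
        "AMD 4800U": 12.0,
        "AMD 3700X": 36.0,
    }
    base = shared_cache_mb["Apple A12Z"]
    for chip, mb in sorted(shared_cache_mb.items(), key=lambda kv: kv[1]):
        # Ratio relative to the A12Z, the chip accused of "cache hacking".
        print(f"{chip:15s} {mb:5.1f} MB  ({mb / base:.1f}x the A12Z)")

On these numbers the A12Z has the least shared cache of the lot, which is the whole point.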

Enough about the stupid cache nonsense because it has no basis in reality.

"Also, Apple fanboys are the new Intel fanboys when stressing single thread performance."

It is the single most important facet of a single-user performance system, or we'd all be using shitty MediaTek NNN-core designs.

And, I mean, the A12Z annihilates the 4800U at single-thread performance, and equals it at multithread performance... for a little tablet chip, and despite the 4800U having that mega, super, giant hack of die-size cache, and despite it boosting a single core to 4GHz, versus "just" 2.49GHz for the A12Z.

Oh, and that Apple chip has a 5 TFLOPS neural engine aside from the GPU. Separate hardware encoders/decoders (not as a facet of the GPU). Camera controllers. And on and fucking on.

What Apple has done is very impressive, and I imagine on their desktop/laptop chips they'll be a lot less conservative, likely with all "big" cores. Maybe they'll even add dedicated L2 cache!

Sidenote: you talked about the AMD chip being a few months old. The A12Z we are talking about is over two years old. You understand that we don't know what Apple is going to drop into their actual production designs; we are talking about the A12Z because they happened to be confident enough to demo their systems on it.


Time for more corrections; I don't keep up with Apple stuff. The 855 has ~5-6MB of L1+L2+L3. The A12X/Z has ~18MB of L1+L2+system cache. That's ~2x the performance and ~3x the cache against the 855, and ~10% worse performance than the 4800U, which has 30% less cache at 12.5MB (L1+L2+L3). The 6-core A13 has 28MB of L2+system cache and is maybe 10% faster single-threaded than the 15W 4800U with just 12.5MB of cache!
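Spelling out the ratios behind those claims (figures in MB as quoted above; rounding is deliberately loose):

    # Cache figures (MB) as claimed above; 5.5 is the midpoint of "~5-6MB".
    sd855, a12xz, r4800u = 5.5, 18.0, 12.5
    print(a12xz / sd855)       # ~3.3 -> the "~3x the cache" claim vs the 855
    print(1 - r4800u / a12xz)  # ~0.31 -> the "30% less cache" claim for the 4800U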

Here's an "I can haz cache" meme for you: https://i.imgflip.com/468v8g.jpg

You want to compare desktop systems with a mobile chip, but get blown out completely on multithread performance; then, comparing to a laptop chip, when people point out the cache amounts, you say "but look at the single-thread performance". Who is the fanboy here? Apple can spend the money on die size/cache if it wants for single-thread performance, but the rest of us care about a complete multi-core CPU+GPU system. More cache means somewhat lower clocks and power use too, big surprise.

The AMD 4800U's 4 TFLOPS of FP16 is 8 TFLOPS at FP8, which is what Apple is quoting, so enough of that. The 8 AVX2 units in the 8-core 4800U will do another ~1 TFLOP of FP32 if needed, within 15W. The A13's AMX seems to add about 1 TFLOP more of FP8, which is like two cores' worth of AVX2, not 8 cores of AVX2.
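For anyone wondering where peak numbers like that come from, here's the back-of-the-envelope arithmetic (the unit counts and the 4GHz clock are my own illustrative assumptions, not measured figures; sustained throughput will be lower):

    # Rough peak-FLOPS arithmetic. All parameters are illustrative assumptions.
    def cpu_simd_tflops(cores, fma_pipes, fp32_lanes, clock_ghz):
        # Each fused multiply-add counts as 2 floating-point ops per lane per cycle.
        return cores * fma_pipes * fp32_lanes * 2 * clock_ghz / 1000

    # A hypothetical 8-core AVX2 CPU (256-bit = 8 FP32 lanes, 2 FMA pipes) at 4 GHz:
    print(cpu_simd_tflops(cores=8, fma_pipes=2, fp32_lanes=8, clock_ghz=4.0))
    # -> ~1.0, in line with the "~1 TFLOP of FP32" above

    # Halving precision doubles packed throughput where hardware supports it:
    print(4.0 * 2)  # 4 TFLOPS FP16 -> 8 TFLOPS FP8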

Audio/Camera and Video decoders/encoders all do the same stuff anywhere and are basically a commodity for any number of standards, so enough of that too.

Just to be clear, you and other Apple fanboys just can't handle that what Apple currently has, CPU-wise, is in no real way better than a 4800U. Single-thread performance (with loads of cache!) is important for JS in the web browser, but by now even most AAA games do better with more cores, and most real-world tasks also do better with more cores. I'm just comparing reality, and you and other fanboys are the ones that aren't.

The 4800U is being generous for multicore CPU+GPU; the A12Z is about equal to the 12-watt, 4-core/4-thread Ryzen 2300U in multithreaded+GPU tasks. That's a 2-year-old, cheaper processor, and Apple is currently selling the same performance in a $1000+ iPad -- I guess this is only possible because of fanboys. Even this is impressive to me, given it's an ARM processor with an in-house GPU and Apple has been making chips for all of a decade now, but I lost all respect for people touting single-threaded performance (with loads of cache!) 15 years ago, when consumer dual cores first came out. The 2300U will run Shadow of the Tomb Raider at ~30FPS, for reference.

Sources: https://www.anandtech.com/show/14892/the-apple-iphone-11-pro... https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-re... https://en.wikichip.org/wiki/qualcomm/snapdragon_800/855


Corrections? LOL.

"The A12X/Z has ~18MB of L1+L2+System cache."

The A12X/Z has 256KB of L1 cache per core and 6MB of L2/L3 cache shared by all cores.

(256KB x 8) + 6144KB = 8192KB, i.e. ~8.2MB of L1+L2 cache.

It has no L3 cache. I don't know where you invented this so-called "system" cache, but are we now ridiculously adding GPU core caches or something absurd? Knowing this argument, probably.

The 855 has 512KB of L1 cache, 1,768KB of L2 cache, and 5,120KB of L3 cache.

512KB + 1,768KB + 5,120KB = 7,400KB, i.e. ~7.4MB of cache.
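If you want to check the arithmetic yourself (KB figures as above; 1MB taken as 1000KB to match the rounding in this thread):

    # Cache tallies as quoted above, in KB.
    a12xz_kb = 256 * 8 + 6144     # per-core L1 x 8 cores + shared L2
    sd855_kb = 512 + 1768 + 5120  # L1 + L2 + L3
    # 1 MB treated as 1000 KB to match the rounding used here.
    print(a12xz_kb / 1000)  # 8.192 -> "~8.2MB"
    print(sd855_kb / 1000)  # 7.4   -> "~7.4MB"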

You seem to be pulling numbers out of your ass, so refuting the rest of the bullshit you're inventing is a rather futile exercise. But keep on talking about "fanboys". LOL. You came straight from some sad AMD-rationalization website.


If you continue to break the site guidelines we will ban you.

https://news.ycombinator.com/newsguidelines.html



