Hacker News

I still wonder how Apple was able to achieve such an incredible performance-per-watt ratio compared to Intel and AMD. Does anybody know how they let Apple do it?



A few reasons.

1. Arm is generally more efficient than x86.

2. Apple uses TSMC's latest nodes before anyone else.

3. Apple doesn't chase peak performance like AMD and Intel. The relationship between CPU speed and power consumption is not linear. Intel has been chasing 5 GHz+ speeds the last few years, which consumes considerably more power. Apple keeps its CPUs under 3.5 GHz.
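Point 3 can be illustrated with a toy model. Dynamic CPU power scales roughly as P = C·V²·f, and voltage has to rise with frequency, so power grows much faster than clock speed. The constants below are made up for illustration, not real silicon data:

```python
# Toy model (assumed constants, not real silicon data): dynamic power
# P = C * V^2 * f, where voltage rises roughly linearly with frequency.

def dynamic_power(freq_ghz, c=1.0, v_base=0.8, v_slope=0.1):
    """Estimate relative dynamic power at a given clock frequency."""
    voltage = v_base + v_slope * freq_ghz  # assumed linear V/f relationship
    return c * voltage**2 * freq_ghz

p_low = dynamic_power(3.5)   # roughly where Apple tops out
p_high = dynamic_power(5.5)  # roughly where Intel/AMD boost to
print(f"{p_high / p_low:.2f}x power for {5.5 / 3.5:.2f}x clock")
```

Even with these gentle assumptions, a ~1.6x clock bump costs over 2x the power, which is why backing off the peak clock buys so much efficiency.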


> Arm is generally more efficient than x86

This is not entirely true in the general sense. Yes, a typical ARM CPU is indeed more energy efficient, but in theory nothing prevents x86 from being nearly as efficient.

The main reason Apple silicon is more efficient is that Apple silicon is basically a mobile chip, and competition in mobile is harsh, so all the vendors have had to optimize their chips heavily for energy efficiency.

On the other hand, until Apple silicon and AMD's recent ascension, Intel had a monopoly on the laptop market with no incentive to do anything. Just look at how quickly Intel developed an asymmetric, Arm-like P/E-core architecture right after Apple silicon emerged. Let's hope this new competitor eventually forces Intel and AMD to produce more energy-efficient x86 chips.


> This is not entirely true in general sense. Yes, a typical ARM CPU is more energy efficient indeed, but theoretically nothing prevents x86 to be nearly as efficient.

The very complex instruction set does. You can easily throw multiple decoders at Arm code, but x86 scales badly due to its variable instruction length. Current cores need predecoders to find instruction boundaries, which simply isn't needed with fixed-width instructions, and even then the higher-numbered decoders can only handle the simpler instructions.
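The boundary-finding problem can be sketched in a few lines (toy encoding, not a real ISA): with fixed-width instructions, the byte offset of instruction N is just N times the width, so many decoders can start in parallel; with variable-length encoding, each offset depends on the lengths of all previous instructions, forcing a serial scan:

```python
FIXED_WIDTH = 4  # bytes, like AArch64

def fixed_width_offsets(n):
    # Every decoder can compute its start offset independently,
    # so n decoders can work on n instructions at once.
    return [i * FIXED_WIDTH for i in range(n)]

def variable_length_offsets(lengths):
    # Each offset depends on the previous instruction's length:
    # an inherently serial dependency chain, which is why x86
    # cores spend hardware on predecode.
    offsets, pos = [], 0
    for length in lengths:
        offsets.append(pos)
        pos += length
    return offsets

print(fixed_width_offsets(4))                 # [0, 4, 8, 12]
print(variable_length_offsets([1, 3, 7, 2]))  # [0, 1, 4, 11]
```

Real x86 decoders speculate on boundaries and cache predecode results rather than scanning serially, but the dependency the sketch shows is the thing they are working around.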


> Current cores need predecoders to find instruction boundaries which is just not needed with fixed width instructions

The question is, how much overhead does it cause relative to the whole picture? There is empirical evidence that the answer is "very little":

https://chipsandcheese.com/2021/07/13/arm-or-x86-isa-doesnt-...

> With the op cache disabled via an undocumented MSR, we found that Zen 2’s fetch and decode path consumes around 4-10% more core power, or 0.5-6% more package power than the op cache path. In practice, the decoders will consume an even lower fraction of core or package power.


> The very complex instruction set does.

i.e., PSPACE ⊆ EXPTIME

https://en.wikipedia.org/wiki/EXPTIME

which is funny because people are always like "uh, why do I need to understand asymptotics when machines are so fast?" Well, the answer is that the asymptotics catch up to you when the speed of light isn't infinite or when you're timing things down to the nanosecond.


Arm is practically as complex as x86... It supports multiple varieties (e.g. v7, Thumb, Thumb-2, Jazelle, v8, etc.), lots of historical mistakes, absurdly complex instructions even in the core set (ldm/stm), and a legacy almost as long as x86's. It even has variable-length instructions too...


Many of which were dropped for 64-bit ARM.


Only Jazelle and Thumb v1 are dropped from most non-ULP v8 cores, and even then only half dropped: they still consume decoding resources (e.g. Jazelle mode is actually supported and the processor will parse JVM opcodes; they just all trap). We are stuck with the rest as much as Intel is stuck with the 8087. It's about time they did some culling, but not without backlash.


I stand corrected, thanks.


I'm not sure this holds. x86 decodes instructions (which is awkward) and stores the resulting micro-ops in a cache, then issues them from that cache. So the decoding cost only happens on a cache miss, and a cache miss on a deeply pipelined CPU is roughly game over for performance anyway.
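The amortization argument can be sketched with a toy model (the cycle costs below are invented, not measured): decoded micro-ops are cached by address, so the expensive decode path is only taken on a miss, and a hot loop pays it once:

```python
DECODE_COST = 4  # assumed cycles through the legacy decode path
HIT_COST = 1     # assumed cycles to fetch from the micro-op cache

def fetch_cost(trace, op_cache):
    """Total fetch cycles for a stream of instruction addresses."""
    total = 0
    for pc in trace:
        if pc in op_cache:
            total += HIT_COST
        else:
            total += DECODE_COST
            op_cache.add(pc)  # fill with the decoded micro-ops
    return total

# A 4-instruction hot loop run 100 times: only the first pass decodes.
loop = [0x10, 0x14, 0x18, 0x1c]
print(fetch_cost(loop * 100, set()))  # 4*4 + 396*1 = 412
```

With these numbers the decoder is exercised on just 1% of fetches, which matches the spirit of the Chips and Cheese measurement quoted above: the decode path barely shows up in steady-state power.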


> Apple doesn't chase peak performance like AMD and Intel.

Intel and AMD also make low-power parts.


But they don't make high-end, performant low-power parts (yet).


One big thing is that Apple has (almost) bought out TSMC's N3 node, so they're the only one with chips made on the most advanced manufacturing process available.


It's difficult to compare because honestly most reviewers just suck at making meaningful comparisons.

You can't compare a chip running at 3 GHz with one running at 5 GHz. It just doesn't tell you anything useful about the architecture, only what the company configuring the chip thought mattered.

Being "only" 30% faster but using twice the power at 5 GHz, for example, is entirely expected. Chances are the M1 couldn't even run that fast, or it would end up using just as much power if it did.


Intel would squash an internal project like that, or drown it in politics. You could sit here all day with examples of "why did big company let little company become successful"


Apple's market cap is currently 20x Intel's market cap. Is Apple supposed to be the "little" company?


Little-ish? PA semi was only 150 people and acquired for < $300 million back in 2008. Intel's market cap was 150 billion back then. Impossible to say how PA semi would have fared, but as a division, it's still way smaller.


But PA Semi wasn't close to Intel in 2008 when it had 150 people.


The latest mobile AMD Zen 4 has comparable efficiency* to the Apple M2 despite not being ARM or having a hybrid architecture. See the 7840U.

* Within up to 15% at 25W.


This is not true.

Maybe in highly threaded workloads that utilize many cores.

Most reviewers base it on Cinebench, which is a poor indication of CPU performance for anything except Cinema 4D. Cinebench uses Intel's Embree engine, which is hand-optimized for x86. In addition, Cinebench favors CPUs with many slow cores, which is not how most software performs. This is why AMD heavily marketed Cinebench for the Zen 1 launch and why Intel heavily markets it now for Alder Lake/Raptor Lake. In fact, Intel's little cores are basically designed to win at Cinebench.

Furthermore, AMD CPUs will be rated at 25 W but can easily boost to 40+ W. It's up to the laptop maker.


You can easily limit the power to 25W. Most manufacturers typically have a silent mode which does exactly that.

Not sure what you mean by many slow cores, since mobile Zen 4 has better single-core performance than the M2 Pro.


Zen 4 desktop does, at the expense of much higher power consumption.

Zen 4 mobile does not have higher ST performance than the M2 series.

https://browser.geekbench.com/processors/amd-ryzen-7-pro-784...

https://browser.geekbench.com/macs/mac-mini-2023-12c-cpu

The 7840U's ST is slower by 21% while consuming much more power during the test.


I don’t know where to begin… There is a lot of material on the internet that is relevant to answering that.

What do you mean, "how they let Apple do it"? Do you think Intel and AMD could stop them?


Well, in purely military terms, technically Intel and AMD are only a few miles from Apple and their engineering corps is likely far larger. They could all march over there with broadswords if they really wanted to.


The circular design of the HQ makes sense now.

https://www.reddit.com/r/castles/comments/4t5w0q/round_vs_sq...


Completely off-topic, but: I think the state of the art in castle design (pre-modern explosives, anyway) was a star/bastion fort[1], since that allowed defenders to have overlapping fire zones, especially useful once an attacker reaches the walls. With a circular design like Apple's HQ, as attackers get closer to the walls, fewer and fewer defensive positions can see them, until you can only see them from directly above.

1: https://en.wikipedia.org/wiki/Bastion_fort


In all likelihood Intel would attack from the middle of the circle...


Clearly the move is to put all AMD and Intel engineers on the inside of the circle. That way they would be visible from all locations on the ring at all times.


A 'reverse Trojan horse'? The defenders sneak the attackers in rather than the attackers trying to sneak in?


That sounds right.


I mean, how did Intel and AMD not see what Apple was creating?

PCs have been stuck at 3-4 GHz for more than 15 years, so it's not like they didn't have the time to optimize for power consumption and heat.


It's kind of the opposite: Intel and AMD are burning power racing to 6 GHz while Apple targeted a more efficient 3-4 GHz.


Intel basically hit the clock speed limit and diverged to multiple cores. However, they still make x86-based chips, not ARM. They owned an ARM license for a while and got rid of it. For whatever reason, Intel felt that putting all their money on x86 was their only option. For a while they were making Atom chips for mobile, but at some point that design was hobbled, because Intel has always been about the 60%+ margins on server chips. You cannot sell the cheaper chips at the same margins. It's not that Intel couldn't figure things out technically; it's that they couldn't see past those 60% margins.

For a while Intel's process knowledge was supposed to be better, even if the design was less efficient, but that turned out to be a mirage around 10nm or so. Now, without a process advantage, Intel is probably never going to regain its monopoly, and so far it hasn't really transformed itself to do anything other than build those high-margin chips.

Once upon a time, I wanted to use one of the chips from a networking company they bought, but Intel's model is to make the chip and let other companies build a product to take it to market. Intel doesn't want to make a market, just sell into it. You can see that with their attempt at TV, where they stopped when they didn't want to spend money on content. So the chip I was interested in didn't get much R&D or a product, and it more or less disappeared: another wasted investment.



