I still wonder how Apple was able to achieve such an incredible performance-per-watt ratio compared to Intel and AMD. Does anybody know how they managed it?
1. Arm is generally more efficient than x86.
2. Apple uses TSMC's latest nodes before anyone else.
3. Apple doesn't chase peak performance like AMD and Intel do. CPU speed and power consumption don't scale linearly. Intel has been chasing 5GHz+ clocks for the last few years, which consumes considerably more power. Apple keeps their CPUs under 3.5GHz.
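To put some rough numbers on point 3: dynamic power scales roughly with C·V²·f, and higher clocks generally also need higher voltage, so power grows much faster than frequency. A minimal sketch, assuming (purely for illustration) that voltage has to rise linearly with frequency:

    # Toy model of dynamic CPU power: P ~ C * V^2 * f.
    # Assumes voltage must rise roughly linearly with frequency to stay
    # stable at higher clocks; this is an illustrative assumption, not
    # vendor data for any real chip.

    def relative_power(freq_ghz: float, base_freq_ghz: float = 3.5) -> float:
        """Dynamic power relative to running at base_freq_ghz."""
        voltage_ratio = freq_ghz / base_freq_ghz
        return voltage_ratio ** 2 * (freq_ghz / base_freq_ghz)

    print(relative_power(3.5))  # 1.0  (baseline, roughly Apple's clock target)
    print(relative_power(5.0))  # ~2.9, i.e. ~3x the power for ~1.4x the clock

Under that toy model, a ~43% clock bump costs roughly 3x the dynamic power, which is why chasing 5GHz is so expensive in watts.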
This is not entirely true in a general sense. Yes, a typical ARM CPU is indeed more energy efficient, but in principle nothing prevents x86 from being nearly as efficient.
The main reason Apple silicon is more efficient is that it is basically a mobile chip, and competition in mobile is fierce, so every vendor has had to optimize their chips heavily for energy efficiency.
On the other hand, until Apple silicon and AMD's recent ascension, Intel had a near-monopoly on the laptop market and little incentive to improve. Just look at how fast Intel rolled out an asymmetric, Arm-like P/E-core architecture right after Apple silicon emerged. Let's hope this new competitor eventually forces Intel and AMD to produce more energy-efficient x86 chips.
> This is not entirely true in a general sense. Yes, a typical ARM CPU is indeed more energy efficient, but in principle nothing prevents x86 from being nearly as efficient.
The very complex instruction set does. You can easily throw multiple decoders at Arm code, but x86 scales badly because of its variable-length instructions. Current x86 cores need predecoders just to find instruction boundaries, which isn't needed with fixed-width instructions, and even then the higher-numbered decoders can only handle the simpler instructions.
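To make the boundary problem concrete: with fixed-width instructions, the Nth instruction always starts at byte N*4, so many decoders can start in parallel; with variable-length encoding you can't know where instruction N starts until you've at least partially decoded everything before it. A toy sketch (the encodings below are invented for illustration, not real Arm or x86 formats):

    # Toy illustration of why fixed-width decode parallelizes easily and
    # variable-length decode doesn't. The "ISAs" here are made up for the
    # example; they are not real Arm or x86 encodings.

    def fixed_width_boundaries(code: bytes, width: int = 4) -> list[int]:
        # Every decoder can independently compute its own instruction offset.
        return list(range(0, len(code), width))

    def variable_length_boundaries(code: bytes) -> list[int]:
        # Pretend the first byte encodes the instruction's total length (1-15).
        # Each boundary depends on the previous one, so the scan is sequential;
        # this is roughly the job a predecoder has to do up front.
        offsets, i = [], 0
        while i < len(code):
            offsets.append(i)
            i += (code[i] % 15) + 1
        return offsets

    blob = bytes(range(32))
    print(fixed_width_boundaries(blob))      # known up front, trivially parallel
    print(variable_length_boundaries(blob))  # discovered one instruction at a time

That sequential boundary discovery is the extra work x86 has to get out of the way before its decoders can run wide.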
> With the op cache disabled via an undocumented MSR, we found that Zen 2’s fetch and decode path consumes around 4-10% more core power, or 0.5-6% more package power than the op cache path. In practice, the decoders will consume an even lower fraction of core or package power.
Which is funny, because people are always like "uh, why do I need to understand asymptotics when machines are so fast?" Well, the answer is that the asymptotics catch up to you when the speed of light isn't infinite, or when you're timing things down to the nanosecond.
Arm is practically as complex as x86... It supports multiple varieties (e.g. v7, Thumb, Thumb-2, Jazelle, v8, etc.), lots of historical mistakes, absurdly complex instructions even in the core set (ldm/stm), and a legacy almost as long as x86's. It even has variable-length instructions too...
Only Jazelle and Thumb v1 are dropped from most v8 non-ULP cores, and even then only half dropped: they still consume decoding resources (e.g. Jazelle mode is actually supported and the processor will parse JVM opcodes, it's just that all of them trap). We are stuck with the rest as much as Intel is stuck with the 8087. It's about time they did some culling, but that won't happen without backlash.
I'm not sure this holds. x86-64 decodes instructions (which is awkward), stores the decoded ops in a cache, and then executes out of that cache. So the decoding cost is only paid on a cache miss, and a cache miss on a deeply pipelined CPU is roughly game over for performance anyway.
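A rough way to see the amortization argument, with made-up numbers rather than measurements of any real core:

    # Toy amortization model for a uop cache: the expensive legacy-decode
    # path is only exercised on a uop-cache miss. All costs below are
    # invented placeholders, not measured figures.

    def avg_decode_cost(hit_rate: float,
                        hit_cost: float = 1.0,     # replaying cached uops
                        miss_cost: float = 10.0):  # full x86 decode path
        return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost

    for hit_rate in (0.80, 0.95, 0.99):
        print(f"hit rate {hit_rate:.0%}: average decode cost "
              f"{avg_decode_cost(hit_rate):.2f}")

So how much the decoders matter in practice mostly comes down to how often hot code actually fits in that cache.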
One big thing is that Apple has (almost) bought out TSMC's N3 node, so they're the only one with chips made on the most advanced manufacturing process available.
It's difficult to compare because honestly most reviewers just suck at making meaningful comparisons.
You can't compare a chip running at 3GHz with one running at 5GHz. It just doesn't tell you anything useful about the architecture, only what the company configuring the chip thought mattered.
Being "only" 30% faster but using twice the power at 5ghz, for example, is entirely expected. Chances are the M1 couldnt even run that fast, or it would end up using just as much power if it did.
Intel would squash an internal project like that, or drown it in politics. You could sit here all day with examples of "why did big company let little company become successful"
Little-ish? PA Semi was only 150 people and was acquired for less than $300 million back in 2008. Intel's market cap was $150 billion back then. It's impossible to say how PA Semi would have fared, but as a division it's still way smaller.
Most reviewers base it on Cinebench, which is a poor indication of CPU performance for anything except Cinema 4D. Cinebench uses Intel's Embree engine, which is hand-optimized for x86. In addition, Cinebench favors CPUs with many slow cores, which is not how most software behaves. This is why AMD marketed Cinebench heavily for the Zen 1 launch and why Intel markets it heavily now for Alder Lake/Raptor Lake. In fact, Intel's little cores are basically designed to win at Cinebench.
Furthermore, AMD CPUs may be rated at 25W but can easily boost to 40W+. It's up to the laptop maker.
Well, in purely military terms, technically Intel and AMD are only a few miles from Apple and their engineering corps is likely far larger. They could all march over there with broadswords if they really wanted to.
Completely off-topic, but: I think the state of the art in castle design (pre modern explosives, anyway) was the star/bastion fort[1], since that allowed defenders to have overlapping fields of fire, especially useful once an attacker reaches the walls. With a circular design like Apple's HQ, as attackers get closer to the walls, fewer and fewer defensive positions can see them, until you can only see them from directly above.
Clearly the move is to put all AMD and Intel engineers on the inside of the circle. That way they would be visible from all locations on the ring at all times.
Intel basically hit the clock speed limit and diverged to multiple cores. However, they still make x86-based chips, not ARM. They owned an ARM license for a while and got rid of it. For whatever reason, Intel felt that putting all their money on x86 was their only option. For a while they were making Atom chips for mobile, but at some point that design was hobbled, because Intel has always been about the 60%+ margins on server chips. You cannot sell the cheaper chips at the same margins. It's not that Intel couldn't technically figure stuff out, it's that they couldn't see past those 60% margins.
For a while Intel's process knowledge was supposed to be better, even if the design was less efficient, but that turned out to be a mirage around 10nm or so. Intel, now without a process advantage, is probably never going to regain its monopoly, and so far hasn't really transformed itself to do anything other than build those high-margin chips.
Once upon a time, I wanted to use one of the chips from a networking company they bought, but Intel's model is to make the chip and let other companies build a product to take it to market. Intel doesn't want to make a market, just sell into it. You can see that with their attempt at TV, where they stopped when they didn't want to spend money on content. So the chip I was interested in didn't get much R&D or a product, and it more or less disappeared, another wasted investment.