AMD's Zen 3 CPUs Are Susceptible to Spectre-Like Vulnerability (tomshardware.com)
182 points by T-A on April 4, 2021 | 33 comments



Kudos to AMD for looking for, finding and then publicizing this themselves.

I wonder if they have red teams trying to break things when they are still in the design stage?

And of course another plus is that we don’t have some vapid detail-light brand-heavy vulnerabilities-need-a-name website popping up too!


> another plus is that we don’t have some vapid detail-light brand-heavy vulnerabilities-need-a-name website

AMD disclosing this vulnerability in a way in which they control the message is a big win from their corporate perspective. I'll bet their reputation/stock price takes less of a hit this way than if a security researcher had discovered the vulnerability and hyped up the impact. This way AMD gets to downplay it while seeming reasonable, saying up front, "most people don't need to be worried; we're not aware of anyone using this attack in the wild."

This was also true when Spectre was discovered, but since Intel weren't the ones that discovered the vulnerability, the response from Intel that nobody was using it felt like they were doing damage control rather than a sincere risk assessment.

Everyone needs to mitigate every vulnerability regardless of whether an attack is being used at the moment. It will definitely be used in an attack in the future if nobody fixes it. AMD saying this is just as much damage control as when Intel did it, but it feels different.


>Everyone needs to mitigate every vulnerability

This is definitely not true, or at least needs a LOT more nuance. All security is an economic equation involving tradeoffs of costs and benefits: the value of what's being defended versus the relevant threat scenarios. A cloud service provider allowing customers to run arbitrary virtualized workloads side by side on the same hardware? Yeah, they need to pay a lot of attention. A stateless high-performance cluster running exclusively vetted, authorized, and tuned workloads with no direct net access at all (possibly even air-gapped)? Only if the mitigation doesn't have any sort of negative performance impact. Etc., up and down the spectrum.

We're fortunate in software that most vulnerabilities have absolutely zero downside to fixing; they're purely bugs. Or an algorithm no longer considered sufficient has drop-in replacements that are superior in every respect. But with some of these hardware issues there can be real tradeoffs of performance/energy usage versus risk.

And you can't just say "it will definitely be used in an attack in the future" for everybody, because the conditions to use it are fairly strict. Not everything is "receive a packet from the internet, get pwned." For some systems, any ability for arbitrary unauthorized/unsigned code to run is already game over. For other systems, there simply isn't anything valuable to take; the value is the continuous processing done over long periods, and the results will be public. If someone notices the system slowing down because an attacker is trying to run their own code, it'll just get nuked and paved, with an investigation into how they got in.


You think Intel would publish if they had found Spectre themselves?

Edit: Okay, this reads like a snarky question a la "as if Intel would have done that!", but that wasn't what I was aiming for. Sorry about that.


I'm not sure if they would or not. I'm sure they employ security researchers. What I don't know is whether, if Intel had been the first to discover a vulnerability of this size and knew that mitigation would impact performance, they would have tried to cover it up. I expect that they would have disclosed the vulnerability as soon as they had a mitigation option.

But Intel wasn't the first to discover Spectre (and Meltdown), and therefore they were stuck with the disadvantage of having to respond to outsiders discovering a vulnerability rather than being able to get out in front of it, which is what AMD is doing right now. The end result is the same with both brands: you have to disable speculative execution and take a hit to secure your computer, and I don't think either company should be telling customers not to worry. But AMD has the messaging advantage.


I think classic Intel would. Intel's major cash cow and selling point was trust and brand.

Until recently, I bought Intel processors and motherboards through good and bad because Intel had brand credibility. AMD sometimes gave better price-performance, but no one beat Intel for reliability and trust. Intel processors, chipsets, wifi, etc. were stable, had quality drivers, and QC was excellent. I knew the stuff would work.

I don't care about a 30% difference in performance nearly as much as that.

Historically, Intel was very good about publishing processor manuals and bug lists. They weren't always perfect about fixing them, mind you, but they weren't bad either. Recall the FDIV bug: they took a lot of flak for that, but ultimately agreed to replace all affected processors. Or read, e.g., an Intel 80486 manual and its errata.

Intel seems to be imploding right now, and I don't know that they would publish something like this today, but that's a pretty recent phenomenon -- past five years or so. I just bought my first AMD desktop (with ECC!), but I'm still kind of hoping Intel will go back to the old ways -- transparent, reliable, documented, stable, and trustworthy. I'd rather get that, even if I need to pay double or lose a little bit of performance.

Those allowed Intel to sell high-margin products.

I'm not sure if the market would sustain that today. Perhaps the market for high-margin products has imploded. Right now, cloud infrastructure seems to rely on massive numbers of less reliable systems, rather than super-reliable servers as we had even a decade ago. Scientific computing has moved in a similar direction. No one makes serious business laptops/desktops anymore like classic IBM ThinkPads (which ran around $10k in today's dollars) or Sun/SGI/HP/etc. workstations for that matter. Or it might be that Intel drove the market in that direction. I can't quite tell.

Or it might be that AMD is taking Intel's former niche. Again, with AMD, my desktop finally has ECC again. That's something I haven't had for many years.

What I'd really like is a reliable laptop which can self-monitor. Memory has ECC. The SSD has RAID. Every wire is monitored for errors, and if there's an issue somewhere, it can be pinpointed for the user. USB devices can't crash my computer. If any IC is unreliable, I'm immediately told what failed. If a hard drive has a bad cable, I'm told which cable is bad. If a fan is failing, I know. All voltages are monitored. If a heatsink falls off, the system shuts itself down with no damage and logs the event. Computation is error-correcting. Etc. The hardware Just Works, and if something breaks, I can fix it proactively rather than reactively when I lose my data. Think of the systems NASA used in the Space Shuttle. It wouldn't add much to the cost; most of this is just NRE. The amount of actual silicon needed would be pretty small.


You may not care about a 30% performance difference for a desktop machine, but for a laptop it can mean 30% more time in a café without needing to bring your charger, and in a datacenter it can mean 30% lower electricity bills. I haven't seen the numbers, but I think desktops are becoming a less important market for CPU makers.


A 30% hit to performance isn't the same as a 30% hit to power.

Right now, I'm typing text into an HTML textarea element. Most of my laptop's power is going to the LCD backlight, I would guess. Dunno, though. Maybe some tab in the background is sucking up 100% CPU not doing anything, in which case power consumption is the same either way; that tab just gets less compute done.

If you're sitting in a cafe, using your laptop to train machine learning algorithms, you might lose 30% battery time, but you're probably doing something wrong.

And I would gladly pay 30% more on my AWS bills for more reliable machines. It depends on what you're doing, but for what I'm doing, server costs are a rounding error in our budget. Most money goes to people.

But I don't think I'd pay 30% more in AWS bills since a lot of what they do is IO-bound and not CPU-bound.

I think the tougher calculation would be on GPUs. At least for what I do, on the rest of the computer, it's a no-brainer.


I made the mistake of buying an expensive gamer laptop (Razer) with a fast Intel CPU and an NVIDIA RTX 2070 GPU, and I just didn't like it. Sure, I could do machine learning on the laptop, but I gave up so much flexibility in my life, as the battery usually died in a few hours even when I was just web browsing.

I just bought an M1 MacBook Pro, and I love it (especially the great screen that I can use in bright sun, and the instant-on that finally feels like a phone/tablet), although I prefer the 14/15-inch form factor (maybe an AMD Zen 2 laptop would have been a great choice as well; I'm not sure).


You'd better see those numbers, then: desktops have seen a major rebound since COVID. Employers are buying their remote employees home-office workstations rather than laptops. A work laptop (the default we've mostly settled on for the last decade) doesn't make as much sense when the employer knows they won't be asking the employee to commute, let alone travel for work. May as well get more bang for your buck with a fixed installation.


Sadly, my laptop doesn't devote 100% of its power budget to CPU.


Intel disclosed FDIV only after being contacted by a third party, and they allegedly kept it hidden for at least a couple of months.

https://en.wikipedia.org/wiki/Pentium_FDIV_bug

"Nicely noticed some inconsistencies in the calculations on June 13, 1994, shortly after adding a Pentium system to his group of computers, but was unable to eliminate other factors (such as programming errors, motherboard chipsets, etc.) until October 19, 1994. On October 24, 1994, he reported the issue to Intel. According to Nicely, his contact person at Intel later admitted that Intel had been aware of the problem since May 1994, when the flaw was discovered by Tom Kraljevic, a Purdue University co-op student working for Intel in Hillsboro, Oregon, during testing of the FPU for its new P6 core, first used in the Pentium Pro."


The context is different here, though. The vulnerability as a general problem has already been disclosed and studied. Its impact on customers today is not the same as it was in 2018.


I thought the whole AMD team was the red team! /s


The underlying AMD paper made the front page of HN a few days ago, though without much discussion: https://news.ycombinator.com/item?id=26645903

It's an interesting vulnerability but of limited impact, since of course every fast modern CPU is affected by Spectre anyway. The main consequence seems to be that, at least in theory, this has to be disabled in order for certain Spectre mitigations to work correctly. I don't think anyone's found a practical attack using this yet.


The operative phrase is "yet". Better safe than sorry. I've not read the paper yet, and I'm curious about the performance impact. I don't mind trading some performance for better security, because by the time exploits are found in the wild, it's too late.


The thing is, this can be disabled (a) per thread, and (b) it can't leak across processes (the predictor is flushed on each context switch).

So really, we want certain processes with software-based sandboxing -- e.g., JavaScript interpreters -- to turn this off.
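
A minimal sketch of how a process could do that on Linux, using the existing speculation-control prctl; the PSF connection relies on AMD's statement that enabling SSBD also disables PSF on Zen 3:

    #include <stdio.h>
    #include <sys/prctl.h>

    /* Fallbacks for older headers (values from the kernel's
     * Documentation/userspace-api/spec_ctrl.rst). */
    #ifndef PR_SET_SPECULATION_CTRL
    #define PR_SET_SPECULATION_CTRL 53
    #endif
    #ifndef PR_SPEC_STORE_BYPASS
    #define PR_SPEC_STORE_BYPASS 0
    #endif
    #ifndef PR_SPEC_DISABLE
    #define PR_SPEC_DISABLE (1UL << 2)
    #endif

    int main(void) {
        /* Disable speculative store bypass for this thread; per AMD's
         * whitepaper, setting SSBD also turns off PSF on Zen 3. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                  PR_SPEC_DISABLE, 0, 0) != 0)
            perror("prctl(PR_SET_SPECULATION_CTRL)");
        /* ... run the JIT / sandboxed code here ... */
        return 0;
    }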


Wasn't a Spectre paper published some days ago and extensively discussed on HN?

All CPUs can be affected by Spectre, including ARM, MIPS, etc., unless we totally disable speculative execution/branch prediction, which will impact performance. A lot.
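
For reference, the canonical Spectre v1 gadget (essentially the shape shown in the original paper) is just a bounds check the branch predictor can be trained to skip:

    #include <stddef.h>

    /* Spectre v1 (bounds check bypass) in miniature -- illustrative only. */
    unsigned int array1_size = 16;
    unsigned char array1[16];
    unsigned char array2[256 * 4096];   /* probed later via cache timing */

    void victim(size_t x) {
        if (x < array1_size) {          /* train with in-bounds x, then call
                                         * with out-of-bounds x: the branch
                                         * is predicted taken anyway */
            unsigned char v = array1[x];    /* transient out-of-bounds read */
            (void)array2[v * 4096];         /* leaves v's footprint in cache */
        }
    }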

EDIT: it wasn't a paper per se, but the Google PoC: https://security.googleblog.com/2021/03/a-spectre-proof-of-c...


Spectre-like is the key phrase here. They've introduced a new prediction mechanism, presumably exploitable through a similar side channel (I haven't read the paper yet).


Spectre is a class of bugs at this point. "Classical" Spectre relies on branch prediction; here we have another kind of prediction, the store-to-load forwarding predictor... but it's not that different, IMHO.
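
Roughly: where Spectre v1 trains a branch, PSF trains a (store, load) pair. A non-working sketch with made-up names, just to show the shape:

    #include <stddef.h>

    /* Rough sketch of the store->load pattern PSF speculates on -- made-up
     * names, not AMD's example, and not a working exploit. The predictor
     * remembers that the load below usually gets its data forwarded from
     * the store above; after training with idx == 0, a call with idx != 0
     * can transiently forward p to the load, so the dereference briefly
     * uses an attacker-chosen pointer until the pipeline recovers. */
    unsigned char probe[256 * 4096];    /* cache side channel, as in Spectre */
    void *slots[64];

    void victim(size_t idx, void *p) {
        slots[idx] = p;                                 /* store, address from idx */
        unsigned char v = *(unsigned char *)slots[0];   /* predicted-forwarded load */
        (void)probe[v * 4096];                          /* transient cache encoding */
    }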


> Consumers who work with software that employs sandboxing and are alarmed about PSF have the choice to disable the PSF functionality

Thank goodness. That's a much better way of handling it than the "sky is falling" response to Spectre and Meltdown, which saw some people take 20% performance drops via silent Windows updates despite neither being actively exploited in the wild.


These vulnerabilities will probably be around as long as we have this breed of processors, at the very least.

I can't see an obvious way to hide the processor's internal state at this level while also keeping the structure loose enough to allow this kind of performance. Maybe there are ideas cooking inside the chip companies, but so far I haven't seen anything public.


ARMv9 hardware memory segmentation seems promising: https://www.anandtech.com/show/16584/arm-announces-armv9-arc...


Doesn't that preclude shared memory?


This processor, which constantly shifts things around, is currently unhackable: https://m.hexus.net/tech/news/cpu/147532-unhackable-morpheus...


Any ideas about Ryzen CPUs' susceptibility to built-in backdoors?

https://blog.invisiblethings.org/papers/2015/x86_harmful.pdf


Everything is susceptible to backdoors, depending on your level of paranoia. You can have an open-source architecture, like RISC-V, and yet the people at the fab can still insert backdoors into the silicon. Building trusted hardware is an area of active research:

* https://spectrum.ieee.org/semiconductors/design/stopping-har...

Then you have to worry about trusting software. Even if you compile it from scratch, the compiler itself can be a threat (as outlined by Thompson in 1984):

* https://www.schneier.com/blog/archives/2006/01/countering_tr...

* PDF: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
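
To make Thompson's trick concrete, here's a toy model (hypothetical names; in the real attack the logic lives in the compiler binary, so inspecting the source shows nothing):

    #include <stdio.h>
    #include <string.h>

    /* Toy model of the two-part "trusting trust" hack: the compromised
     * "compiler" miscompiles login, and also miscompiles the compiler
     * itself so the backdoor survives a rebuild from pristine source. */
    void compile(const char *source) {
        if (strstr(source, "login.c")) {
            puts("emit: login + hidden password backdoor");
        } else if (strstr(source, "cc.c")) {
            puts("emit: compiler + this very miscompilation logic");
        } else {
            printf("emit: %s, compiled faithfully\n", source);
        }
    }

    int main(void) {
        compile("login.c");   /* backdoored */
        compile("cc.c");      /* self-propagating */
        compile("ls.c");      /* untouched */
        return 0;
    }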


Of course. To be more specific, I'm concerned about widespread compromise by the NSA of computers in European companies, with the goal of industrial espionage. (Intel's Management Engine, for example, can potentially be this kind of threat.)


If you're concerned about Intel's ME, AMD's PSP is basically the same thing.


Are we going to see CPU performance decrease on AMD too?


Phoronix ran benchmarks on Linux and found that the performance impact was less than half a percent: https://www.phoronix.com/scan.php?page=article&item=amd-zen3...


Oh, that's cool. I'm not looking forward to another downgrade like the Spectre one.


I've seen the benchmarks with it on and off, and sometimes you can't tell it's off; it's all within the margin of error. More than likely, they're just going to permanently turn it off, and there will be no difference in performance.



