I'm going to be honest, I need an ELI5 here. I know what the meltdown/spectre is...

nolok · on Jan 22, 2018

Big simplification:

The proper way to fix a hardware bug like this is for newer CPUs to be protected by default, and to say so when queried.

So you can ask the CPU "what's your status on bug X" and the CPU answers "I'm good, you don't need to do anything" (newer fixed chips), or "I know about it but was already built, and need a microcode update/special behavior to protect myself" (current chips with microcode update), or "no answer / I'm not good" (old chips without update).

So new stuff is protected, and you add more protection (and slowdowns, and special stuff) for older chips that don't know how to deal with it.

What Intel is trying to do here is go the other way: the chips, even the new ones, will stay vulnerable by default, and when queried they say "I have a fix but I don't use it, you can enable it by asking!" and the kernel is supposed to enable it.

It's terrible for a lot of reasons, like "boot an older os and it's vulnerable since it doesn't know to call this", "additional code to enable this feature has to run for all of eternity for new chips now, instead of having to run for older chips and being phased out over time", etc ...
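To make the difference between the two designs concrete, here is a minimal Python sketch of the decision the OS would face. Everything in it is hypothetical: `CpuAnswer`, `kernel_must_act` and `protected_without_os_help` are names made up for illustration, not a real CPUID or MSR interface.

```python
from enum import Enum


class CpuAnswer(Enum):
    """Hypothetical answers a CPU could give when asked about bug X."""
    FIXED_IN_HARDWARE = "fixed"          # newer chip, protected by default
    NEEDS_MITIGATION = "needs_help"      # existing chip with a microcode update
    UNKNOWN = "unknown"                  # old chip without update, no answer
    FIX_AVAILABLE_BUT_OFF = "opt_in"     # the opt-in design described above


def kernel_must_act(answer: CpuAnswer) -> bool:
    """True if the kernel has to run extra code for the machine to be protected."""
    return answer is not CpuAnswer.FIXED_IN_HARDWARE


def protected_without_os_help(answer: CpuAnswer) -> bool:
    """Models booting an older OS that never asks the question at all."""
    return answer is CpuAnswer.FIXED_IN_HARDWARE
```

In this toy model a new chip that answers `FIX_AVAILABLE_BUT_OFF` lands in the same bucket as an old unpatched one: the kernel has to act forever, and an OS that doesn't know to ask stays exposed.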

The reason why Intel does that seems obvious: by default the chip does not lose speed since the fix is not enabled, and so instead of "Intel chips lose 30% speed overnight because of a flaw" it becomes "Intel adds a special security mode that protects you even more for critical applications, at the cost of some speed". It's purely a marketing decision made at the expense of proper engineering, and they need OSes like Linux to play along and are trying to get them to. That's what he means by "[it] shows intel had no intention of fixing those flaws".

Additionally, there seems to be a second issue: the patches they submitted appear to hide this deceptively simple but technically terrible behavior by making it look and sound obscure and complicated.

In other words, Intel is using its presence and weight to try and push a shitty solution, but one that is better for them marketing-wise. Linus is flabbergasted to be treated like an idiot or an obedient drone that should apply such obviously abusive patches.

sundvor · on Jan 22, 2018

Awesome explanation. That is indeed deserving of the word "f*cked".

My next CPU will be an AMD then.

beedogs · on Jan 23, 2018

That's the upshot of this debacle for me, too. Intel has lost me (again) as a customer.

grzm · on Jan 23, 2018

How did they (temporarily) win you back?

beedogs · on Jan 23, 2018

For a while, they were really the only game in town when AMD was struggling to keep up, performance-wise. Now with Ryzen there's really no reason for me not to switch back. I was an AMD fanboy from the K6-II through a few Athlon revisions.

cryptonector · on Jan 22, 2018

Linus also says that this shows Intel means never to fix Spectre2. Of course, that would only be their current position -- they could change their minds later. That strongly implies that the decision to disable by default is a marketing decision, but take this with a grain of salt -- it could also be a bad engineering decision.

master_ant · on Jan 22, 2018

> In other words, Intel is using its presence and weight to try and push a shitty solution, but one that is better for them marketing-wise. Linus is flabbergasted to be treated like an idiot or an obedient drone that should apply such obviously abusive patches.

I see where the brashness comes from. Shady dealings on Intel's part.

_pfxa · on Jan 22, 2018

Wow. That's outright malicious behaviour from Intel there.

kbenson · on Jan 22, 2018

It's not entirely clear to me, but in some of the followup emails it appears that Linus was mistaking the purpose of some patches (or flags at least) due to unobvious naming, but I'm unsure if that significantly alters his criticism. He says it still applies, but is much more muted in tone about it (whether that's because the original email was possibly not intended to be public, I don't know).

lucb1e · on Jan 22, 2018

This doesn't make sense though. If Intel indeed plans to leave chips vulnerable except if you set a flag, then how are these bullshit patches? This will be the only solution Intel is going to deliver right? I get that he doesn't like it, but that doesn't explain why he feels lied to. If Intel says "we're going to not turn the patch on by default", wherein does he suspect the lie?

nolok · on Jan 22, 2018

He's complaining about their "fix" being terrible, but isn't fully against using it in the end since, as you said, that's all there is going to be to have the chips work properly.

The reason he refuses those current patches and directly calls it a lie/deception is what my last two paragraphs were about; if you read his message (where the link points to), it's about halfway through: Intel tries to disguise it by doing it in a convoluted way. Basically they try to avoid making it obvious when looking at the code, because they don't want an "if (intel_chip) enable_fix_because_default_is_broken_on_intel();" and instead push something that looks like the kernel needs to do lots of complex stuff (aka "it's complex, and a fix-on-chip is not enough, the kernel needs protection anyway!"), and that means a terrible patch with lots of garbage and filler code.

Intel's intention is clear in that they specifically push this in the same patchset as the "tell the chip to be secure" change, trying to mush the two things together to make it look like it's all the same thing, whereas in reality it should be two patchsets: one to enable the security mode, which is bad for Intel marketing-wise, and a second one to add those "fixes" to the kernel, which would be refused because it's terrible and in part unnecessary since retpoline already protects it. What Linus is saying is "sure I need the first change, but since you're intent on pushing them together I'm refusing them, because the second one is pure garbage, and you mix them together to hide the first".

Eg quotes from said mail to show it's indeed his problem:

> So instead they try to push the garbage down to us. And they are doing it entirely wrong, even from a technical standpoint.

and

> The patches do things like add the garbage MSR writes to the kernel entry/exit points. That's insane. That says "we're trying to protect the kernel". We already have retpoline there, with less overhead.

(what he means here is that they try in their patch to make it look like the kernel needs a special protection, while it already has it through retpoline)

and

> So somebody isn't telling the truth here. Somebody is pushing complete garbage for unclear reasons. Sorry for having to point that out. If this was about flushing the BTB at actual context switches between different users, I'd believe you. But that's not at all what the patches do.

(eg "why are you pushing all this crap around to hide what's really happening/need to be executed")

lucb1e · on Jan 23, 2018

That makes sense, thank you!

IntelMiner · on Jan 22, 2018

The reasonable expectation would be that Intel fixes one (or both) of the bugs:

- In newer CPUs, there should be mitigations against these attacks. That would probably seriously hurt Intel by delaying their future processor launches.

- In older/existing CPUs, through microcode updates. Short of literally making "fixed" versions of every Intel CPU from the last ~10 years, this is the only way to resolve the issue on existing hardware.

Instead of doing that, Intel wants to avoid the much-reported "30% performance hit" by simply saying "Well if you want this FEATURE, you can enable it in your OS!"

Intel is trying to downplay a massive security vulnerability in their hardware as something that OS vendors can just let users opt in/out of.

frevd · on Jan 22, 2018

It's not Intel's issue, it's a design flaw per se, affecting _all_ CPUs that use predictive branch execution that has effects on the processor cache, which are pretty much all processors produced in this millennium.

That said, there _might_ be a solution to this problem that does not require removing predictive branch execution completely from future architectures, which is something we don't really want to lose, even if losing it would increase safety. During that time, it makes sense to disable it, but not by default. The only implication is that older systems must be patched, which is every admin's responsibility.

wtetzner · on Jan 22, 2018

> It's not Intel's issue, it's a design flaw per se, affecting _all_ CPUs that use predictive branch execution that has effects on the processor cache, which are pretty much all processors produced in this millennium.

Just because other CPUs have this flaw, doesn't mean this isn't Intel's issue. Regardless of the state of other CPU manufacturers, Intel is producing buggy CPUs.

iapx88 · on Jan 22, 2018

> which are pretty much all processors produced in this millennium.

Is there a simple table of every mainstream purchasable CPU out-there and whether it was affected?

Latty · on Jan 22, 2018

To be clear: I'm not knowledgeable about this at all, so I could be way off base, but my reading is that he's saying that the patches seem to be doing things that don't make sense (given the information supplied with them) - that is, Intel are trying to sneak in extra fixes or other things alongside without talking about them.

Tijdreiziger · on Jan 22, 2018

The bullshit part is that Intel is trying to push this as a 'solution'. Linux is incredibly important, so if Linus does not approve of this 'solution', it'll be very difficult for Intel to go through with it (of course, they could also be brash and still do it).

nolok · on Jan 22, 2018

See my second message (next to yours). It's not just that: yes, he thinks and clearly says that this solution is terrible, but that's not why he calls them out as basically liars; that is because they put useless filler garbage code all around to hide what's happening in their patches.

I think we're lucky to have someone so clear and outspoken, who refuses such crap, in charge of the kernel.

cthalupa · on Jan 22, 2018

Part of the problem is Linus doesn't actually understand the different portions of what Intel is doing, and is mixing up IBPB and IBRS. They do different things, and he's thinking they're all part of the same thing.

This could be a sign that these things are poorly written and need to be refactored into something more obvious, or it could be that they're so fundamentally complex that it's going to be difficult to grasp without context.

wtetzner · on Jan 22, 2018

> or it could be that they're so fundamentally complex that it's going to be difficult to grasp without context.

If it is that fundamentally complex, then it sounds like they need to find a better solution.

cthalupa · on Jan 23, 2018

I don't disagree, but they are working under a time crunch trying to fix something that is a flaw fundamental to modern chip design.

Hopefully the goal here is to get everything to a secure state, with time to iterate and improve once everyone can sit back and breathe. Hopefully.

Eupolemos · on Jan 22, 2018

Thank you.

1911z · on Jan 22, 2018

Thank you for your explanation.

microtherion · on Jan 22, 2018

Thanks for your informative explanation. However…

> "boot an older os and it's vulnerable since it doesn't know to call this"

Presumably the hardware that fixes this is not even available on the market yet. How likely is it that somebody will go out of their way to install an obsolete OS version on their brand new hardware?

jessaustin · on Jan 23, 2018

An obsolete version that runs faster and benchmarks better, which some customers won't realize is less secure? It doesn't seem unlikely...

alerighi · on Jan 22, 2018

The problem is that every fix that you could think of for Spectre reduces the performance of the CPU.

So not enabling this by default is a good choice. Spectre is very difficult to exploit, so if you do critical things you enable the fix; if you use the computer for gaming, video rendering, and things where you don't care too much about security but you care about performance, you don't enable it.

eeeficus · on Jan 22, 2018

Why not the other way around? You have the fix enabled and if you don't know you get protected by default. If you really know better then you can disable the fix (via a special CPU instruction), because you know you're not running anything critical?

draugadrotten · on Jan 22, 2018

> you can disable the fix (via a special CPU instruction)

The CPU cannot be allowed to disable the fix, because then that could be done by an attacker. Therefore the only safe way is to allow moving in one direction only, from insecure to more secure.

simias · on Jan 22, 2018

Nonsense, just make it so that only privileged kernel code can modify this configuration. Tons of CPU configuration parameters already work that way, it's a non-issue.

If for some reason you want to forbid even privileged code from modifying the config, then add another "lock" bit that forbids subsequent reconfiguration until the next reboot.
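A rough sketch of that scheme in Python, purely as a model of the idea: `MitigationControl` and its methods are hypothetical names, and the `privileged` flag stands in for whatever ring/privilege check real hardware would do.

```python
class MitigationControl:
    """Toy model of a CPU setting that is secure by default,
    changeable only by privileged code, and lockable until reboot."""

    def __init__(self):
        self.enabled = True    # assumption: the fix is on by default
        self.locked = False    # the extra "lock" bit

    def write(self, enabled: bool, privileged: bool) -> None:
        """Change the setting; refused for unprivileged code or once locked."""
        if not privileged:
            raise PermissionError("only privileged code may change this setting")
        if self.locked:
            raise PermissionError("setting is locked until the next reboot")
        self.enabled = enabled

    def lock(self, privileged: bool) -> None:
        """Set the lock bit; after this, even privileged writes are refused."""
        if not privileged:
            raise PermissionError("only privileged code may set the lock bit")
        self.locked = True

    def reboot(self) -> None:
        """A reboot restores the secure default and clears the lock bit."""
        self.enabled = True
        self.locked = False
```

Under these assumptions an unprivileged attacker can't flip the setting at all, and a kernel that wants extra certainty can set the lock bit right after boot.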

nolok · on Jan 22, 2018

Uh no, they would obviously make it so only kernel code can run that, like many other such settings.

And if an attacker can run code at the kernel level it's a non issue, as they're already on the other side of the airtight hatchway anyway [1]: they're in control of the computer and the memory.

[1]: https://blogs.msdn.microsoft.com/oldnewthing/20060508-22/?p=...

IntelMiner · on Jan 22, 2018

"Spectre is very difficult to exploit"

From what I've seen, there have been demonstrated attacks using JavaScript in Chrome to dump the saved passwords from the browser using these bugs.

If an attack is that easy to pull off, I don't think it's reasonable to make it an "opt in"

syncsynchalt · on Jan 22, 2018

Agreed, I keep hearing it's difficult, yet user om2 on the webkit team says they were able to come up with multiple attacks internally once they'd heard about the trick [1].

Safari/webkit have since rolled out mitigations to prevent the attacks that they figured out, but it gives the lie to the idea that Spectre is only a theoretical attack that we've yet to see an exploit for.

[1] https://news.ycombinator.com/item?id=16104831

JohnStrange · on Jan 22, 2018

Insecure defaults are always bad. The other way round would be the right choice. Let users downgrade their security for performance, if they insist.

Johnny555 · on Jan 22, 2018

Here's an explanation of retpoline:

https://support.google.com/faqs/answer/7625886

“Retpoline” sequences are a software construct which allow indirect branches to be isolated from speculative execution. This may be applied to protect sensitive binaries (such as operating system or hypervisor implementations) from branch target injection attacks against their indirect branches.

The name “retpoline” is a portmanteau of “return” and “trampoline.” It is a trampoline construct constructed using return operations which also figuratively ensures that any associated speculative execution will “bounce” endlessly.

dingo_bat · on Jan 22, 2018

Maybe I wasn't clear. Your explanation and the linked article are very informative, but I wanted to understand what the "garbage" Linus is talking about is. As I said, I do understand retpolines from a high level.

jpgvm · on Jan 22, 2018

The garbage part is still somewhat beyond my understanding, but as I see it he isn't so much talking about the decision to not disable insecure branch prediction by default but rather addressing some very weird behaviour the patches add to kernel entry/exit points, namely writing to MSRs (Model Specific Registers). This seems nonsensical, as the branch predictor shouldn't need screwing with at this stage because the kernel already has retpoline protection. So he is musing that there are further ulterior motives here, perhaps another vulnerability (beyond Meltdown/Spectre) they are getting out ahead of with these very peculiar changes.

He is still of course mad that they don't seem to want to fix Spectre correctly, but that seems tangential to how pissed he is that they are trying to get code merged that clearly does something other than just mitigate Spectre.

Unfortunately this entire thread is derailed with garbage about how Linus talks, rather than the fact he thinks Intel is doing something really fucking dodgy here and we should all try to work out what it is.

aidenn0 · on Jan 22, 2018

Linus seems to have two complaints:

1) Recent patch submissions imply that Intel has no good hardware or microcode mitigation for spectre-like attacks. There is a sub-complaint that Intel has a bad (i.e. kills performance) fix, but will not enable it by default because benchmarks matter.

2) This series of patches in particular appears to be doing either something different from, or more than, what their descriptions imply.

These patches do various things, presumably to manipulate the opaque internal state of the CPU, but only Intel knows for sure precisely what they do.

ForHackernews · on Jan 22, 2018

> Intel has no good hardware or microcode mitigation for spectre-like attacks

I was under the impression that for at least one category of attack, there is no hardware mitigation possible because it's a fundamental problem with the x86-64 design. Fixing it would require building a chip that uses some other architecture. Is that not the case?

chowells · on Jan 22, 2018

That's not exactly true. You could build an x86-64 chip without these flaws, but it would require a new internal architecture with a lot more silicon.

One obvious approach would be to have two caches per core. Speculative execution would use a different cache than normal execution. If the speculative action is committed to, it swaps which cache is the normal one and which is the speculative one. Then you'd also need to flush the branch predictor on context changes. And a few other issues.

Nothing that's impossible to do, but it would require a huge amount of new design and a lot more silicon just to maintain the performance of current chips without mitigation.
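For what it's worth, the two-cache idea can be modelled in a few lines of Python. This is only a toy with plain dicts standing in for caches; `TwoCacheCore` and its method names are made up, and copying the committed cache at the start of speculation is a simplifying assumption of the model, not something real silicon would do.

```python
class TwoCacheCore:
    """Toy model: speculative loads never touch the cache that normal
    execution can observe unless the speculation is committed."""

    def __init__(self):
        self.committed = {}      # cache visible to normal execution
        self.speculative = {}    # cache used only while speculating

    def begin_speculation(self):
        # Assumption of this model: speculation starts from a copy of the
        # committed state so that a later swap loses nothing.
        self.speculative = dict(self.committed)

    def speculative_load(self, address, value):
        # Side effects of speculation land in the speculative cache only.
        self.speculative[address] = value

    def commit(self):
        # Speculation was correct: swap which cache is the "normal" one.
        self.committed, self.speculative = self.speculative, self.committed

    def squash(self):
        # Speculation was wrong: throw away everything it brought in.
        self.speculative = {}

    def is_cached(self, address):
        # What a timing attacker probing the normal cache could observe.
        return address in self.committed
```

The point of the model is just that a squashed speculative load leaves `is_cached` unchanged, so there is nothing for a cache-timing probe to measure.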

wilun · on Jan 22, 2018

It will very probably not be double caches, nor anything on that order of additional silicon area. It's only an annoyance because the designs are not going to change for solid Spectre resistance (at least for its currently known versions, v1 and v2) in the next chips, because those designs have already been complete for months if not years, and yes, that would be a very significant change. But separating the caches? Never gonna happen. Anything taking the same space as separating the caches? Never gonna happen, and actually not needed.

Yes, a solid Spectre fix will obviously make designers rethink their microarchitecture in some deep aspects. But somewhat good mitigations should be available as soon as the next chips, and off-by-default is completely, utterly insane. The OS is not part of the platform (except in some special cases, which can disable the mitigation for performance if they like), and the platform is supposed to be backward compatible, maybe not perfectly but reasonably. OK, that has already been somewhat less true in the last few years, but let's not encourage that behavior. So shipping new CPUs that are broken by default but can be made somewhat less broken as an opt-in is an attempt to mask the level of the fuck-up, or maybe to avoid the creation of a new stepping. We should not tolerate that from Intel; a stepping is expensive but they have the money to do it.

chowells · on Jan 22, 2018

My point was solely that there are ways to preserve x86-64 as an ISA, and listed the single most obvious way to go about it. I never suggested that it was the best solution, or that those changes would be made any time soon, or that Intel's behavior has been anything but atrocious.

Did you mean to respond to someone else, maybe?

singingboyo · on Jan 24, 2018

I'm no chip designer, but maybe a 'small' speculation cache which allows quick moving to the real caches might be better just in terms of less cache needs. If there's not enough space then you can't speculate farther, and that's that.

Of course, that'd probably need more complex logic to manage this new cache, which makes things more difficult. Then again, I'm not sure how two caches interact with potential speculation across multiple branches (does Intel even do that?)
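As a sketch of the "small speculation cache" variant, again in Python and again only a toy: `SpeculationBuffer`, its capacity, and the dict used as the "real" cache are all assumptions for illustration.

```python
class SpeculationBuffer:
    """Toy model of a small, bounded buffer for speculatively loaded lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = {}       # lines fetched speculatively, not yet visible
        self.real_cache = {}    # stand-in for the normal cache hierarchy

    def speculative_load(self, address, value):
        """Return True if buffered, False if speculation has to stop here."""
        if address not in self.pending and len(self.pending) >= self.capacity:
            return False        # out of space: can't speculate any farther
        self.pending[address] = value
        return True

    def commit(self):
        # Speculation was correct: move the buffered lines to the real cache.
        self.real_cache.update(self.pending)
        self.pending.clear()

    def squash(self):
        # Speculation was wrong: drop the buffered lines, real cache untouched.
        self.pending.clear()
```

The trade-off shows up directly in `speculative_load` returning False: a smaller buffer costs less area but caps how deep speculation can go.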

lucb1e · on Jan 22, 2018

(FYI you're missing your [1] reference.)

aidenn0 · on Jan 22, 2018

Thanks, I had a footnote, but decided it would be more confusing than clarifying at the ELI5 level.

ndh2 · on Jan 23, 2018

There's a follow-up email that has many more details. Not ELI5 though.

> But since the peanut gallery is paying lots of attention it's probably worth explaining it a little more for their benefit.

http://lkml.iu.edu/hypermail/linux/kernel/1801.2/05282.html