VME Broken on AMD Ryzen (os2museum.com)
111 points by monocasa on May 12, 2017 | 68 comments



This may be an unpopular opinion, but making ancient x86 features unreliable strikes me as a good thing. Virtual 8086 mode is an awful virtualization solution, as you can't enter protected mode from within it. This makes it only good for running a subset of DOS apps: either real-mode-only apps or those written to an unspecified subset of DPMI [1]. Ryzen CPUs are easily fast enough to emulate them. And if that's not enough, x86 has had real virtualization for years now, with a functioning protected mode (and long mode!) in addition to real mode.

We're long overdue for a simplification of the PC platform. Having CPUs out there with broken legacy features accelerates the migration away from those features. Hopefully this leads to dropping them outright at some point.

[1]: https://en.wikipedia.org/wiki/DOS_Protected_Mode_Interface#H...


VME is an optional feature indicated by a CPUID flag. AMD could have simply not set the feature flag.


Perhaps they'll submit a microcode update which does that (assuming the flag can be changed via microcode even though the bug itself can't be fixed that way).


I think this is just a bug though. But yes, I have been thinking of creating a new version of x86 that drops things like segmentation, real mode, virtual 8086 mode, and 16-bit protected mode.


Coincidentally I've also thought about creating a "better x86", but in the opposite direction: on the basis that segmentation is extremely powerful and useful, I think it would've been better for all the various virtualisation extensions to be based on the same model that V86 and protected mode started with: segment descriptors of different types. That would allow V86, 16-, 32-, and 64-bit (using "double-wide" descriptors, so a 64-bit base and limit can fit) protected-mode "tasks" or VMs to coexist completely, and the OS to switch between them easily. There are enough reserved spaces in the existing specification for these to fit nicely (in fact, what I came up with fits so neatly that I wonder whether Intel had originally envisioned them for this purpose, before AMD beat them to it with AMD64).

Instead, now we have partially-backwards-compatible (and even slightly different) AMD64/Intel64 and at least two non-interoperable virtualisation extensions (AMD-V, VT-x).


If you're going to create a wildly incompatible "better x86", just start with a RISC design and be done with it. Compatibility with modern x86 OSes is the reason for x86's existence (note that v8086 mode is not required for this). Drop that and you might as well just fix all the other problems with the ISA while you're at it.


I'd be interested to know how much of an advantage ARM has here in terms of its ISA. Presumably, if more resources were dedicated to it, ARM CPUs could be quite a lot faster than x86?


I doubt it. For one, 32-bit ARM is pretty complex too (though 64-bit ARM is a very nice simplification). More importantly, though, you can overcome the x86 complexity by simply throwing enormous manpower at the problem, and that is what Intel has done over the years.

(That doesn't mean the complexity is justified, just that it can be overcome. I think that people often assume that x86's dominance means that there's something inherently amazing about x86 from a microarchitectural point of view. That isn't true; rather it's simply that x86/Windows has historically earned Intel so much revenue that they've been able to fund the manpower needed to keep it alive and on top.)


It's complicated. There's a fixed amount of design overhead in supporting all the various x86 modes, but compared to designing a top-end processor that isn't too big. Being able to decode 4 x86 instructions at once is hard compared to doing so with ARM, but compared to the size of a 192-entry reorder buffer for deep out-of-order execution the cost isn't huge. Both x86 and ARM want to convert their ISA operations to a different format for internal use. ARM's is closer to its ISA instructions, but it's not clear that that makes a big difference. The biggest issue in practice at the high end might be memory ordering constraints, with x86 being very tight and ARM being very loose. But I can't actually say for sure.

But this is probably why Intel's efforts to extend x86 down to lower power processors haven't been huge successes.


Keep in mind that if you drop the old stuff, you'll have to provide a bootloader, and even then it'll be weird; modern systems still boot in 16-bit mode, bootstrap into 32-bit mode, then configure 64-bit mode. Unless UEFI fixed that madness?


The processor itself still boots up in 16-bit mode. UEFI does switch into 64-bit mode before booting an OS loader, but that doesn't mean the processor itself starts in 64-bit mode.


UEFI starts you off in protected mode or long mode with identity paging. It even lets you package up your loader as a PE file.


UEFI can drop you directly into 64-bit mode.

The problem is it uses a different memory map than 32-bit mode, so a lot of write-your-own-OS manuals are just flat wrong.


BIOS itself can be implemented as 32-bit code containing a software emulator for 16-bit code. This emulator runs until the CPU is switched to 32-bit protected mode by the OS bootloader.


Yes it did.


If by "modern" you mean "EFI" you are incorrect.

A traditional BIOS is hardly suitable for describing a "modern" firmware; for chrissake it doesn't even use 32 bits!


Right, I know it's not intentional. My point is that this bug has unintended benefits. :)


This a thousand times. Also, it's usually Intel's policy to maintain (very) strong backwards compatibility. See the A20 line for a ridiculous example. But AMD has always been less strict on these issues, and I'm glad they aren't. This is good news.

And despite all that, as you say, a new architecture is badly needed.


One of the big selling points of x86 is backwards compatibility. If you have some OS from 1990 you can still run it (without emulation or virtualization, so long as it doesn't depend on clock speed), which is pretty crazy.

Slight aside, there are a lot of reasons that Itanium failed, but certainly one of them was lack of backwards compatibility.


Itanium was extraordinarily backwards incompatible. There's an enormous gulf between "runs software from the 80s" (which is something the PC platform only pretends to do anyway, because peripherals now are incompatible) and "can't run Windows at all". Breaking v8086 mode wouldn't prevent modern Windows from working (which is in fact why this bug wasn't noticed). You can't even enter it from long mode to begin with!


Actually, my Windows 10 installation didn't use UEFI (I had a really old machine). Basically I upgraded to Ryzen. I don't think it will be easy to migrate to UEFI straight away.


> Slight aside, there are a lot of reasons that Itanium failed, but certainly one of them was lack of backwards compatibility.

Itanium did not aim at the x86 market. The x86 translation layer was retrospectively seen as a mistake as well, because it wasn't relevant, yet it required transistors that limited the design's overall performance, which was relevant.


> Itanium did not aim at the x86 market.

Maybe not. But x86 certainly took over the market Itanium was aiming for.


x86 did not. x64 did.


All 16 and 32 bit x86 code is valid on x64. It's an extension, not a new ISA.


You got it all backwards. AMD64 is a complete revamp which happens to support x86 in its legacy mode.

Server people care for 64-bit address spaces, and that's a feature introduced by AMD64 which is not available in x86.


This is also one of the main reasons why Mainframes still exist.

Backwards compatibility straight back to 1964 is a big deal, there's lots of 50+ year old code still in production at banks, insurers, and the like.


Pedantic point, but you couldn't really run an OS from 1990. It wouldn't support any modern peripheral buses required for normal operation.

But you certainly can natively run user-mode 16-bit DOS programs on a modern CPU.


I find it somewhat ironic (apropos?) that this article is on OS/2 Museum.


Is there any decent estimate as to how much silicon you could expect to save by dropping stuff that was long ago shunted to microcode?


It's not so much silicon as engineering and maintenance time keeping all that stuff working.


> As incredible as it is, Ryzen has buggy VME implementation; specifically, the INT instruction is known to misbehave in V86 mode with VME enabled when the given vector is redirected

It's much more incredible to me that our computers work, at all.


As incredible as it is, Ryzen has buggy VME implementation

What I find more incredible is that this bug could get by without being noticed. A CPU literally has billions of possible regression tests --- all the world's existing software --- and of everyone working on the project, not a single one thought to try some older software (XP/2k3 is not even that old, as far as x86 compatibility is concerned) to see if it worked? This is an old feature too, meaning it should've been well-characterised by now. I'm particularly surprised that FreeDOS is affected, since it's commonly used as a minimal "non-OS" OS for running things like low-level diagnostics and debugging of hardware.

This begs the question: if old features are this broken, what about the new ones (for which there is far less software available to test them with)? I think the most recently discovered one was https://news.ycombinator.com/item?id=13924192


> This begs the question: if old features are this broken, what about the new ones

You can find so called "specification updates", which - as the name implies - update the specs to match actually released hardware ;)

Available for all CPU families from both Intel and AMD, these easily run into tens or hundreds of entries. (Though I haven't seen the Ryzen one released yet.)

And then somebody recently linked this (2010) - allegedly there are bugs exploitable for privilege escalation:

http://cs.dartmouth.edu/~sergey/cs258/2010/D2T1%20-%20Kris%2...


> Transfer of the file you were trying to download or upload has been blocked in accordance with company policy. Please contact your system administrator if you believe this is in error.

That comes from their end.


Weird, maybe you are IP banned or it actually is some corp firewall on your side.

It's a conference presentation titled "Remote Code Execution through Intel CPU Bugs" by Kris Kaspersky and Alice Chang. Google finds copies elsewhere.

I can't say that I see how the "remote" part could possibly work, but as for local exploitation, errata often state that things like "data corruption" or "unpredictable behavior" can happen under "certain internal conditions" so this stuff may be exploitable if one can execute arbitrary instructions which trigger these internal conditions.



Old features, especially this old, fossil-level old, are just not used in real life. Modern software never enters this ancient mode for any reason. Ancient software is emulated in software instead, e.g. in DOSBox.

New features are actively used, and thus actively tested. They are likely much less broken than disused old ones.


DOSBox can't run various firmware update and low-level hardware diagnostics tools that are still used in real life under DOS running on bare metal.

And Windows XP might be out of support but it still is used in some places too. And even if it wasn't, somebody could still think of using it as a test case to increase coverage. It would be extremely lame if some bug which crashes newer Windows in 0.1% of cases turned out to be trivially detectable in XP.


As we have learned today, the UK NHS is very much still on XP.


They probably aren't gonna buy Ryzens.

IIRC Ryzen and Kaby Lake were both announced with "only Windows 10 is going to be officially supported"…


Ah but XP isn't supported on anything, and the update-blocking code doesn't affect you when there are no updates.

So no reason not to use Ryzen.


Yes, they will have to buy Intel to avoid this bug.


But this bug literally couldn't crash any modern OS as the feature is not available in 64-bit mode.


The thing about regression tests, is that this mistake won't be made again. Now AMD will add these tests. For now, they may issue microcode updates or workaround patches for popular VM software.

You don't catch every bug with a suite of unit tests. But automated regression tests do ensure you don't replicate a failure condition.


Yeah, modern CPUs have tons of bugs in the obscure corners of the architecture. The x86 boot process is an amazing amalgamation of all the legacy CPUs of the past two decades. This does seem fixable in microcode though, so presumably they'll just do that. I very much agree with the other comments though that at some point we should just get rid of all of that junk and use software emulation.


So if I get this correctly, as long as the Host OS is 64bit we're fine, since VME isn't supported on that anyways? I'm thinking 32bit hosts running VMs of any sort should be an increasingly rare case, but nevertheless it's going to be interesting to see if and when AMD releases a fixed version.


It can affect 32-bit guest OSes running on 64-bit hosts, and that is how they discovered the bug.


A 64-bit host, running a 32-bit guest, itself running a 16-bit app. This scenario can run into the VME bug.


Any exploitation scenarios here?

Could you package a trimmed down version of that stack up and cause a reliable crash on Ryzens operating on a more typical platform?

Maybe it becomes a bit of a rube goldberg malware at that point...


The first part of that is optional, and the last part of that is also optional.


Or a 64bit host running a WinXP guest.


Am I the only one who thinks that it’s not particularly incredible that a new CPU that implements the ridiculously complex (for good reasons, but still) x86 instruction set with all its historical baggage has bugs?

Every x86 has had errata. Why should we expect Ryzen to be any different? “As incredible as it is...” seems a bit of an over-reaction to me.


People think that the world is a perfect black-and-white place where everything that works is perfect and everything else is garbage.

Typically, the person reporting this sort of %^&* has an agenda, ranging from "fixit fixit fixit" to "I only buy the competition and so should you". The knowledge is good to have if you are into doing crazy old things with new hardware. But the hyperbolic "the world is ending" bit you should just ignore.


In other news, crank-to-start mode is broken on the Tesla Model S.


Linux should be immune simply because it doesn't use VME.


Windows doesn't use it either. It's not possible to use it on a processor running in long mode except within a virtual machine (or by putting the processor into actual protected mode).


What if you run DOS in a Virtual container though?


You can use 16 bit Win 3.1 apps on 64 bit Linux with Wine.


I think dosemu does?


DOSEMU uses virtual 8086 mode but not VME. The Linux kernel never bothered implementing VME.


Ryzen 7s can't seem to find their price level; I wonder if this will drive the price down even further, or if it's too specific for anyone to notice.

https://pcpartpicker.com/product/9Q98TW/amd-ryzen-7-1700x-34...


AMD should just drop the support for VME. There is no need to carry such baggage in 2017. If you need to run your legacy systems that require VME: buy yourself a processor designed in 2016.


Ryzen is amazing, I hope AMD fixes this via BIOS update for motherboards, so the updated microcode can be loaded before the OS boots.


They will issue a patch for the processor that will fix it. Just another bug.


I misread VME as VMX at first. Totally different.


This really should have "(virtual 8086 mode)" in the title.


That would be inaccurate. VME is an extension of virtual 8086 mode.



