NVIDIA's Denver is a feat; I think that if it says anything, it's that ARM's instruction encoding is deficient. AArch64 instruction encoding is less dense than AMD64 and completely lacks compressed instructions (like Thumb). To NVIDIA, it made more sense to write a runtime binary translator targeting an internal ISA, which is clearly enabling their microarchitecture more than ARM is.
AMD and Intel spend more money on research than you can even imagine, and it's not surprising that they manage to make x86 machines that lead in single-thread performance, but since Intel tried Itanium, you can tell their designers feel x86 is deficient. I mean, just imagine how much of their chips is just decode.
Also, not to get too childish, but ironically NVIDIA is one of the vendors who is shipping RISC-V in a lot of products. Soon enough, that Denver-style core is going to be sitting on a die right next to a RISC-V. ;- )
That's a good point about Nvidia adopting RISC-V. I didn't know that, but for their applications it's a good idea.
I'm not a Denver fan, and at this point I think we can safely say Code Morphing has been tried. However, I wouldn't conclude that Denver means ARM's instruction encoding is deficient. The optimizations Denver performs would be impractical to expose directly in any ISA.
Moreover, architectures are meant to be implemented and optimized in many ways. You may not like x86, but planting that flag in the sand allowed Intel to innovate on the microarchitecture side while developers innovated on the application side, knowing that x86 would still be there. It's the same value proposition that IBM offered with the original architecture, System/360.
Where I like x86 and where I don't like RISC-V is that x86 doesn't try to be perfect but rather adapts over time. RISC-V tries to be perfect, and indeed it eschews many of the bad RISC ideas of the past (register windows, branch delay slots, tagged integers, ...) while pig-headedly avoiding an obvious feature shared by both x86 and ARM: condition codes. I've read their argument and found it unconvincing. Original ARM had way too much condition code support; I think ARMv8 strikes a nice balance that RISC-V should have followed.
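To make the trade-off concrete, here's a minimal sketch (my own illustration, not from anyone's post) of the classic case where condition codes earn their keep: multi-word addition. On x86 and ARMv8 the carry out of the low half lives in a flag and the high-half add consumes it (adc/adcs); on RISC-V there are no flags, so the carry has to be materialized in a register with an unsigned compare.

    #include <stdint.h>

    /* Illustrative only: add two 128-bit values held as 64-bit halves.
     * x86/ARMv8 compilers can lower the high-half add to adc/adcs using
     * the carry flag; RISC-V recomputes the carry with an unsigned
     * compare (sltu) and adds it explicitly. */
    void add128(uint64_t a_lo, uint64_t a_hi,
                uint64_t b_lo, uint64_t b_hi,
                uint64_t *r_lo, uint64_t *r_hi)
    {
        uint64_t lo = a_lo + b_lo;
        uint64_t carry = lo < a_lo;   /* becomes sltu on RISC-V */
        *r_lo = lo;
        *r_hi = a_hi + b_hi + carry;
    }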
The HP+Intel Itanium effort allowed AMD to propose their AMD64 extensions. That's been quite successful. I wish they'd taken the opportunity to trim more cruft. When ARM went 64-bit with ARMv8, they took that opportunity, and the result is quite clean. I prefer it to RISC-V, although I haven't written any RISC-V assembly.
ARMv8 tries to be a good instruction set architecture. To me, RISC-V tries to be the platonic ideal RISC ISA. I'll go with good.
Yeah, I agree that condition codes (and offset loads) are features, not bugs, in x86 and ARM; also, ARMv8 shows an effort to reduce the number of instructions that can be conditionally executed. Chris Celio has talked about some interesting ways to make up for the lack of these two features, and it seems quite convincing. If you're using the compressed instruction set (which all desktop/workstation-class RISC-V cores and most microcontrollers support), then the decoder can implicitly fuse the add and the load, or various compare-and-branch-style sequences, and treat them as a single operation. AFAIK, the compiler backends already try to order instructions to exploit this.
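For what it's worth, here's a toy sketch of that fusion idea (the names and structure are mine, not any real decoder's): if an add into rd is immediately followed by a load from 0(rd) back into rd, the pair behaves like a single indexed load, so a decoder can treat it as one macro-op.

    #include <stdio.h>

    /* Toy model of add+load macro-op fusion; purely illustrative. */
    enum opkind { OP_ADD, OP_LOAD, OP_FUSED_INDEXED_LOAD, OP_OTHER };

    struct insn {
        enum opkind kind;
        int rd, rs1, rs2;   /* register numbers */
        int imm;            /* load offset */
    };

    /* Fuse adjacent add+load pairs in place; returns new count. */
    static int fuse(struct insn *code, int n)
    {
        int out = 0;
        for (int i = 0; i < n; i++) {
            if (i + 1 < n &&
                code[i].kind == OP_ADD &&
                code[i + 1].kind == OP_LOAD &&
                code[i + 1].rs1 == code[i].rd &&  /* address is the sum  */
                code[i + 1].rd  == code[i].rd &&  /* result overwrites it */
                code[i + 1].imm == 0) {
                struct insn f = { OP_FUSED_INDEXED_LOAD,
                                  code[i].rd, code[i].rs1, code[i].rs2, 0 };
                code[out++] = f;
                i++;                              /* skip the consumed load */
            } else {
                code[out++] = code[i];
            }
        }
        return out;
    }

    int main(void)
    {
        /* add x5, x10, x11 ; ld x5, 0(x5)  ->  one fused indexed load */
        struct insn prog[] = {
            { OP_ADD,  5, 10, 11, 0 },
            { OP_LOAD, 5,  5,  0, 0 },
        };
        printf("instructions after fusion: %d\n", fuse(prog, 2)); /* 1 */
        return 0;
    }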
And yeah, I'm not utterly convinced by any argument about "purity" in ISAs, but in this case there's no question that it has helped a wider variety of people develop interesting and competitive chips in less time.
ARMv8 is a considerable step up in many ways from ARMv7 and earlier, but it retains user-mode compatibility with ARMv7 through the AArch32 execution state, which means that the more different AArch64 is, the harder the whole core is to implement. In this way, every ARMv8 with AArch32 support is also an ARMv7 (of course, sans all the supervisor/hypervisor instructions).
In many ways I like x86, and for the most part I like the vendors. I love that x86 has given Intel and AMD the opportunity to innovate so dramatically.
But just for a moment, imagine that instead of just Intel and AMD, the whole industry could plant that same flag in the sand and have just one general-purpose instruction set family.
You could have an 8088, a Cortex-M0 through M4, an ARC, an AVR, a PicoBlaze, a MicroBlaze, a SuperH, a MIPS, a Power, a LatticeMico, etc., but many of these architectures survive today because of a differential in licensing cost with ARM, not because of any technical prowess (and some of them are better in some ways, don't get me wrong!). Imagine that for the vast majority of these users, one ISA family would suffice, and the whole market could compete to bring new performance, power, and cost profiles to each market served by these cores. Then imagine that that same industry could easily start scaling up its designs to compete in the application processor market, and then perhaps in the workstation and server market, then perhaps the HPC market.
Just a thought though; I can't predict the future with any degree of certainty. I just think RISC-V is a whole lot more practical than you might expect, perhaps just for people who have slightly different values from you (or from me, for that matter).
I think there's a lot of promise in that it is becoming the standard teaching ISA for universities throughout the United States, Canada, India, and elsewhere.
If there is a generation of new computer engineers coming out of school with research-grade FPGA designs in their hands, and their thesis work can be commercialized in a matter of months rather than years, then you can imagine that there will be huge commercial output around RISC-V whether it catches on now or later.
I think that's when it will start to seem more attractive to you: there will be more investment in it, and you'll see some clear, immediate benefit aside from cost savings and licensing flexibility.
Based on these photos, I'd estimate the instruction decoder takes about 10% of each core, and about 7% of the whole chip.
For Intel I was unable to find similar photos, but my estimate is 3-4% of the chip area. The instruction set is the same, so the decoder complexity should be comparable, but most Intel chips have around 50% of the die occupied by integrated graphics.
The micro-op cache size increased to 2048 μops, up from Intel's 1536. His testing shows 5 instructions per clock cycle, up from Intel's 4. There are limits to the ILP a scheduler can find; more in one area means more demand in another. This is no mean feat for AMD to pull off.