Microprogramming is a technique that dates back to the 1950s. CISCs were microprogrammed. The VAX was microprogrammed. It was the status quo circa 1980. RISC as a term really dates to 1980 (although the ideas go back to the IBM 801 and Cray's machines).
The big idea of RISC was to avoid microprogramming and have a simple to decode ISA which was directly executed by a short simple pipeline. ld/st + lots of registers + C compiler. Awesome.
Indeed in Patterson+Ditzel's version of Luther's 95 theses, they say:
> Microprogrammed control allows the implementation of complex architectures more cost-effectively than hardwired control
They say this in the section Reasons For Increased Complexity. RISC was a revolt against that complexity, against microprogramming, so how can μops be RISC if microprogramming is the very thing RISC was revolting against?
I think people say this because no one knows what microprogramming is anymore. You might read one paper in a graduate architecture seminar. And no one but no one writes microprograms.
The decode stage of Haswell translates add RAX,RBX into a '150b' μop. If you read Agner Fog you'll know how many μops, what the latencies are, etc. These are empirically determined properties of the μop but that's it. You know the Decoded ICache characteristics. But you don't know the ISA.
After all that's done, you're left with maybe a 14-pipestage datapath. That's not RISC. Complicated instructions (AAA) get interpreted. That's not RISC. Multiple functional units can calculate 2- or 3-operand effective addresses. That's not RISC. This all is REALLY not RISC.
μops are micro-instructions, not RISC instructions.
BTW, the alternate decoder kinda wouldn't work so well because the microarchitecture, functional units, datapath, caches and registers are really set up for x86. Folks wanted something like that at Transmeta, a Java processor, but the HW was designed for x86.
It may be worth mentioning that the BCD suite of instructions is not supported in x64. But if you are looking for an example of modern microcoded x64 to avoid, BTR/BTC/BTS with a memory operand is a prime example. That does bring up the point that talking about "x86/x64" can be misleading, in a similar way to discussions of the "C/C++" language.
> the alternate decoder kinda wouldn't work so well because the microarchitecture, functional units, datapath, caches and registers are really set up for x86.
Are there specific areas where you see problems? For example, I'd think the existing abstraction between the limited number of architectural registers and the hundreds of physical registers would work fine for alternative ISAs. And I don't immediately see why caching wouldn't work unaltered.
> Folks wanted something like that at Transmeta, a Java processor, but the HW was designed for x86.
I searched for more information about what became of Transmeta after I posted last night, but didn't find much about the technical issues they encountered. Do you know if there is a good post-mortem?
While searching, I did come across a couple of interesting CPUs that are going the other way:
The main problem I see mentioned when going the other way (ARM emulating x86) is dealing with "flags". Since recent Intel processors already support "flagless" variants of many instructions (SARX, MULX, etc.), this doesn't seem insurmountable.
BTS with a memory operand. Now that would be bad. What compiler people learned was to avoid the CISC-iest instructions in favor of RISCy ld/st+registers instructions. What Intel learned was to make RISCy instructions fast. However after translation to uops, the instruction encoding itself just didn't matter.
Also, register renaming helps a LOT; you don't need 32 architectural registers when 16 registers renamed onto 180 physical registers will more than do. Register renaming predates RISC by 15 years (Tomasulo). I'd almost say it's anti-RISC.
BTW, Elbrus + Transmeta are kinda joined at the hip. Babayan consulted at Sun with Ditzel and is now at Intel. Half the Transmeta people went to NVidia (eventually) and did an x86 before switching to ARM (Denver).
Eventually people will see RISC as the provisional idea it is. Register renaming is a Great Idea. Uop translation is a Great Idea. RISC is provisional.
> The main problem I see mentioned when going the other way (ARM emulating x86) is dealing with "flags".
I see a much larger problem in ARM emulating the memory model of x86, which gives much stronger ordering and synchronization guarantees than the weak memory model used by ARM processors.