> Flags don't have to add an extra implicit input/output everywhere. Both ARM and PowerPC avoid updating the flags unless explicitly requested.
You mean things like having variants of common arithmetic instructions that update or don't update flags?
> Besides fixed-size instructions and the traditional variable-size instructions, one can do variable-size instructions in bundles. An example would be 25-bit and 50-bit instructions packed into 128-bit bundles, with the remaining 3 bits used to specify all the sizes. (eight patterns: nnnnn, nww, wnw, wwn, nnnw, nnwn, nwnn, wnnn) Extending that out to a typical cache line of 512 bits might be better. Another option is to use 1 of every 16 bits to indicate where instructions start.
Yeah, something like that could be nice. Though how would jump instructions be encoded? Bundle + offset within bundle?
> Where RISC-V got wasteful was the registers. Compilers are seldom able to use anywhere near 32 registers. On normal code, normal compilers seem to need about 8 to 10 registers free after deducting the ones reserved by the ABI. The ABI might need 3 to 5 registers. (stack, PLT, GOT, TLS, etc.) That means that roughly 11 to 15 registers are needed. Clearly, 4 bits (16 registers) is enough. Shoving some of those ABI-reserved registers out of the general-purpose set wouldn't be a bad idea; most of those are just used for addressing.
Nah, I think 32 registers was a good choice. (Relatively) common loop optimizations like unrolling or pipelining need more registers. Also, some of those registers are callee saved and some are call clobbered; by making use of this information the compiler can avoid spilling and reloading of registers around function calls.
For x86-64, 16 registers is fine, partly because in many cases one can operate directly on memory without needing to explicitly load/store to architectural registers, and partly because the target was and is OoO cores that aren't as dependent on those register-consuming compiler optimizations.
It is common to have a bit which causes an instruction to update flag bits. PowerPC arithmetic instructions have an "Rc" field, usually the LSB, indicated by a trailing "." in the assembly syntax. ARM arithmetic instructions have an "S" field, usually bit 20, indicated by a trailing "S" in the assembly syntax.
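To make the bit positions concrete, here is a minimal sketch that tests those two bits on raw 32-bit instruction words. The function names are mine, and the check only makes sense for instruction forms that actually carry the field (hence the "usually" above); it does no other decoding.

```python
def ppc_rc_set(word: int) -> bool:
    """True if the Rc bit (the least significant bit) is set in a PowerPC word."""
    return bool(word & 1)


def arm_s_set(word: int) -> bool:
    """True if the S bit (bit 20) is set in a 32-bit ARM data-processing word."""
    return bool((word >> 20) & 1)


# Hand-assembled examples:
#   add   r3, r4, r5  -> 0x7C642A14      add.  r3, r4, r5  -> 0x7C642A15
#   ADD   r0, r1, r2  -> 0xE0810002      ADDS  r0, r1, r2  -> 0xE0910002
print(ppc_rc_set(0x7C642A14), ppc_rc_set(0x7C642A15))  # False True
print(arm_s_set(0xE0810002), arm_s_set(0xE0910002))    # False True
```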
Bundle + offset is fine. The offsets don't need to be real byte offsets. In the example given with 25-bit and 50-bit instructions, the allowable low nibbles of instruction addresses might be 0, 1, 2, 3, and 4 (so it goes 0x77777773, 0x77777774, 0x77777780, 0x77777781, etc.).
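Here is a rough sketch of how stepping through such addresses could work. I'm assuming the low nibble is a 25-bit slot index (0 to 4) and the upper bits select the 128-bit bundle, so clearing the low nibble gives the bundle's real byte address. The pattern strings use "n" for a 25-bit and "w" for a 50-bit instruction; how the 3 size bits map to patterns is not shown, and the helper names are made up for illustration.

```python
SLOTS_PER_BUNDLE = 5  # five 25-bit slots plus 3 size bits = 128 bits

# The eight ways to fill five slots with narrow (1-slot) and wide (2-slot) instructions.
PATTERNS = ["nnnnn", "nnnw", "nnwn", "nwnn", "wnnn", "nww", "wnw", "wwn"]


def start_slots(pattern: str) -> list[int]:
    """Slot index at which each instruction in the bundle starts."""
    starts, pos = [], 0
    for kind in pattern:
        starts.append(pos)
        pos += 1 if kind == "n" else 2
    if pos != SLOTS_PER_BUNDLE:
        raise ValueError(f"pattern {pattern!r} does not fill the bundle")
    return starts


def next_address(addr: int, pattern: str) -> int:
    """Address of the instruction following the one at addr."""
    bundle, slot = addr >> 4, addr & 0xF
    starts = start_slots(pattern)
    i = starts.index(slot)  # ValueError if addr is not an instruction start
    if i + 1 < len(starts):
        return (bundle << 4) | starts[i + 1]
    return (bundle + 1) << 4  # fall through to slot 0 of the next bundle


print(hex(next_address(0x77777773, "nnnnn")))  # 0x77777774
print(hex(next_address(0x77777774, "nnnnn")))  # 0x77777780
print(hex(next_address(0x11, "nww")))          # 0x13
```

A jump target that lands in the middle of a wide instruction (for example low nibble 2 in an "nww" bundle) is rejected here with a ValueError; real hardware would need its own rule for that case.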
I disassemble binary executables as my full-time job. I've dealt with over a dozen different architectures. I commonly deal with PowerPC, ARM, MIPS, x86-64, and ColdFire. The extra registers of PowerPC and MIPS are always wasted. Even with ARM and x86-64, unused registers are the norm. It simply isn't normal for a compiler to be able to make effective use of lots of registers. Surely there is an example somewhere that I haven't yet seen, but that would be highly abnormal code.
If more registers could be used by compilers, the Itanium would have been a success.
I'm not an expert so pardon any ignorance, but couldn't compilers be acting conservatively about registers due to the "long shadow" of x86? Perhaps the modest increase of 8 registers for x64 never led compiler developers to start considering registers as a generally abundant resource, thus constraining their designs.
> If more registers could be used by compilers, the Itanium would have been a success.
I kind of feel the Itanic never made it far enough for its register count to have mattered to anyone. I wonder if SPARC would be a better comparison... it was somewhat popular in the 90's and 00's, and was a RISC chip with oodles of registers, wasn't it?