It's more than just limiting the number of possible instruction lengths; it's also that you only need the first few bits of the instruction to determine its length. With x86, you have to decode the first byte to know whether the instruction has more bytes, decode those bytes to know whether it has even more bytes, and so on.
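To make that concrete, here is a minimal sketch (my own illustration, not from the thread) of the RISC-V rule: the two low bits of the first 16-bit parcel are enough to tell a 16-bit compressed instruction from a 32-bit one, so no serial byte-by-byte decode is needed. I'm ignoring the reserved longer (48-bit and up) formats, which almost no hardware implements.

```c
#include <stdint.h>

/* RISC-V length rule: if the two low bits of the first 16-bit parcel
   are not 11, the instruction is a 16-bit compressed (RVC) one;
   if they are 11, it is a standard 32-bit instruction.
   (Reserved longer encodings are ignored in this sketch.) */
static int rv_insn_length(uint16_t first_parcel) {
    if ((first_parcel & 0x3) != 0x3)
        return 2;   /* compressed instruction, 2 bytes */
    return 4;       /* standard instruction, 4 bytes */
}
```

Contrast with x86, where you may need to look at prefixes, the opcode, ModRM, SIB, and displacement fields, one after another, before the length is known.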
But since I'm not a hardware designer, I don't know if the RISC-V design is enough to make a high-performance wide decoder. With 64-bit ARM it seems very easy; once you've loaded an n-byte line from the cache, the first decoder gets the first four bytes, the second decoder gets the next four bytes, and so on. With compressed RISC-V, the first decoder gets the first four bytes (0-3); the second decoder gets either bytes 4-7 or bytes 2-5; the third decoder can get bytes 8-11, 6-9, or 4-7, depending on how many of the preceding instructions were compressed; and so on. Determining which bytes each decoder gets seems very easy (it's a simple boolean formula of the first two bits of each pair of bytes), but I don't know enough about hardware design to know if the propagation delay from this slows things down enough to need an extra pipeline stage once the decoder gets too wide (for instance, the eighth decoder would have 8 possible choices for its input), or if there are tricks to avoid this delay (similar to a carry-skip adder).
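The serial dependency being described can be sketched in software (my own illustration, under the simplifying assumption of only 16- and 32-bit encodings): marking which 16-bit parcels of a fetch window start an instruction requires knowing where the previous instruction started, which is exactly the chain that gets longer as the decoder gets wider.

```c
#include <stddef.h>
#include <stdint.h>

/* Mark which 16-bit parcels in a fetch window start an instruction.
   Note the loop carries a dependency: where instruction k+1 starts
   depends on the length of instruction k, like a carry chain. */
static void mark_starts(const uint16_t *parcels, size_t n, int *is_start) {
    size_t i = 0;
    while (i < n) {
        is_start[i] = 1;
        /* low two bits != 11 means a 16-bit compressed instruction */
        int len_parcels = ((parcels[i] & 0x3) != 0x3) ? 1 : 2;
        for (int j = 1; j < len_parcels && i + j < n; j++)
            is_start[i + j] = 0;
        i += len_parcels;
    }
}
```

In hardware each decoder's input mux would be steered by the `is_start` bits; the question in the comment is how fast that chain can be evaluated across a wide window.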
I am a hardware designer. See my comment above; it's going to be ugly. Fast adders are still slow (and awkward), but they only have to propagate a single bit of information; this is much messier.
As a side note, carry-skip doesn't really work in modern VLSI; I guess you were probably thinking of carry-lookahead.
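For what it's worth, the same lookahead idea does apply to the boundary problem in principle. Here is a sketch (my own illustration, assuming only 16- and 32-bit encodings, nothing from a real design): first compute, for every parcel independently, where the next instruction would start *if* this parcel started one, then combine those pointers with log-depth pointer doubling, the way carry-lookahead shortens a carry chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 8  /* parcels in the fetch window */

/* Lookahead-style boundary marking: step 1 is fully parallel; step 2
   needs only log2(N) combining rounds instead of an N-long serial walk. */
static void mark_starts_lookahead(const uint16_t *p, int *is_start) {
    size_t jump[N];
    /* Step 1 (parallel per parcel): where the next instruction would
       start, assuming this parcel starts one. */
    for (size_t i = 0; i < N; i++)
        jump[i] = i + (((p[i] & 0x3) != 0x3) ? 1 : 2);

    memset(is_start, 0, N * sizeof *is_start);
    is_start[0] = 1;
    /* Step 2: log2(N) rounds; each round marks the starts reachable
       with the current jump distance, then doubles the distance. */
    for (size_t round = 0; ((size_t)1 << round) < N; round++) {
        int marked[N];
        memcpy(marked, is_start, sizeof marked);
        for (size_t i = 0; i < N; i++)
            if (marked[i] && jump[i] < N)
                is_start[jump[i]] = 1;
        size_t next[N];
        for (size_t i = 0; i < N; i++)
            next[i] = (jump[i] < N) ? jump[jump[i]] : N;
        memcpy(jump, next, sizeof jump);
    }
}
```

Whether the wiring and muxing this implies is cheap enough at real decode widths is exactly the kind of thing only layout would tell you.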
Ok, so you're saying that not only are there implicit super-instructions via macro-op fusion, there are also variable-length instructions in there too? Ok, I'm not a RISC-V expert, but damn, that kind of ruins even the last tiny shred of its original value proposition of being simple. Sure, the core instruction set is easy, but once you add extensions it's just plain ugly.