Thank you! I've been waiting for a viable RVV board for a long time. Just ordered the OrangePi RV2.
This unblocks me to properly work on optimizing for vector support in software. OoO and even wider RVV registers will then automatically speed things up, without even a recompile.
Yes, I know I could use qemu, but it's not the same. I feel like this is what unblocks me on the software side.
> OoO and even wider RVV registers will then automatically speed things up, without even a recompile.
The problem is that there are some things in RVV where it's unclear how they will perform on high perf OoO cores:
* General choice of LMUL: on in-order cores it's clear that maximizing LMUL without spilling is the best approach; for OoO this isn't clear.
* How will LMUL>1 vrgather and vcompress perform?
* How high is the impact of vsetvli instructions? Is it worth trying to move them outside of loops whenever possible, or is the impact minimal like in the current in-order implementations?
* What is the overhead of using .vx instruction variants? Is there additional cost involved in moving between GPRs and vector registers?
* Is there additional overhead when reinterpreting vector masks?
* What performance can we expect from the more complex loads/stores, especially the segmented ones?