Hacker News

Thank you! I've been waiting for a viable RVV board for a long time. Just ordered the OrangePi RV2.

This unblocks me to properly work on optimizing software for vector support. OoO execution and even wider RVV registers will then automatically speed things up, without even a recompile.

Yes, I know I could use qemu, but it's not the same. I feel like this is what unblocks me on the software side.



> OoO execution and even wider RVV registers will then automatically speed things up, without even a recompile.

The problem is that for some things in RVV it's unclear how they will perform on high-performance OoO cores:

* General choice of LMUL: on in-order cores it's clearly best to maximize LMUL without causing spills; for OoO cores this isn't clear.
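For context, at LMUL=4 each vector instruction operates on a group of four architectural registers, which amortizes frontend work on in-order cores but leaves only 8 register groups for the allocator. A minimal stripmine body looks like this (RVV 1.0 assembly sketch, untested; register assignments are illustrative):

```asm
# a0 = element count, a1 = src, a2 = dst
# e32/m4: 32-bit elements, LMUL=4, so "v8" names the group v8..v11
vsetvli t0, a0, e32, m4, ta, ma
vle32.v v8, (a1)        # load up to vl elements into v8..v11
vadd.vv v8, v8, v8      # one instruction, four registers' worth of work
vse32.v v8, (a2)
```

The open question is whether an OoO core cracks the m4 group into uops whose latency it can hide anyway, which would make m1/m2 with more independent register groups just as fast.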

* How will LMUL>1 vrgather and vcompress perform?
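The reason this is a concern: at LMUL>1, vrgather.vv is an all-to-all permutation across the whole register group (any destination element may read from any source register in the group), so naive hardware does O(LMUL^2) work. The pattern in question (RVV 1.0 assembly sketch, untested):

```asm
vsetvli t0, a0, e8, m4, ta, ma
vle8.v  v8,  (a1)          # table:   v8..v11
vle8.v  v12, (a2)          # indices: v12..v15
vrgather.vv v16, v8, v12   # any byte of v16..v19 may come from any of v8..v11
```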

* How high is the impact of vsetvli instructions? Is it worth trying to move them out of loops whenever possible, or is the impact minimal, as on the current in-order implementations?
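Concretely: a stripmined loop needs one vsetvli per iteration to recompute vl for the tail, but any extra vsetvlis that re-assert an unchanged vl/vtype are candidates for hoisting or removal. A sketch of the minimal form (RVV 1.0 assembly, untested; pointer arithmetic uses base ISA shifts rather than Zba):

```asm
loop:
    vsetvli t0, a0, e32, m2, ta, ma  # needed each iteration for the tail
    vle32.v v8, (a1)
    # a redundant "vsetvli t0, a0, e32, m2, ta, ma" here is nearly free
    # on current in-order cores -- the question is the cost on OoO cores
    vadd.vv v8, v8, v8
    vse32.v v8, (a2)
    sub  a0, a0, t0
    slli t1, t0, 2                   # advance pointers by vl * 4 bytes
    add  a1, a1, t1
    add  a2, a2, t1
    bnez a0, loop
```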

* What is the overhead of using .vx instruction variants? Is there additional cost involved in moving between GPRs and vector registers?
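The alternative to a .vx instruction is to splat the scalar into a vector register once and use the .vv form, spending a register group to avoid any repeated GPR-to-vector transfer (untested sketch):

```asm
# Form 1: the scalar operand is read from a GPR by the instruction itself
vadd.vx v8, v8, a3

# Form 2: broadcast once, then stay entirely in the vector domain
vmv.v.x v4, a3        # splat a3 into v4
vadd.vv v8, v8, v4
```

If the .vx form internally re-broadcasts the scalar on every issue, form 2 wins in hot loops; if the core forwards the scalar cheaply, the two should be equivalent.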

* Is there additional overhead when reinterpreting vector masks?

* What performance can we expect from the more complex loads/stores, especially the segmented ones?
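Segmented loads/stores de-interleave array-of-structures data in a single instruction, e.g. splitting packed RGB bytes into three registers; whether that runs at full load bandwidth or gets microcoded into many element accesses is exactly the open question (RVV 1.0 assembly sketch, untested):

```asm
# a1 points at packed RGB triples: R0 G0 B0 R1 G1 B1 ...
vsetvli t0, a0, e8, m1, ta, ma
vlseg3e8.v v8, (a1)   # v8 = all R bytes, v9 = all G, v10 = all B
```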

The LLVM scheduling models give some insight:

* SiFive P670: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...

* Tenstorrent Ascalon: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ... (the vector part is still missing, but a PR is supposed to land in the near future)

I'm trying to collect as much info on hardware as I can: https://camel-cdr.github.io/rvv-bench-results/index.html



