> it's a matter of the wrong configuration unlike x86 where it is a matter of the wrong instruction
+1 to dzaima's mention of vrgather. The lack of fixed-pattern shuffle instructions in RVV is absolutely a wrong-instruction issue.
I agree with your point that multiple code variants + runtime dispatch are helpful. We do this with Highway in particular for x86. Users only write code once with portable intrinsics, and the mess of instruction selection is taken care of.
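To make "multiple code variants + runtime dispatch" concrete, here is a minimal sketch of the general pattern. It is not Highway's API: `Target`, `SumScalar`, `SumUnrolled4`, `ChooseSum` and `DetectTarget` are hypothetical names made up for illustration, and the "wide" variant is a plain unrolled loop standing in for a real SIMD implementation. The only compiler feature assumed is `__builtin_cpu_supports`, which GCC and Clang provide on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target levels; a real library would have one per ISA extension.
enum class Target { Scalar, Wide };

using SumFn = long long (*)(const int*, std::size_t);

// Baseline variant: works everywhere.
long long SumScalar(const int* p, std::size_t n) {
  long long sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += p[i];
  return sum;
}

// Stand-in for a SIMD variant: four independent accumulators plus a tail loop.
long long SumUnrolled4(const int* p, std::size_t n) {
  long long acc[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    for (std::size_t k = 0; k < 4; ++k) acc[k] += p[i + k];
  }
  long long sum = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < n; ++i) sum += p[i];  // remaining 0..3 elements
  return sum;
}

// Maps a target level to the matching code variant.
SumFn ChooseSum(Target t) {
  SumFn fn = (t == Target::Wide) ? SumUnrolled4 : SumScalar;
  assert(fn != nullptr);
  return fn;
}

// Assumption: AVX2 is used as the example feature check on x86 with
// GCC/Clang; every other platform falls back to the scalar variant.
Target DetectTarget() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") ? Target::Wide : Target::Scalar;
#else
  return Target::Scalar;
#endif
}
```

The caller does `ChooseSum(DetectTarget())` once at startup and keeps the pointer, so the feature check is not repeated per call. The user-facing code only ever sees one `Sum` entry point, which is the point being made about writing the code once.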
> +1 to dzaima's mention of vrgather. The lack of fixed-pattern shuffle instructions in RVV is absolutely a wrong-instruction issue.
What others would you want? Something like vzip1/2 would make sense, but that isn't much of a permutation, since the input elements are exactly next to the output elements.
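For readers who haven't used them, here is a scalar model of what zip1/zip2 compute, assuming "vzip1/2" means the AArch64-style ZIP1/ZIP2 semantics (interleave the low halves, respectively the high halves, of two equal-length vectors). `Zip1` and `Zip2` are hypothetical helper names for illustration, not RVV instructions or intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaves the LOW halves of a and b: a0, b0, a1, b1, ...
std::vector<int> Zip1(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  const std::size_t half = a.size() / 2;
  std::vector<int> out;
  out.reserve(a.size());
  for (std::size_t i = 0; i < half; ++i) {
    out.push_back(a[i]);
    out.push_back(b[i]);
  }
  return out;
}

// Interleaves the HIGH halves of a and b: a[half], b[half], a[half+1], ...
std::vector<int> Zip2(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  const std::size_t half = a.size() / 2;
  std::vector<int> out;
  out.reserve(a.size());
  for (std::size_t i = 0; i < half; ++i) {
    out.push_back(a[half + i]);
    out.push_back(b[half + i]);
  }
  return out;
}
```

With `a = {0, 1, 2, 3}` and `b = {10, 11, 12, 13}`, `Zip1` gives `{0, 10, 1, 11}` and `Zip2` gives `{2, 12, 3, 13}`. The index pattern is fixed and known at compile time, which is why it is a candidate for a dedicated instruction rather than a general vrgather with an index vector.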