[llvm] [RISCV] Add codegen support for ri.vinsert.v.x and ri.vextract.x.v (PR #136708)

Thu Apr 24 10:14:30 PDT 2025

================
@@ -56,6 +86,13 @@ define <32 x i32> @insertelt_v32i32_4(<32 x i32> %a, i32 %y) {
 ; CHECK-NEXT:    vmv.s.x v16, a0
 ; CHECK-NEXT:    vslideup.vi v8, v16, 4
 ; CHECK-NEXT:    ret
+;
+; VISNI-LABEL: insertelt_v32i32_4:
+; VISNI:       # %bb.0:
+; VISNI-NEXT:    li a1, 32
+; VISNI-NEXT:    vsetvli zero, a1, e32, m2, tu, ma
+; VISNI-NEXT:    ri.vinsert.v.x v8, a0, 4
----------------
preames wrote:

Ok, this is weird.  However, it's not incorrect, and is a quirk in the existing code too.

Take a look at the VL for insertelt_v32i32_0 above (the vmv.s.x case).  We set the same AVL=32 there as well.

The reason it's correct is that the original VL of the whole vector must be a power of two (legalization), and thus our choice to use a smaller prefix vector results in VLMAX either being at least that VL (for high zvlNb values) or being no more than half of it (for low zvlNb values). In the second case vsetvli clamps VL to VLMAX. That means that the resulting VL after the vsetvli is actually 8 in this case, not 32 (on a zvl128b machine).
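
To make the arithmetic concrete, here is a rough sketch of the VL that the vsetvli above produces for a few VLEN values. It assumes the standard RVV rules (VLMAX = VLEN * LMUL / SEW; vl = AVL when AVL <= VLMAX; vl = VLMAX when AVL >= 2 * VLMAX). The helper names are made up for illustration and are not anything in LLVM.

```python
# Illustrative model only: vlmax() and vl_after_vsetvli() are hypothetical
# helpers, not LLVM or RVV APIs.

def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """VLMAX = VLEN * LMUL / SEW (integer LMUL only in this sketch)."""
    return vlen_bits * lmul // sew_bits


def vl_after_vsetvli(avl: int, vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """VL chosen by vsetvli for the two cases the spec fully determines."""
    m = vlmax(vlen_bits, sew_bits, lmul)
    if avl <= m:
        return avl          # AVL fits: vl = AVL
    if avl >= 2 * m:
        return m            # AVL is at least twice VLMAX: vl = VLMAX
    # VLMAX < AVL < 2*VLMAX is implementation-defined; with a power-of-two
    # AVL and a power-of-two VLMAX this range is never hit.
    raise ValueError("implementation-defined VL")


# AVL=32, e32, m2, as in the VISNI output above.
for vlen in (128, 256, 512, 1024):
    print(vlen, vl_after_vsetvli(32, vlen, 32, 2))
# 128 -> 8, 256 -> 16, 512 -> 32, 1024 -> 32
```

In every case the resulting VL is greater than the insert index 4, and the implementation-defined range is never reached because both AVL and VLMAX are powers of two.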

We probably should be using AVL=1 in both cases, and letting InsertVSETVLI pick a profitable one.  A future change will do this for both.


https://github.com/llvm/llvm-project/pull/136708