[llvm] [RISCV] Add codegen support for ri.vinsert.v.x and ri.vextract.x.v (PR #136708)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 24 10:14:30 PDT 2025
================
@@ -56,6 +86,13 @@ define <32 x i32> @insertelt_v32i32_4(<32 x i32> %a, i32 %y) {
; CHECK-NEXT: vmv.s.x v16, a0
; CHECK-NEXT: vslideup.vi v8, v16, 4
; CHECK-NEXT: ret
+;
+; VISNI-LABEL: insertelt_v32i32_4:
+; VISNI: # %bb.0:
+; VISNI-NEXT: li a1, 32
+; VISNI-NEXT: vsetvli zero, a1, e32, m2, tu, ma
+; VISNI-NEXT: ri.vinsert.v.x v8, a0, 4
----------------
preames wrote:
Ok, this is weird. However, it's not incorrect, and is a quirk in the existing code too.
Take a look at the VL for insertelt_v32i32_0 above (the vmv.s.x case). We set the same AVL=32 there as well.
The reason it's correct is that the original VL of the whole vector must be a power of two (legalization), and thus our choice to use a smaller prefix vector results in a VLMAX that either meets or exceeds that VL (for high zvlNb values) or is at most half of it (for low zvlNb values). Either way the vsetvli result is fully determined. That means that the resulting VL after the vsetvli is actually 8 in this case, not 32 (on a zvl128b machine).
We probably should be using AVL=1 in both cases, and letting InsertVSETVLI pick a profitable one. A future change will do this for both.
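
For readers who want to check the VL arithmetic in the comment above, here is a minimal sketch. It is not LLVM code; the helper names (vlmax, vlIsFullyDetermined, resultingVL) are made up for illustration. It assumes an integer LMUL and the RVV 1.0 vsetvli constraints: vl = AVL when AVL <= VLMAX, vl = VLMAX when AVL >= 2*VLMAX, and an implementation-defined value in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: VLMAX = LMUL * VLEN / SEW, for integer LMUL only.
constexpr uint64_t vlmax(uint64_t vlenBits, uint64_t sewBits, uint64_t lmul) {
  return lmul * vlenBits / sewBits;
}

// True when the vsetvli constraints pin down a single vl value, i.e. the
// AVL is outside the implementation-defined range VLMAX < AVL < 2*VLMAX.
bool vlIsFullyDetermined(uint64_t avl, uint64_t vlmaxElts) {
  return avl <= vlmaxElts || avl >= 2 * vlmaxElts;
}

// vl produced by vsetvli for the given AVL. Only meaningful when the result
// is fully determined; the assert guards the implementation-defined range.
uint64_t resultingVL(uint64_t avl, uint64_t vlmaxElts) {
  assert(vlIsFullyDetermined(avl, vlmaxElts) &&
         "vl is implementation-defined for VLMAX < AVL < 2*VLMAX");
  return std::min(avl, vlmaxElts);
}
```

With AVL=32, e32 and m2 as in the test above, vlmax(128, 32, 2) is 8, so resultingVL(32, 8) is 8 on a zvl128b machine. Since both AVL and VLMAX are powers of two here, AVL never lands strictly between VLMAX and 2*VLMAX, which is why the result is well defined for every zvlNb.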
https://github.com/llvm/llvm-project/pull/136708