[llvm] [RISCV] Lower SEW<=32 vector_deinterleave(2) via vunzip2{a, b} (PR #136463)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 20 20:35:04 PDT 2025
================
@@ -254,105 +351,175 @@ ret {<vscale x 8 x i64>, <vscale x 8 x i64>} %retval
; Floats
define {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} @vector_deinterleave_nxv2bf16_nxv4bf16(<vscale x 4 x bfloat> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv2bf16_nxv4bf16:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v10, v8, 0
-; CHECK-NEXT: vnsrl.wi v9, v8, 16
-; CHECK-NEXT: vmv1r.v v8, v10
-; CHECK-NEXT: ret
+; V-LABEL: vector_deinterleave_nxv2bf16_nxv4bf16:
+; V: # %bb.0:
+; V-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
+; V-NEXT: vnsrl.wi v10, v8, 0
+; V-NEXT: vnsrl.wi v9, v8, 16
+; V-NEXT: vmv1r.v v8, v10
+; V-NEXT: ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv2bf16_nxv4bf16:
+; ZIP: # %bb.0:
+; ZIP-NEXT: vsetvli a0, zero, e16, m1, ta, ma
+; ZIP-NEXT: ri.vunzip2a.vv v10, v8, v9
+; ZIP-NEXT: ri.vunzip2b.vv v9, v8, v11
+; ZIP-NEXT: vmv.v.v v8, v10
+; ZIP-NEXT: ret
%retval = call {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} @llvm.vector.deinterleave2.nxv4bf16(<vscale x 4 x bfloat> %vec)
ret {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} %retval
}
define {<vscale x 2 x half>, <vscale x 2 x half>} @vector_deinterleave_nxv2f16_nxv4f16(<vscale x 4 x half> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v10, v8, 0
-; CHECK-NEXT: vnsrl.wi v9, v8, 16
-; CHECK-NEXT: vmv1r.v v8, v10
-; CHECK-NEXT: ret
+; V-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
+; V: # %bb.0:
+; V-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
+; V-NEXT: vnsrl.wi v10, v8, 0
+; V-NEXT: vnsrl.wi v9, v8, 16
+; V-NEXT: vmv1r.v v8, v10
+; V-NEXT: ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
+; ZIP: # %bb.0:
+; ZIP-NEXT: vsetvli a0, zero, e16, m1, ta, ma
+; ZIP-NEXT: ri.vunzip2a.vv v10, v8, v9
+; ZIP-NEXT: ri.vunzip2b.vv v9, v8, v11
----------------
lukel97 wrote:
Just an observation not related to this PR: v9 and v11 are undef, right? Could regalloc have chosen `ri.vunzip2a.vv v10, v8, v8; ri.vunzip2b.vv v9, v8, v8` to reduce the number of dependencies?
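
To illustrate why the choice of second source register should not matter here, below is a rough Python model. It is a sketch only, and it assumes that `ri.vunzip2a.vv vd, vs2, vs1` keeps the even elements of the concatenation of vs2 followed by vs1, and that `ri.vunzip2b.vv` keeps the odd ones; that reading of the semantics is an assumption, not something the test output confirms. The helper names (`unzip2`, `low_half`) and the sample values are made up for the example. Only the low half of each result is consumed (the mf2-sized return value), and that half comes entirely from v8.

```python
def unzip2(vs2, vs1, odd):
    """Assumed semantics: concatenate vs2 then vs1, keep even (2a) or odd (2b) elements."""
    combined = list(vs2) + list(vs1)
    return combined[1::2] if odd else combined[0::2]


def low_half(v):
    """The part of the result that is actually used by the deinterleave."""
    return v[: len(v) // 2]


# Hypothetical interleaved contents of v8: evens are 0..3, odds are 10..13.
src = [0, 10, 1, 11, 2, 12, 3, 13]

# Stand-ins for the undef registers v9 and v11 (arbitrary contents).
junk_a = ["?"] * len(src)
junk_b = ["!"] * len(src)

# As in the current ZIP output: undef second operands.
even_undef = low_half(unzip2(src, junk_a, odd=False))
odd_undef = low_half(unzip2(src, junk_b, odd=True))

# As in the suggestion: reuse v8 as the second operand.
even_reuse = low_half(unzip2(src, src, odd=False))
odd_reuse = low_half(unzip2(src, src, odd=True))

print(even_undef, even_reuse)
print(odd_undef, odd_reuse)
```

Under that assumption both variants give the same used elements, so reusing v8 would only change which registers each instruction depends on.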
https://github.com/llvm/llvm-project/pull/136463