[llvm] [RISCV]Do not combine to 'vw' if the number of extended instructions cannot be reduced (PR #159715)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 19 02:29:57 PDT 2025
================
@@ -184,11 +181,10 @@ entry:
define i32 @vdotqsu_vx_constant(<vscale x 16 x i8> %a) {
; CHECK-LABEL: vdotqsu_vx_constant:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-NEXT: vsext.vf2 v16, v8
+; CHECK-NEXT: vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT: vsext.vf4 v16, v8
; CHECK-NEXT: li a0, 123
-; CHECK-NEXT: vwmul.vx v8, v16, a0
-; CHECK-NEXT: vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT: vmul.vx v8, v16, a0
----------------
lukel97 wrote:
I think the vsext.vf4 is more expensive too. At least on the Banana Pi F3, going by https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html, the new sequence costs 16.00 (vsext.vf4) + 8.00 (vmul.vx) = 24.00 cycles in total. Previously it was 8.00 (vsext.vf2) + 8.00 (vwmul.vx) = 16.00 cycles.
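
For anyone who wants to redo the arithmetic, here is a rough sketch of the comparison. The per-instruction figures are just the ones quoted in this comment, not re-measured. The `CYCLES` table and `sequence_cost` helper are hypothetical names used for illustration. The sketch assumes the costs simply add up and ignores the `vsetvli` and `li` instructions.

```python
# Cycle figures as quoted in the review comment for the Banana Pi F3.
# Assumption: these are copied from the comment, not independently measured.
CYCLES = {
    "vsext.vf2": 8.0,   # sign-extend e8 -> e16 (old sequence)
    "vsext.vf4": 16.0,  # sign-extend e8 -> e32 (new sequence)
    "vwmul.vx": 8.0,    # widening multiply e16 -> e32 (old sequence)
    "vmul.vx": 8.0,     # plain multiply at e32 (new sequence)
}


def sequence_cost(instrs):
    """Sum the quoted cycle figures, assuming they are simply additive."""
    return sum(CYCLES[name] for name in instrs)


# Old codegen: narrow extend followed by a widening multiply.
before = sequence_cost(["vsext.vf2", "vwmul.vx"])
# New codegen from this patch: wide extend followed by a plain multiply.
after = sequence_cost(["vsext.vf4", "vmul.vx"])

print(f"before={before} after={after} delta={after - before}")
```

Under those assumptions the new sequence comes out 8 cycles slower on this core, even though it drops one `vsetvli`.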
https://github.com/llvm/llvm-project/pull/159715