[llvm] [RISCV]Do not combine to 'vw' if the number of extended instructions cannot be reduced (PR #159715)

Fri Sep 19 02:29:57 PDT 2025

================
@@ -184,11 +181,10 @@ entry:
 define i32 @vdotqsu_vx_constant(<vscale x 16 x i8> %a) {
 ; CHECK-LABEL: vdotqsu_vx_constant:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-NEXT:    vsext.vf2 v16, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vsext.vf4 v16, v8
 ; CHECK-NEXT:    li a0, 123
-; CHECK-NEXT:    vwmul.vx v8, v16, a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vmul.vx v8, v16, a0
----------------
lukel97 wrote:

I think the vsext.vf4 is more expensive too. At least on the Banana Pi F3, going by https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html, the new sequence costs 16.00 (vsext.vf4) + 8.00 (vmul.vx) = 24.00 total cycles. Previously it was 8.00 (vsext.vf2) + 8.00 (vwmul.vx) = 16.00.
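
To make the comparison easy to re-check, here is a minimal sketch that adds up the per-instruction cycle counts quoted above. The `CYCLES` table and the `sequence_cost` helper are hypothetical names for illustration only. The model is a plain sum, which is an assumption: it ignores the `vsetvli` and `li` instructions and any overlap between instructions.

```python
# Cycle counts as quoted in the comment above (Banana Pi F3 figures).
# Only the four instructions named in the comment are listed.
CYCLES = {
    "vsext.vf2": 8.0,
    "vsext.vf4": 16.0,
    "vwmul.vx": 8.0,
    "vmul.vx": 8.0,
}


def sequence_cost(instructions):
    """Sum the quoted cycle counts for a list of instruction names.

    Assumption: costs simply add; vsetvli, li, and any pipelining
    or overlap are not modelled.
    """
    return sum(CYCLES[name] for name in instructions)


# Sequence before the patch: narrow extend, then widening multiply.
before = sequence_cost(["vsext.vf2", "vwmul.vx"])
# Sequence after the patch: wide extend, then regular multiply.
after = sequence_cost(["vsext.vf4", "vmul.vx"])

print(f"before: {before:.2f} cycles, after: {after:.2f} cycles")
```

Under these assumptions the old sequence comes to 16.00 cycles and the new one to 24.00. The old sequence also had one extra `vsetvli`, which this sum does not account for.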

https://github.com/llvm/llvm-project/pull/159715