[llvm] [RISCV][VLOPT] Add support for checkUsers when UserMI is a Single-Width Integer Reduction (PR #120345)

Sun Jan 5 07:17:05 PST 2025

================
@@ -149,8 +152,11 @@ define i32 @mul_i32(<4 x i32> %a, <4 x i32> %b) {
 ; RV64-NEXT:    vslidedown.vi v10, v9, 2
 ; RV64-NEXT:    vmul.vv v9, v9, v10
 ; RV64-NEXT:    vrgather.vi v10, v8, 1
+; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
 ; RV64-NEXT:    vmul.vv v8, v8, v10
+; RV64-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; RV64-NEXT:    vrgather.vi v10, v9, 1
+; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
----------------
lukel97 wrote:

> We'd be reducing them incorrectly if we didn't add it to isVectorOpUsedAsScalarOp.

Ah I see the reasoning now. 

I think we're ending up with the extra VL toggles because we're introducing an artificial VL=1 requirement when vmv.x.s doesn't read VL. 

If we move the !hasVLOp check above the isVectorOpUsedAsScalarOp check then it looks like we can avoid the regressions, and I think that should still be conservatively correct. It also means we don't need to special case vmv.x.s either:

```diff

diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index e29907b8197c..711f570faf36 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -965,6 +965,12 @@ RISCVVLOptimizer::getVLForUser(MachineOperand &UserOp) {
   const MachineInstr &UserMI = *UserOp.getParent();
   const MCInstrDesc &Desc = UserMI.getDesc();
 
+  if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
+    LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
+                         " use VLMAX\n");
+    return std::nullopt;
+  }
+
   // Instructions like reductions may use a vector register as a scalar
   // register. In this case, we should treat it like a scalar register which
   // does not impact the decision on whether to optimize VL. But if there is
@@ -977,11 +983,6 @@ RISCVVLOptimizer::getVLForUser(MachineOperand &UserOp) {
     assert(RISCV::VRRegClass.hasSubClassEq(RC) &&
            "Expect LMUL 1 register class for vector as scalar operands!");
     LLVM_DEBUG(dbgs() << "    Used this operand as a scalar operand\n");
-    // VMV_X_S and VFMV_F_S do not have a VL operand which would cause an assert
-    // failure if we called getVLOpNum. Therefore, we will return 1 in this
-    // case, even if it could have been set to 0.
-    if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags))
-      return MachineOperand::CreateImm(1);
 
     unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
     const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
@@ -993,12 +994,6 @@ RISCVVLOptimizer::getVLForUser(MachineOperand &UserOp) {
     return std::nullopt;
   }
 
-  if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
-    LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
-                         " use VLMAX\n");
-    return std::nullopt;
-  }
-
   unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
   const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
   // Looking for an immediate or a register VL that isn't X0.
```

I agree now it makes sense to mark the vmv.x.s operands as being scalar ops, but if we shuffle the checks around then that part should be NFC and we avoid the extra vsetvlis.

https://github.com/llvm/llvm-project/pull/120345