[llvm] [RISCV][VLOPT] Add support for checkUsers when UserMI is a Single-Width Integer Reduction (PR #120345)

Mon Jan 6 09:12:52 PST 2025

================
@@ -1028,79 +1055,113 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
   return true;
 }
 
-bool RISCVVLOptimizer::checkUsers(const MachineOperand *&CommonVL,
-                                  MachineInstr &MI) {
+std::optional<MachineOperand>
+RISCVVLOptimizer::getVLForUser(MachineOperand &UserOp) {
+  const MachineInstr &UserMI = *UserOp.getParent();
+  const MCInstrDesc &Desc = UserMI.getDesc();
+
+  if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
+    LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
+                         " use VLMAX\n");
+    return std::nullopt;
+  }
+
+  // Instructions like reductions may use a vector register as a scalar
+  // register. In this case, we should treat it like a scalar register which
+  // does not impact the decision on whether to optimize VL. But if there is
+  // another user of MI and it may have VL=0, we need to be sure not to reduce
+  // the VL of MI to zero when the VLOp of UserOp may be non-zero. The most
+  // we can reduce it to is one.
+  if (isVectorOpUsedAsScalarOp(UserOp)) {
+    [[maybe_unused]] Register R = UserOp.getReg();
+    [[maybe_unused]] const TargetRegisterClass *RC = MRI->getRegClass(R);
+    assert(RISCV::VRRegClass.hasSubClassEq(RC) &&
+           "Expect LMUL 1 register class for vector as scalar operands!");
+    LLVM_DEBUG(dbgs() << "    Used this operand as a scalar operand\n");
+
+    unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
+    const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
+    if (VLOp.isReg() || (VLOp.isImm() && VLOp.getImm() != 0))
+      return MachineOperand::CreateImm(1);
----------------
michaelmaitland wrote:

I think we're in agreement that we don't need the `isVectorOpUsedAsScalarOp` logic in `getMinimumVLForUser` for correctness. But we do need it to avoid the missed optimization opportunity in `vred_vl0_and_vlreg`.

Regarding some of your takeaways in the last comment:

>> we need to be sure not to reduce the VL of MI to zero when the VLOp of UserOp may be non-zero

> This implies that we may reduce MI's VL to something smaller than a user's VL.

I agree. This occurs only when we return VL=1, which happens when the user's VL operand is a register or a non-zero immediate. That is safe because the user reads only the first lane of the vector register holding the scalar operand.
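A minimal sketch of the per-user demand in the scalar-operand case, assuming a simplified model rather than LLVM's `MachineOperand` API. The names `VLReg`, `VLOperand`, and `demandedVLForScalarOperand` are hypothetical. A register VL may be zero or non-zero, so the sketch conservatively keeps lane 0 valid. It returns 0 for an immediate VL of 0 because such a user reads no lanes; the truncated diff above does not show what the patch itself does in that case.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a VL held in a virtual register (value unknown).
struct VLReg {
  unsigned Id;
};

// Hypothetical model of a user's VL operand: a register or an immediate.
using VLOperand = std::variant<VLReg, int64_t>;

// How many leading lanes of the scalar operand the user may read.
// A reduction reads only lane 0 of its scalar operand, so the answer is 0 or 1.
int64_t demandedVLForScalarOperand(const VLOperand &UserVL) {
  if (const auto *Imm = std::get_if<int64_t>(&UserVL))
    return *Imm != 0 ? 1 : 0; // VL=0 reads nothing; any other VL reads lane 0 only.
  return 1; // A register VL may be non-zero, so lane 0 must stay valid.
}
```

For example, a user with VL in a register or with VL=16 both demand just 1 lane from the producer, even though their own VL is larger.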

>> Use the largest VL among all the users. If we cannot determine this statically, then we cannot optimize the VL.

> This implies that we never reduce MI's VL to something smaller than a user's VL.

I disagree with this implication. `largest VL among all the users` really refers to the largest value returned by `getMinimumVLForUser`, which may be smaller than a user's VL.
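To illustrate why the two statements are compatible, here is a hedged sketch that takes the largest per-user *demanded* VL rather than the largest user VL. `UserInfo` and `commonVL` are hypothetical names, and the model assumes every demand is either a known constant or unknown (`std::nullopt`), in which case the producer's VL is left alone. Register VLs and VLMAX are not modeled.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of one user of MI.
struct UserInfo {
  int64_t UserVL;                    // The user's own VL operand.
  std::optional<int64_t> DemandedVL; // Minimum VL it needs from MI; nullopt = unknown.
};

// Largest demanded VL across all users. Returns nullopt if any demand is
// unknown (or there are no users), meaning MI's VL should not be reduced.
std::optional<int64_t> commonVL(const std::vector<UserInfo> &Users) {
  std::optional<int64_t> Result;
  for (const UserInfo &U : Users) {
    if (!U.DemandedVL)
      return std::nullopt;
    Result = std::max(Result.value_or(0), *U.DemandedVL);
  }
  return Result;
}
```

With one full-width user at VL=8 (demand 8) and a reduction at VL=16 that uses MI as its scalar operand (demand 1), the result is 8. That is the largest demanded VL, yet it is smaller than the reduction's own VL of 16.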

So these two statements do not actually conflict.


https://github.com/llvm/llvm-project/pull/120345