[llvm] [RISCV][TTI] Model partial reduce of ext for zvqdotq (PR #146788)

Mon Jul 14 10:28:48 PDT 2025

================
@@ -303,16 +303,29 @@ InstructionCost RISCVTTIImpl::getPartialReductionCost(
   // zve32x is broken for partial_reduce_umla, but let's make sure we
   // don't generate them.
   if (!ST->hasStdExtZvqdotq() || ST->getELen() < 64 ||
-      Opcode != Instruction::Add || !BinOp || *BinOp != Instruction::Mul ||
-      InputTypeA != InputTypeB || !InputTypeA->isIntegerTy(8) ||
+      Opcode != Instruction::Add || !InputTypeA->isIntegerTy(8) ||
       !AccumType->isIntegerTy(32) || !VF.isKnownMultipleOf(4))
     return InstructionCost::getInvalid();
 
+  // We support both the plain dot product idiom, and the use of dotproduct
+  // to compute a reduction of an extended value.
+  if (BinOp && (*BinOp != Instruction::Mul || InputTypeA != InputTypeB))
+    return InstructionCost::getInvalid();
+
+  InstructionCost IntMatCost = 0;
+  if (!BinOp) {
+    // Cost to produce one vmv.v.i -- since the constant is shared across any
+    // unrolled copies, don't need to scale by LT.first.
+    Type *Tp = VectorType::get(InputTypeA, VF);
+    std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Tp);
+    IntMatCost = getRISCVInstructionCost(RISCV::VMV_V_I, LT.second, CostKind);
----------------
preames wrote:

The current lowering expands the constant using an SEW=e8 vmv.v.x, which can't be folded into the .vx form of the instruction.  That materialization will probably be hoisted out of loops with low register pressure, but not out of loops with high register pressure.  (Or more accurately, it might be sunk back in.)  I went with the more conservative costing for the moment; I think we can revisit this later if we find that the lower cost would actually influence the profitable choice.  (At least so far, it doesn't seem to.)
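
For readers following along, the costing described above can be sketched as a standalone toy model. This is not the LLVM code: `Op`, `PartialReduceQuery`, `ToyCosts`, and `toyPartialReductionCost` are hypothetical names, the subtarget checks (zvqdotq, ELEN) are omitted, VF is a plain integer rather than an ElementCount, and the final formula (one dot-product instruction per legalized part, plus a single unscaled splat) is an assumption, since the quoted hunk ends before the return statement.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for illustration; none of these names exist in LLVM.
enum class Op { Add, Mul, Other };

struct PartialReduceQuery {
  Op ReduceOp;              // reduction opcode; only Add is accepted
  std::optional<Op> BinOp;  // Mul for the dot product idiom, empty for reduce-of-ext
  unsigned InputBitsA;      // element width of the first input
  unsigned InputBitsB;      // element width of the second input (used only with BinOp)
  unsigned AccumBits;       // element width of the accumulator
  unsigned VF;              // vectorization factor (fixed, for simplicity)
};

struct ToyCosts {
  unsigned LegalizedParts;  // plays the role of LT.first
  unsigned DotCost;         // assumed cost of one dot-product instruction
  unsigned SplatCost;       // assumed cost of materializing the splat constant once
};

// Returns std::nullopt where the real hook would return an invalid cost.
std::optional<unsigned> toyPartialReductionCost(const PartialReduceQuery &Q,
                                                const ToyCosts &C) {
  assert(C.LegalizedParts >= 1 && "a type legalizes to at least one part");

  // Checks shared by both idioms: i8 inputs, i32 accumulator, VF multiple of 4.
  if (Q.ReduceOp != Op::Add || Q.InputBitsA != 8 || Q.AccumBits != 32 ||
      Q.VF % 4 != 0)
    return std::nullopt;

  // With a binary op present, it must be a multiply of same-typed inputs.
  if (Q.BinOp && (*Q.BinOp != Op::Mul || Q.InputBitsA != Q.InputBitsB))
    return std::nullopt;

  // Reduce-of-ext needs a splat operand. It is charged once and not scaled by
  // LegalizedParts, because the constant is shared across unrolled copies.
  unsigned SplatMat = Q.BinOp ? 0 : C.SplatCost;

  // Assumption: one dot-product instruction per legalized part.
  return SplatMat + C.LegalizedParts * C.DotCost;
}
```

Under these toy numbers, a query that legalizes to two parts costs 2 for the plain dot product idiom and 3 for the reduce-of-ext idiom, the difference being the single splat.  The less conservative alternative mentioned above would amount to setting SplatCost to 0.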

https://github.com/llvm/llvm-project/pull/146788