[llvm] [SystemZ] Provide improved cost estimates (PR #83873)

Wed Mar 6 07:27:28 PST 2024

================
@@ -1284,17 +1286,43 @@ InstructionCost SystemZTTIImpl::getInterleavedMemoryOpCost(
   return NumVectorMemOps + NumPermutes;
 }
 
-static int getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy) {
+static int
+getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
+                            const SmallVectorImpl<Type *> &ParamTys) {
   if (RetTy->isVectorTy() && ID == Intrinsic::bswap)
     return getNumVectorRegs(RetTy); // VPERM
+
+  if (ID == Intrinsic::vector_reduce_add) {
+    // Retrieve number and size of elements for the vector op.
+    auto *VTy = cast<FixedVectorType>(ParamTys.front());
+    unsigned NumElements = VTy->getNumElements();
+    unsigned ScalarSize = VTy->getScalarSizeInBits();
+    // For scalar sizes >128 bits, we fall back to the generic cost estimate.
+    if (ScalarSize > SystemZ::VectorBits)
+      return -1;
+    // A single vector register can hold this many elements.
+    unsigned MaxElemsPerVector = SystemZ::VectorBits / ScalarSize;
+    // This many vector regs are needed to represent the input elements (V).
+    unsigned VectorRegsNeeded = getNumVectorRegs(VTy);
+    // This many instructions are needed for the final sum of vector elems (S).
+    unsigned LastVectorHandling =
+        2 * Log2_32_Ceil(std::min(NumElements, MaxElemsPerVector));
+    // We use vector adds to create a sum vector, which takes
+    // V/2 + V/4 + ... = V - 1 operations.
+    // Then, we need S operations to sum up the elements of that sum vector,
+    // for a total of V + S - 1 operations.
+    int Cost = VectorRegsNeeded + LastVectorHandling - 1;
+    assert(Cost > 0 && "Predicted cost of vector.reduce.add must be > 0");
----------------
uweigand wrote:

Can we be certain this assert never triggers? I guess VectorRegsNeeded is at least 1, but why is LastVectorHandling also at least 1? That seems to require both NumElements and MaxElemsPerVector to be at least 2, which would not be true for v1i128. Are we sure this routine is never used for that type?
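
Plugging v1i128 into the formula from the diff shows why the assert would fire. This assumes, as the hunk suggests, that `getNumVectorRegs` rounds the total bit width up to whole 128-bit registers (so it returns 1 here), and that `Log2_32_Ceil(1)` is 0. Here N is NumElements, s is ScalarSize, M is MaxElemsPerVector, V is VectorRegsNeeded and S is LastVectorHandling:

```latex
\begin{aligned}
N &= 1,\quad s = 128,\quad M = 128 / s = 1,\\
V &= \lceil N s / 128 \rceil = 1,\\
S &= 2 \lceil \log_2 \min(N, M) \rceil = 2 \lceil \log_2 1 \rceil = 0,\\
\text{Cost} &= V + S - 1 = 1 + 0 - 1 = 0.
\end{aligned}
```

The same arithmetic yields 0 for any single-element vector (e.g. v1i64), because min(N, M) = 1 whenever N = 1.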

A failing assertion is an internal compiler error in production, so we should avoid them wherever possible. In code like this, where we have an alternative (such as returning -1 to fall back to the generic cost estimate), we should probably use that rather than an assert whenever we run into unexpected input.
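
A minimal standalone sketch of the suggestion, written outside LLVM so it compiles on its own. `reduceAddCost`, `log2Ceil` and `numVectorRegs` are hypothetical stand-ins for the code in the hunk, `Log2_32_Ceil` and `getNumVectorRegs` respectively; the rounding behavior of `numVectorRegs` is an assumption. Rather than asserting, the function returns -1 (fall back to the generic estimate) for inputs the formula was not designed for:

```cpp
#include <algorithm>
#include <cstdint>

// Mirrors SystemZ::VectorBits: one vector register holds 128 bits.
constexpr unsigned VectorBits = 128;

// Hypothetical stand-in for llvm::Log2_32_Ceil for X >= 1:
// the smallest K with 2^K >= X, so log2Ceil(1) == 0.
constexpr unsigned log2Ceil(unsigned X) {
  unsigned K = 0;
  while ((std::uint64_t{1} << K) < X)
    ++K;
  return K;
}

// Assumption: total bit width rounded up to whole vector registers.
constexpr unsigned numVectorRegs(unsigned NumElements, unsigned ScalarSize) {
  return (NumElements * ScalarSize + VectorBits - 1) / VectorBits;
}

// Hypothetical helper: estimated cost of vector.reduce.add, or -1 to
// request the generic cost estimate instead.
constexpr int reduceAddCost(unsigned NumElements, unsigned ScalarSize) {
  if (NumElements == 0 || ScalarSize == 0 || ScalarSize > VectorBits)
    return -1;
  unsigned MaxElemsPerVector = VectorBits / ScalarSize;
  unsigned VectorRegsNeeded = numVectorRegs(NumElements, ScalarSize);
  unsigned LastVectorHandling =
      2 * log2Ceil(std::min(NumElements, MaxElemsPerVector));
  int Cost = static_cast<int>(VectorRegsNeeded + LastVectorHandling) - 1;
  // Single-element vectors (e.g. v1i128) give Cost == 0; instead of
  // asserting, treat any non-positive result as "no estimate".
  return Cost > 0 ? Cost : -1;
}
```

With these definitions, v4i32 costs 4, v8i32 costs 5 and v16i8 costs 8, while v1i128 and v1i64 return -1 instead of tripping an assertion. Because the helpers are `constexpr` (C++14), such values can be checked with `static_assert`.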

https://github.com/llvm/llvm-project/pull/83873