[llvm] [SystemZ] Provide improved cost estimates (PR #83873)

Thu Mar 7 05:53:21 PST 2024

================
@@ -1284,17 +1286,43 @@ InstructionCost SystemZTTIImpl::getInterleavedMemoryOpCost(
   return NumVectorMemOps + NumPermutes;
 }
 
-static int getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy) {
+static int
+getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
+                            const SmallVectorImpl<Type *> &ParamTys) {
   if (RetTy->isVectorTy() && ID == Intrinsic::bswap)
     return getNumVectorRegs(RetTy); // VPERM
+
+  if (ID == Intrinsic::vector_reduce_add) {
+    // Retrieve number and size of elements for the vector op.
+    auto *VTy = cast<FixedVectorType>(ParamTys.front());
+    unsigned NumElements = VTy->getNumElements();
+    unsigned ScalarSize = VTy->getScalarSizeInBits();
+    // For scalar sizes >128 bits, we fall back to the generic cost estimate.
+    if (ScalarSize > SystemZ::VectorBits)
+      return -1;
+    // A single vector register can hold this many elements.
+    unsigned MaxElemsPerVector = SystemZ::VectorBits / ScalarSize;
+    // This many vector regs are needed to represent the input elements (V).
+    unsigned VectorRegsNeeded = getNumVectorRegs(VTy);
+    // This many instructions are needed for the final sum of vector elems (S).
+    unsigned LastVectorHandling =
+        2 * Log2_32_Ceil(std::min(NumElements, MaxElemsPerVector));
+    // We use vector adds to create a sum vector, which takes
+    // V/2 + V/4 + ... = V - 1 operations.
+    // Then, we need S operations to sum up the elements of that sum vector,
+    // for a total of V + S - 1 operations.
+    int Cost = VectorRegsNeeded + LastVectorHandling - 1;
+    assert(Cost > 0 && "Predicted cost of vector.reduce.add must be > 0");
----------------
dominik-steenken wrote:

I would have expected a `vector.reduce.add.v1i128` to essentially be a noop and to be caught before we get to this point. A quick test seems to support this assumption where a debug build produced the following code for a `vector.reduce.add.v1i128`:
```asm
        vl      %v0, 0(%r2), 3
        vst     %v0, 0(%r3)
```
without running into the assert mentioned above. However, i'm not sure how representative that is and i don't think i can say with any certainty that it couldn't happen. Essentially, since this formula calculates the number of operations needed, it faithfully, and correctly (see above), returns `0` for that case, which makes it a bad idea to have an `assert` that applies there. Point taken.

The original motivation behind the `assert` was the concern that some other change might in the future affect the execution of this function in such a way that the computed cost would be `-1`, and then the compiler would silently fall back to the generic cost computation. That, combined with the coding guideline saying that i should use asserts to check assumptions (even then, the assumption to be checked ought to have been `Cost >= 0`).

But, the tests that were added would also catch an unexpected fallback like that, so i think the best thing o do would be to just remove the `assert` entirely?

https://github.com/llvm/llvm-project/pull/83873