[llvm] [SystemZ] Provide improved cost estimates (PR #83873)
Dominik Steenken via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 7 05:53:21 PST 2024
================
@@ -1284,17 +1286,43 @@ InstructionCost SystemZTTIImpl::getInterleavedMemoryOpCost(
return NumVectorMemOps + NumPermutes;
}
-static int getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy) {
+static int
+getVectorIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
+ const SmallVectorImpl<Type *> &ParamTys) {
if (RetTy->isVectorTy() && ID == Intrinsic::bswap)
return getNumVectorRegs(RetTy); // VPERM
+
+ if (ID == Intrinsic::vector_reduce_add) {
+ // Retrieve number and size of elements for the vector op.
+ auto *VTy = cast<FixedVectorType>(ParamTys.front());
+ unsigned NumElements = VTy->getNumElements();
+ unsigned ScalarSize = VTy->getScalarSizeInBits();
+ // For scalar sizes >128 bits, we fall back to the generic cost estimate.
+ if (ScalarSize > SystemZ::VectorBits)
+ return -1;
+ // A single vector register can hold this many elements.
+ unsigned MaxElemsPerVector = SystemZ::VectorBits / ScalarSize;
+ // This many vector regs are needed to represent the input elements (V).
+ unsigned VectorRegsNeeded = getNumVectorRegs(VTy);
+ // This many instructions are needed for the final sum of vector elems (S).
+ unsigned LastVectorHandling =
+ 2 * Log2_32_Ceil(std::min(NumElements, MaxElemsPerVector));
+ // We use vector adds to create a sum vector, which takes
+ // V/2 + V/4 + ... = V - 1 operations.
+ // Then, we need S operations to sum up the elements of that sum vector,
+ // for a total of V + S - 1 operations.
+ int Cost = VectorRegsNeeded + LastVectorHandling - 1;
+ assert(Cost > 0 && "Predicted cost of vector.reduce.add must be > 0");
----------------
dominik-steenken wrote:
I would have expected a `vector.reduce.add.v1i128` to essentially be a noop and to be caught before we get to this point. A quick test seems to support this assumption where a debug build produced the following code for a `vector.reduce.add.v1i128`:
```asm
vl %v0, 0(%r2), 3
vst %v0, 0(%r3)
```
without running into the assert mentioned above. However, i'm not sure how representative that is and i don't think i can say with any certainty that it couldn't happen. Essentially, since this formula calculates the number of operations needed, it faithfully, and correctly (see above), returns `0` for that case, which makes it a bad idea to have an `assert` that applies there. Point taken.
The original motivation behind the `assert` was the concern that some other change might in the future affect the execution of this function in such a way that the computed cost would be `-1`, and then the compiler would silently fall back to the generic cost computation. That, combined with the coding guideline saying that i should use asserts to check assumptions (even then, the assumption to be checked ought to have been `Cost >= 0`).
But, the tests that were added would also catch an unexpected fallback like that, so i think the best thing o do would be to just remove the `assert` entirely?
https://github.com/llvm/llvm-project/pull/83873
More information about the llvm-commits
mailing list