[PATCH] D135991: [AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constant
Zain Jaffal via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 19 06:39:18 PDT 2022
zjaffal added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2059
if (Ty->isVectorTy()) {
+ // if one of the operands is a uniform constant then the cost for each
+ // element is Cost for insertion, extraction and division.
----------------
david-arm wrote:
> I think this is broken because you're assuming that all vectors are of type FixedVectorType. This code will crash if you one of the operands is a constant and we're dealing with scalable vector types. Perhaps it's better to move this to the else case below, i.e.
>
> if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
> ...
> } else {
> if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
> ...
> // On AArch64, without SVE, vector divisions are expanded
>
> We shouldn't be dealing with scalable vectors at this point so the `cast<FixedVectorType>(Ty)` ought to be fine.
I will move it after that block then
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2095
// into scalar divisions of each pair of elements.
Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
CostKind, Op1Info, Op2Info);
----------------
RKSimon wrote:
> dmgreen wrote:
> > Using getArithmeticInstrCost here seems very odd.
> >
> > I would expect the base cost to be `N*getVectorInstrCost(Instruction::ExtractElement) + N* getVectorInstrCost(Instruction::InsertElement) + N*getArithmeticInstrCost(Div)`. Sometimes we use a cheaper cross-register-bank copy cost than getVectorInstrCost would give.
> >
> > I would expect the getVectorInstrCost's to become removed if one of the operations was uniform or constant. And maybe the Cost += Cost shouldn't be needed if the vector instructions have correct costs. All the code here seems to be trying to give a very high cost for vector divides, as they will just be scalarized.
> You should be able to use getScalarizationOverhead to handle all of the extract/insert costs for you.
I think this probably needs fixing because the assumed result is over a 100 for the cases that I was dealing with if you look at the test cases down below. I think there is repeated counting
because first the base cost is
```
InstructionCost Cost = BaseT::getArithmeticInstrCost(
Opcode, Ty, CostKind, Op1Info, Op2Info);
```
And in the BaseT it calculates the extract and insert cost. This is the relevant snippet from `llvm/include/llvm/CodeGen/BasicTTIImpl.h`
```
if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
InstructionCost Cost = thisT()->getArithmeticInstrCost(
Opcode, VTy->getScalarType(), CostKind, Opd1Info, Opd2Info,
Args, CxtI);
// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.
SmallVector<Type *> Tys(Args.size(), Ty);
return getScalarizationOverhead(VTy, Args, Tys) +
VTy->getNumElements() * Cost;
}
```
================
Comment at: llvm/test/Analysis/CostModel/AArch64/div.ll:181
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, 7
-; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
----------------
david-arm wrote:
> dmgreen wrote:
> > It seems odd that doing 2 divides, plus some cross-register-bank copies, would be cheaper than doing 1
> I agree with @dmgreen, this does look odd.
I assumed that the insert and extract cost is 1. This is because `getVectorInstrCost(Instruction::ExtractElement)` and `getVectorInstrCost(Instruction::InsertElement)` is set to 2. I am not sure if I should use those values or keep it as it is.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135991/new/
https://reviews.llvm.org/D135991
More information about the llvm-commits
mailing list