[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 30 08:27:58 PDT 2022
david-arm added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2044
// expanded into scalar divisions of each pair of elements.
- Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty, CostKind,
- Op1Info, Op2Info);
- Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
- Op1Info, Op2Info);
+ if (OverrideNEON || (isa<ScalableVectorType>(Ty))) {
+ bool IsFloat = Ty->isFPOrFPVectorTy();
----------------
sdesmalen wrote:
> hassnaa-arm wrote:
> > @sdesmalen
> > I noticed something here,
> > is using that **isa** for Ty is correct, give that Ty is a ptr of "Type" ?
> > I thought about using Ty.isVectorTy(), but it returns true for both scalable and fixed vectors, so it doesn't work here.
> `class ScalableVectorType` subclasses from `class VectorType`, which in turn subclasses from `class Type`.
> `class FixedVectorType` also subclasses from `class VectorType`.
>
> If `Ty` is a `Type*`, you can ask at runtime if `Ty` is actually an object of class `ScalableVectorType` using `isa<ScalableVectorType>(Ty)`.
>
> If you ask `Ty->isVectorTy()` it will return true if it is either a `FixedVectorType` or a `ScalableVectorType`, which means that is synonymous with `isa<VectorType>(Ty)`.
>
> So yes, if you want to check if a `Type *` is actually a `ScalableVectorType*`, then using `isa` seems correct.
I think you can probably just do something like:
if (TLI->isOperationLegalOrCustom(ISD, VT)) {
... possibly increase the cost for 8-bit or 16-bit element vectors ...
return Cost;
} else {
... existing code ...
}
This is because the operation will be marked as Expand if we don't have SVE.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2045
+ if (OverrideNEON || (isa<ScalableVectorType>(Ty))) {
+ bool IsFloat = Ty->isFPOrFPVectorTy();
+ // Assume that floating point arithmetic operations cost twice as much
----------------
The sdiv and udiv operations only operate on integer types so this isn't needed.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2049
+ InstructionCost OpCost = (IsFloat ? 2 : 1);
+ // multiply by 2 because it's calculated for both extract and insert
+ Cost += (LT.first * OpCost * 2);
----------------
hassnaa-arm wrote:
> sdesmalen wrote:
> > I wouldn't expect there to be any extract and insert, because the division is legal.
> I found that the code flow calculate the cost for both extract and insert, that's why I considered them.
Hi @hassnaa-arm, I think Sander is right. If we're going to use the SVE udiv/sdiv instructions then there won't be any inserts or extracts. These only happen for NEON because we are going to scalarise the instruction, which requires something like:
Step 1: Extract lane 0 from vectors a and b
Step 2: Multiply these scalar values together
Step 3: Insert the scalar sdiv/udiv result back into a vector.
For SVE we don't need steps 1 and 3 because we perform the sdiv/udiv on the whole vector. If you run the tests you added through llc with -mattr=+sve you will only see a ptrue and a sdiv/udiv instruction, whereas for NEON (-mattr=+neon) you would also see some lane move instructions.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2064
case ISD::MUL:
- // Since we do not have a MUL.2d instruction, a mul <2 x i64> is expensive
+ // Since we do not have a MUL.2d instruction, a mul <4 x i64> is expensive
// as elements are extracted from the vectors and the muls scalarized.
----------------
These comments are still valid if we don't have SVE so perhaps it's best to leave unchanged. What you could do is change the first line to say:
// Without SVE we do not have a ...
and then at the end of the comments you could add something like:
// However, if SVE is available then we can lower the v2i64 operation using the
// SVE mul instruction, which has a lower cost.
================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes='print<cost-model>' 2>&1 -disable-output -mcpu=neoverse-v1 -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s
----------------
I think it might be worth moving the fixed-width vector tests into the existing file:
test/Analysis/CostModel/AArch64/sve-fixed-length.ll
Then you could rename this file to just be something like
test/Analysis/CostModel/AArch64/sve-arith.ll
================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes='print<cost-model>' 2>&1 -disable-output -mcpu=neoverse-v1 -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s
+
----------------
I think you can remove the `-mcpu=neoverse-v1` argument here because your changes apply for all targets.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132477/new/
https://reviews.llvm.org/D132477
More information about the llvm-commits
mailing list