[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 30 08:27:58 PDT 2022


david-arm added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2044
       // expanded into scalar divisions of each pair of elements.
-      Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty, CostKind,
-                                     Op1Info, Op2Info);
-      Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
-                                     Op1Info, Op2Info);
+      if (OverrideNEON || (isa<ScalableVectorType>(Ty))) {
+        bool IsFloat = Ty->isFPOrFPVectorTy();
----------------
sdesmalen wrote:
> hassnaa-arm wrote:
> > @sdesmalen 
> > I noticed something here:
> > is using **isa** on Ty correct, given that Ty is a pointer to `Type`?
> > I thought about using `Ty->isVectorTy()`, but that returns true for both scalable and fixed vectors, so it doesn't work here.
> `class ScalableVectorType` subclasses from `class VectorType`, which in turn subclasses from `class Type`.
> `class FixedVectorType` also subclasses from `class VectorType`.
> 
> If `Ty` is a `Type*`, you can ask at runtime if `Ty` is actually an object of class `ScalableVectorType` using `isa<ScalableVectorType>(Ty)`.
> 
> If you ask `Ty->isVectorTy()` it will return true if `Ty` is either a `FixedVectorType` or a `ScalableVectorType`, which means it is synonymous with `isa<VectorType>(Ty)`.
> 
> So yes, if you want to check if a `Type *` is actually a `ScalableVectorType*`, then using `isa` seems correct.
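To illustrate that distinction with a small, purely hypothetical snippet (assuming we have a `Type *Ty` in hand):

  if (isa<ScalableVectorType>(Ty)) {
    // Only matches scalable vectors, e.g. <vscale x 2 x i64>.
  } else if (isa<FixedVectorType>(Ty)) {
    // Only matches fixed-width vectors, e.g. <2 x i64>.
  }
  // Ty->isVectorTy() is equivalent to isa<VectorType>(Ty) and is true for
  // either of the above.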
I think you can probably just do something like:

  if (TLI->isOperationLegalOrCustom(ISD, VT)) {
    ... possibly increase the cost for 8-bit or 16-bit element vectors ...
    return Cost;
  } else {
    ... existing code ...
  }

This is because the operation will be marked as Expand if we don't have SVE.
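To make the shape of that suggestion a bit more concrete, here is a rough, untested sketch of how the branch could look; the names `ISD`, `LT`, `Ty` and `Cost` are assumed to be the variables already in scope in getArithmeticInstrCost, and the doubling for narrow elements is just a placeholder rather than a tuned number:

  if (TLI->isOperationLegalOrCustom(ISD, LT.second)) {
    // The target can lower the division directly (e.g. with SVE sdiv/udiv),
    // so there is no scalarisation. Optionally charge a little extra for
    // 8-bit and 16-bit element vectors, which need widening first.
    if (Ty->getScalarSizeInBits() < 32)
      Cost *= 2; // placeholder factor
    return Cost;
  }
  // Otherwise fall through to the existing code, which models the expansion
  // into extracts, scalar divisions and inserts.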


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2045
+      if (OverrideNEON || (isa<ScalableVectorType>(Ty))) {
+        bool IsFloat = Ty->isFPOrFPVectorTy();
+        // Assume that floating point arithmetic operations cost twice as much
----------------
The sdiv and udiv operations only operate on integer types, so this isn't needed.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2049
+        InstructionCost OpCost = (IsFloat ? 2 : 1);
+        // multiply by 2 because it's calculated for both extract and insert
+        Cost += (LT.first * OpCost * 2);
----------------
hassnaa-arm wrote:
> sdesmalen wrote:
> > I wouldn't expect there to be any extract and insert, because the division is legal.
> I found that the code flow calculates the cost for both extract and insert; that's why I considered them.
Hi @hassnaa-arm, I think Sander is right. If we're going to use the SVE udiv/sdiv instructions then there won't be any inserts or extracts. These only happen for NEON because we are going to scalarise the instruction, which requires something like:

  Step 1: Extract lane 0 from vectors a and b
  Step 2: Perform the scalar sdiv/udiv on these values
  Step 3: Insert the scalar sdiv/udiv result back into a vector.

For SVE we don't need steps 1 and 3 because we perform the sdiv/udiv on the whole vector. If you run the tests you added through llc with -mattr=+sve, you will only see a ptrue and an sdiv/udiv instruction, whereas for NEON (-mattr=+neon) you would also see some lane move instructions.
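As a rough way to picture the accounting difference, here is a toy helper showing what the scalarised NEON path is effectively paying per lane (the function, its name and its parameters are invented purely for illustration; this is not the real cost-model code):

  #include "llvm/Support/InstructionCost.h"

  using llvm::InstructionCost;

  // Two extracts (one per operand), one scalar sdiv/udiv and one insert of
  // the result, paid for every lane of the fixed-width vector.
  InstructionCost scalarisedDivCost(unsigned NumElts,
                                    InstructionCost ExtractCost,
                                    InstructionCost ScalarDivCost,
                                    InstructionCost InsertCost) {
    InstructionCost PerLane = ExtractCost + ExtractCost + ScalarDivCost + InsertCost;
    return PerLane * NumElts;
  }

With SVE that whole sum collapses to the cost of the single predicated sdiv/udiv, which is why the extract/insert terms shouldn't be charged.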


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2064
   case ISD::MUL:
-    // Since we do not have a MUL.2d instruction, a mul <2 x i64> is expensive
+    // Since we do not have a MUL.2d instruction, a mul <4 x i64> is expensive
     // as elements are extracted from the vectors and the muls scalarized.
----------------
These comments are still valid if we don't have SVE, so perhaps it's best to leave them unchanged. What you could do is change the first line to say:

  // Without SVE we do not have a ...

and then at the end of the comments you could add something like:

  // However, if SVE is available then we can lower the v2i64 operation using the
  // SVE mul instruction, which has a lower cost.
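For what it's worth, putting those two edits together the whole comment might then read something like:

  // Without SVE we do not have a MUL.2d instruction, so a mul <2 x i64> is
  // expensive as elements are extracted from the vectors and the muls
  // scalarized. However, if SVE is available then we can lower the v2i64
  // operation using the SVE mul instruction, which has a lower cost.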



================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes='print<cost-model>' 2>&1 -disable-output -mcpu=neoverse-v1  -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s
----------------
I think it might be worth moving the fixed-width vector tests into the existing file:

  test/Analysis/CostModel/AArch64/sve-fixed-length.ll

Then you could rename this file to something like:

  test/Analysis/CostModel/AArch64/sve-arith.ll


================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes='print<cost-model>' 2>&1 -disable-output -mcpu=neoverse-v1  -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s
+
----------------
I think you can remove the `-mcpu=neoverse-v1` argument here because your changes apply to all targets.
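i.e. the RUN line would become something like:

  ; RUN: opt < %s -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s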


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132477/new/

https://reviews.llvm.org/D132477


