[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE

Mon Sep 12 03:53:17 PDT 2022

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2039-2042
+        // SDIV/UDIV operations are lowered, then we can have less cost.
+        // for 32/64-bit elements, just the base cost is enough.
+        // but for 8/16-bit elements, there will be higher cost becaues they
+        // require extra work :
----------------
sdesmalen wrote:
> nit: How about:
> 
>   // For 8/16-bit element types, the cost is higher because the type requires
>   // promotion and possibly splitting.
Just FYI that this comment seems unaddressed.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2050-2051
+      } else {
+        // On AArch64, vector divisions are not supported natively and are
+        // expanded into scalar divisions of each pair of elements.
+        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
----------------
sdesmalen wrote:
> nit: this comment needs updating now.
Just FYI that this comment is unaddressed.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2055
     InstructionCost Cost = BaseT::getArithmeticInstrCost(
         Opcode, Ty, CostKind, Op1Info, Op2Info);
     if (Ty->isVectorTy()) {
----------------
For the case of using SVE, this probably shouldn't be looking at the original type, but rather the type that is legal for the *operation*.

There are two separate things going on here:
1. The *type* legalization cost, which is the cost of getting a legal type.
2. The *operation* legalization cost, which currently assumes a legal type. As your code already points out, operations are only legal on 32- and 64-bit element types.

The way the code-generator handles this is that for 8 and 16-bit inputs, the *type* legalizer first tries to legalize them to something the code-generator can handle. After that, the code-generator will try to promote it to a type that the *operation* can handle.

e.g.
* if the type is <2 x i8> that will be promoted to <2 x i32>. This is all fine, we can use the cost of LT.first (The cost of a DIV for v2i32) and combine it with the legalisation cost for going from v2i8 -> v2i32.
* if the type is <4 x i16> that is a legal type, but for the DIV operation v4i16 is _not_ legal, so we'll need to consider legalisation for v4i32.

So for i8 and i16 element types, it should call `getTypeLegalizationCost();` again for the type that we know it will be promoted to. That type is a vector with the same number of lanes, but i32 element types.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2066
+          Cost *= 4;
+        // consider the cost of legalizaiton
+        return Cost;
----------------
nit: redundant comment, please remove.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:69-75
+; expected cost: 4
+; CHECK: cost of [[#mul(div(16-1, VBITS)+1, 2)]] for instruction:  %sdiv16.i8   = sdiv <2 x i8> undef, undef
+  %sdiv16.i8   = sdiv <2 x i8> undef, undef
+
+; expected cost: 16
+; CHECK: cost of [[#mul(mul(div(32-1, VBITS)+1, 2), 4)]] for instruction:  %sdiv32.i8   = sdiv <4 x i8> undef, undef
+  %sdiv32.i8   = sdiv <4 x i8> undef, undef
----------------
In practice the cost of `sdiv <4 x i8>` is not 4 times as high as `sdiv <2 x i8>`, see e.g. https://godbolt.org/z/jM3M8rWsf so I think the costs are still not entirely right. Also see my other comment on the code itself.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:70
+; expected cost: 4
+; CHECK: cost of [[#mul(div(16-1, VBITS)+1, 2)]] for instruction:  %sdiv16.i8   = sdiv <2 x i8> undef, undef
+  %sdiv16.i8   = sdiv <2 x i8> undef, undef
----------------
nit: 16-1 => 15 ?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132477/new/

https://reviews.llvm.org/D132477