[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE

Mon Sep 5 08:03:52 PDT 2022

david-arm added a comment.

Thanks for adding all the cost calculations for the fixed-length tests @hassnaa-arm - it looks much better now that we're testing all the SVE vector lengths. I just had a few more comments and then I think it's ready to go!

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2068
+        // consider the cost of legalizaiton
+        Cost *= LT.first;
+      } else {
----------------
I am so sorry about this @hassnaa-arm, but I made a mistake previously asking you to multiply by the legalisation cost. It seems that `BaseT::getArithmeticInstrCost` already takes this into account.

Also, I didn't spot this before, but I think we should actually return the cost here, i.e.

  return Cost;

because otherwise we fall through to code below that doubles the cost again, i.e.

  Cost += Cost;

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2078
+
       // TODO: if one of the arguments is scalar, then it's not necessary to
       // double the cost of handling the vector elements.
----------------
I think it makes sense to restructure this so that this cost-doubling lives in the else case, i.e. something like:

  if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
    ...
    return Cost;
  } else {
    ...
    // TODO: if one of the arguments is scalar, then it's not necessary to
    // double the cost of handling the vector elements.
    return Cost * 2;
  }

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:61

+; Assuming base_cost = 2 * ((vec_len-1/VBITS)+1)
+; Assuming legalization_cost = (vec_len-1/VBITS)+1
----------------
I'm not sure this comment is quite right because essentially the base cost will be 2, 8, or 16 depending upon the element size. Then we multiply by the legalisation cost. Perhaps it might be clearer to say something like

; The base cost may vary depending upon the legalised element type, i.e.
; 2 for 32 and 64-bit elements, 8 for 16-bit elements and 16 for 8-bit elements.
; Then we multiply by a legalization cost, which is equal to (vec_len-1/VBITS)+1.
; Examples are given for expected costs when VBITS=128.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:69
+; expected cost: 4
+; CHECK: cost of [[#mul(mul(mul((div(16-1, VBITS)+1), 2), (div(16-1, VBITS)+1)), 2)]] for instruction:  %sdiv16.i8   = sdiv <2 x i8> undef, undef
+  %sdiv16.i8   = sdiv <2 x i8> undef, undef
----------------
So for all cases where the vector length <= 128 we actually know the answer I think because VBITS is always >=128. For example, for all values of VBITS we know that (16-1)/VBITS=0. The same goes for the other cases up to and including 128-bit vectors, i.e. (128-1)/VBITS=0. So I think you can simplify the CHECK lines for all vectors <= 128 bits to using a single, static value, i.e.

Cost Model: Found an estimated cost of 2 for instruction:   %sdiv16.i8 = sdiv <2 x i8> undef, undef
Cost Model: Found an estimated cost of 8 for instruction:   %sdiv32.i8 = sdiv <4 x i8> undef, undef
...

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:105
+; expected cost: 512
+; CHECK: cost of [[#mul(mul(mul(mul((div(512-1, VBITS)+1), 2), (div(512-1, VBITS)+1)), 8), 2)]] for instruction:  %sdiv512.i8   = sdiv <64 x i8> undef, undef
+  %sdiv512.i8  = sdiv <64 x i8> undef, undef
----------------
With my comment about the incorrect legalisation cost above I think these CHECK lines can be simplified to just a single mul, i.e.

; CHECK: cost of [[#mul((div(512-1, VBITS)+1), 16]] for instruction:  %sdiv512.i8   = sdiv <64 x i8> undef, undef
; CHECK: cost of [[#mul((div(512-1, VBITS)+1), 8]] for instruction:  %sdiv512.i16   = sdiv <32 x i16> undef, undef
; CHECK: cost of [[#mul((div(512-1, VBITS)+1), 2]] for instruction:  %sdiv512.i32   = sdiv <16 x i32> undef, undef
; CHECK: cost of [[#mul((div(512-1, VBITS)+1), 2]] for instruction:  %sdiv512.i64   = sdiv <8 x i64> undef, undef

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132477/new/

https://reviews.llvm.org/D132477