[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE

Thu Sep 22 06:11:16 PDT 2022

david-arm added a comment.

Hi @hassnaa-arm, thanks for updating the patch with more accurate costs for <128 bit vectors. It's almost there I think - just a few more comments and then I'll be happy!

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2100
     // need to scalarize so the cost can be cheaper (smull or umull).
+    // so the cost can be cheaper (smull or umull).
     if (LT.second != MVT::v2i64 || isWideningInstruction(Ty, Opcode, Args))
----------------
david-arm wrote:
> nit: This line looks unnecessary.
This line still needs removing I think?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2088
+      if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
+        // SDIV/UDIV operations are lowered, then we can have less cost.
+        // For 8/16-bit elements, the cost is higher because the type
----------------
nit: Perhaps this is better written as something like

        // If SDIV/UDIV operations are lowered using SVE sdiv/udiv
        // instructions, then the cost is less because don't need to
        // scalarize the operation.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2089
+        // SDIV/UDIV operations are lowered, then we can have less cost.
+        // For 8/16-bit elements, the cost is higher because the type
+        // requires promotion and possibly splitting:
----------------
nit: I think this comment now needs moving down below to where we check the element type.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2091
+        // requires promotion and possibly splitting:
+        if (isa<FixedVectorType>(Ty)) {
+          EVT VT = TLI->getValueType(DL, Ty);
----------------
I think you can make some of the following code a bit simpler if you also check the type size here, i.e.

        if (isa<FixedVectorType>(Ty) && cast<FixedVectorType>(Ty)->getPrimitiveSizeInBits().getFixedSize() < 128) {

This means you can remove the `if (VT.getSizeInBits() < 128u)` below.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2101
+
+          if (VT.getSizeInBits() < 128u) {
+            const auto *Entry = CostTableLookup(DivTbl, ISD, VT.getSimpleVT());
----------------
Just FYI `getSizeInBits()` returns a `TypeSize` class (llvm/include/llvm/Support/TypeSize.h) that tries to represent both fixed-width and scalable sizes. The code you've written here works, but it's relying upon a cast from a TypeSize class to an unsigned integer that only works for fixed-width vectors. We'd like to avoid relying upon these casts and so for things like this we'd prefer to use `getFixedSizeInBits()` instead, which asserts the TypeSize is fixed.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:63
+; Assuming legalization_cost = (vec_len-1/VBITS)+1
+; Assuming extra cost of 8 for i8.
+; Assuming extra cost of 4 for i16.
----------------
nit: I think it might be clearer to say:

  ; Assuming extra cost multipler of 8 for i8.
  ; Assuming extra cost multipler of 4 for i8.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll:93
+
+; expected cost: 32
+; CHECK: cost of [[#mul(mul(div(128-1, VBITS)+1, 2), 8)]] for instruction:  %sdiv128.i8   = sdiv <16 x i8> undef, undef
----------------
Looks like these `; expected cost` comments aren't quite right here. For example, for <16 x i8> it looks like the cost should 2*8=16. I think it's probably worth either:

  1. Removing all the `; expected cost` comments in the file because they may need updating over time, or
  2. Update them to reflect the correct costs for each operation.

I'm happy with either!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132477/new/

https://reviews.llvm.org/D132477