[PATCH] D132477: Improve cost model for some 128-bit vector operations that use SVE

Tue Sep 6 05:26:15 PDT 2022

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2095-2099
+    // so the cost can be cheaper (smull or umull).
+    // However, if SVE is available then we can lower the v2i64 operation using
+    // the SVE mul instruction, which has a lower cost.
+    if (LT.second == MVT::v2i64 && (ST->hasSVE()))
+      return LT.first;
----------------
The comment on lines 2085 to 2094 applies to the code below.

Please move the code you've added here above the block of comments above, i.e. just below `case ISD::MUL:`

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2036
         Opcode, Ty, CostKind, Op1Info, Op2Info);
+
     if (Ty->isVectorTy()) {
----------------
nit: unrelated whitespace change?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2039-2042
+        // SDIV/UDIV operations are lowered, then we can have less cost.
+        // for 32/64-bit elements, just the base cost is enough.
+        // but for 8/16-bit elements, there will be higher cost becaues they
+        // require extra work :
----------------
nit: How about:

  // For 8/16-bit element types, the cost is higher because the type requires
  // promotion and possibly splitting.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2050-2051
+      } else {
+        // On AArch64, vector divisions are not supported natively and are
+        // expanded into scalar divisions of each pair of elements.
+        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
----------------
nit: this comment needs updating now.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2065
   case ISD::MUL:
-    // Since we do not have a MUL.2d instruction, a mul <2 x i64> is expensive
-    // as elements are extracted from the vectors and the muls scalarized.
-    // As getScalarizationOverhead is a bit too pessimistic, we estimate the
-    // cost for a i64 vector directly here, which is:
+    // without SVE we do not have a MUL.2d instruction, a mul <2 x i64> is
+    // expensive as elements are extracted from the vectors and the muls
----------------
nit: Please use proper capitalisation and punctuation in your comments.

How about:

  // When SVE is not available there is no MUL.2d instruction, which means
  // a mul <2 x i64> is expensive as elements are extracted from the vectors
  // and the muls are scalarized.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132477/new/

https://reviews.llvm.org/D132477