[PATCH] D109368: [LV] Don't vectorize if we can prove RT + vector cost >= scalar cost.

Thu May 26 02:19:33 PDT 2022

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2070
+        RTCheckCost +=
+            CM.getInstructionCost(&I, ElementCount::getFixed(1)).first;
+      }
----------------
Would it be better to call CM.getInstructionCost here, or the base TTI.getInstructionCost?
It appears that the cost of a vscale is coming through as 10, where it should be 1. I'm not sure whether any of the other processing done in CM.getInstructionCost is useful for the scalar runtime checks.
I've added some quick tests (rG75631438e333) to show that the cost should be 1. CM just does not treat it as a "vectorizable intrinsic", so it is given the generic cost of a call.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2077
+        RTCheckCost +=
+            CM.getInstructionCost(&I, ElementCount::getFixed(1)).first;
+      }
----------------
On a related note, can we get this to print the cost of each instruction in the runtime checks? That is useful for debugging when the numbers are incorrect. At the moment, I believe it just prints the final MinProfitableTripCount without any explanation of how it got there.
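
To illustrate the kind of output I have in mind, here is a rough standalone sketch. It does not use the real LoopVectorize types: `CheckInst` and `sumRTCheckCost` are hypothetical names made up for this example, and in the actual patch the printing would presumably go through LLVM_DEBUG rather than a plain stream.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one instruction in a runtime-check block.
// This is not an LLVM type; it only carries what the log needs.
struct CheckInst {
  std::string Text; // printed form of the instruction
  uint64_t Cost;    // scalar cost reported by whichever cost model is used
};

// Sum the per-instruction costs and emit one line per instruction, so the
// final total can be traced back to its individual contributions.
uint64_t sumRTCheckCost(const std::vector<CheckInst> &Insts, std::ostream &OS) {
  uint64_t Total = 0;
  for (const CheckInst &I : Insts) {
    OS << "  cost " << I.Cost << " for: " << I.Text << "\n";
    Total += I.Cost;
  }
  OS << "Total cost of runtime checks: " << Total << "\n";
  return Total;
}
```

With a log like this, an entry such as the vscale call being costed at 10 rather than 1 would be visible directly, instead of only showing up indirectly in MinProfitableTripCount.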

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10369
+  if (VF.Width.isScalar()) {
+    if (CheckCost > 100) {
+      LLVM_DEBUG(
----------------
Is it worth making "100" a compiler option, so that it is not hardcoded? Could it reuse VectorizeMemoryCheckThreshold, even though that is measured in a different unit?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109368/new/

https://reviews.llvm.org/D109368