[PATCH] D109368: [LV] Don't vectorize if we can prove RT + vector cost >= scalar cost.

Fri May 27 06:34:13 PDT 2022

fhahn marked 2 inline comments as done.
fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2070
+        RTCheckCost +=
+            CM.getInstructionCost(&I, ElementCount::getFixed(1)).first;
+      }
----------------
dmgreen wrote:
> Would it be better to call CM.getInstructionCost here, or the base TTI.getInstructionCost?
> It appears that the cost of a vscale is coming through as 10, where it should be 1. I'm not sure if any of the other processing done in CM.getInstructionCost is very useful for the scalar runtime checks?
> I've added some quick tests (rG75631438e333) to show that the cost should be 1, it is just not treated as a "vectorizable intrinsic" by CM, so given the generic cost of a call.
> Would it be better to call CM.getInstructionCost here, or the base TTI.getInstructionCost?

Good point, as we are only interested in the scalar cost it should be sufficient to use TTI. Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2077
+        RTCheckCost +=
+            CM.getInstructionCost(&I, ElementCount::getFixed(1)).first;
+      }
----------------
dmgreen wrote:
> On a related note, can we get this to print the costs of each of the instructions in the runtime checks? It is useful for debugging when the numbers are incorrect. Otherwise at the moment I believe it just prints the final MinProfitableTripCount without any explanation of how it got there.
I update dthe code to print instruction costs here. An interesting result is that it seems our cost estimates for GEPs currently always assume it is used in instructions where constant can be folded via the addressing mode.  This is not true when using the pointers in compares as we do here. So the cost is slightly underestimated.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10369
+  if (VF.Width.isScalar()) {
+    if (CheckCost > 100) {
+      LLVM_DEBUG(
----------------
dmgreen wrote:
> Is it worth making "100" a compiler option, so that it is not hardcoded? Could it reuse VectorizeMemoryCheckThreshold, even if it is a different unit?
Good point, I updated the code to re-use the option.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109368/new/

https://reviews.llvm.org/D109368