[PATCH] D109368: [LV] Don't vectorize if we can prove RT + vector cost >= scalar cost.

Mon Sep 27 00:10:24 PDT 2021

dmgreen added a comment.

This sounds like a sensible general approach to take to me, from a cost point of view. It seems more like how I once heard the GCC vectorizer cost calculate described.

But there may be more work required on the costing calculations. It might be both under and over estimating the costs in places.
The first obvious problem I ran into was that the runtime checks it emits are not simplified prior to the costs being added for each instruction. The overflow checks seem to create "umul_with_overflow(|step|, TC). But the step is commonly 1 so given half a chance the instruction will be simplified away.
It looks like there were other inefficiencies with the values created too, with `select false, x, y` or `or x, false` being added to the cost. But the umul_with_overflow(1, TC) was getting a pretty high cost.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:449
+                      ProfileSummaryInfo *PSI, GeneratedRTChecks &RTChecks,
+                      ElementCount MinProfTripCount)
+      : OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
----------------
Can this have a default value, to prevent the need for multiple constructors?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8226
+    //  The total cost of the vector loop is
+    //    RtC + VecC * (TC / VF) + EpiC
+    //  where
----------------
This should be `RtC + VecC * floor(TC / VF) + EpiC` or `RtC + VecC * ceil(TC / VF)` when folding the tail (assuming there is no epilogue then). That makes the math more difficult unless it makes the assumption that those round to the same thing.

It is also assuming that the runtime checks do not fail, otherwise a probability factor would need to be added. (Probably fine to assume, but the more runtime checks there are, the less likely they are to all succeed).

And there can be other costs for vectorizations. The "check N is less than MinProfTripCount" isn't free and there can be other inefficiencies from vector loops.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8242
+    //
+    // For now we assume the epilogue cost EpiC = 0 for simplicity.
+    unsigned VF = SelectedVF.Width.getKnownMinValue();
----------------
lebedev.ri wrote:
> I agree that here the math checks out, the sign is correct.
Perhaps add some more assumptions, like the ones mentioned above and that (ScalarC - (VecC / VF)) > 0 from other profitability checks.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109368/new/

https://reviews.llvm.org/D109368