[PATCH] D109368: [LV] Don't vectorize if we can prove RT + vector cost >= scalar cost.

Sat Sep 25 15:32:04 PDT 2021

lebedev.ri accepted this revision.
lebedev.ri added a comment.
This revision is now accepted and ready to land.

I have checked, and this is even more intrusive solution than the originally-disscussed design (D109296 <https://reviews.llvm.org/D109296>),
and it very successfully vectorizes the loops in question!

I must say, i **really** like this approach, since it is the only way to truly cover all the cases.
I basically had in mind something along the lines of "do we even need restrict the run-time checks",
i did not propose this as i thought it would be *too* radical.

I've gone over the math (both by hand and via sage, and it checks out)
As far as i'm concerned, this Looks Great.

@fhahn @Ayal thank you so much!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:190

+  ElementCount MinProfTripCount;
+
----------------
Nit: what does `prof` mean? PGO profile? Profitable?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3322-3327
+    if (UF * VF.getKnownMinValue() < MinProfTripCount.getKnownMinValue())
+      Step = createStepForVF(Builder, ConstantInt::get(Count->getType(), 1),
+                             MinProfTripCount);
+    else
+      Step =
+          createStepForVF(Builder, ConstantInt::get(Count->getType(), UF), VF);
----------------
This looks like `std::max(step, MinProfTripCount)`?
Might be worth adding a comment.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8237
+    // total scalar cost:
+    //   RtC + VecC (TC / VF) + EpiC <  ScalarC * TC
+    //
----------------

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8239-8240
+    //
+    // Now we can compute the minimum required trip count TC as
+    //   (RtC + EpiC) / (ScalarC + (VecC / VF)) < TC
+    //
----------------
I'm rather sure we can't :)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8242-8247
+    // For now we assume the epilogue cost EpiC = 0 for simplicity.
+    unsigned VF = SelectedVF.Width.getKnownMinValue();
+    double ScalarC = *SelectedVF.ScalarCost.getValue();
+    double VecC = double(*SelectedVF.Cost.getValue()) / VF;
+    double RtC = *Checks.getCost(CM).getValue();
+    double MinTC1 = RtC / (ScalarC - VecC);
----------------
I agree that here the math checks out, the sign is correct.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8245
+    double ScalarC = *SelectedVF.ScalarCost.getValue();
+    double VecC = double(*SelectedVF.Cost.getValue()) / VF;
+    double RtC = *Checks.getCost(CM).getValue();
----------------
This is not `VecC`, it's `VecC/VF`.
Not an error, but confusing.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8257-8261
+    // Now pick the larger minimum. If it is not a multiple of VF, choose the
+    // next closest multiple of VF. This should partly compensate for ignoring
+    // the epilogue cost.
+    uint64_t MinTC = std::ceil(std::max(MinTC1, MinTC2));
+    uint64_t MinTCDown = MinTC & ~(VF - 1);
----------------
So does this round up or down?
Use `alignTo()`/`alignDown()`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109368/new/

https://reviews.llvm.org/D109368