[PATCH] D109368: [LV] Don't vectorize if we can prove RT + vector cost >= scalar cost.
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 25 15:32:04 PDT 2021
lebedev.ri accepted this revision.
lebedev.ri added a comment.
This revision is now accepted and ready to land.
I have checked, and this is even more intrusive solution than the originally-disscussed design (D109296 <https://reviews.llvm.org/D109296>),
and it very successfully vectorizes the loops in question!
I must say, i **really** like this approach, since it is the only way to truly cover all the cases.
I basically had in mind something along the lines of "do we even need restrict the run-time checks",
i did not propose this as i thought it would be *too* radical.
I've gone over the math (both by hand and via sage, and it checks out)
As far as i'm concerned, this Looks Great.
@fhahn @Ayal thank you so much!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:190
+ ElementCount MinProfTripCount;
+
----------------
Nit: what does `prof` mean? PGO profile? Profitable?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3322-3327
+ if (UF * VF.getKnownMinValue() < MinProfTripCount.getKnownMinValue())
+ Step = createStepForVF(Builder, ConstantInt::get(Count->getType(), 1),
+ MinProfTripCount);
+ else
+ Step =
+ createStepForVF(Builder, ConstantInt::get(Count->getType(), UF), VF);
----------------
This looks like `std::max(step, MinProfTripCount)`?
Might be worth adding a comment.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8237
+ // total scalar cost:
+ // RtC + VecC (TC / VF) + EpiC < ScalarC * TC
+ //
----------------
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8239-8240
+ //
+ // Now we can compute the minimum required trip count TC as
+ // (RtC + EpiC) / (ScalarC + (VecC / VF)) < TC
+ //
----------------
I'm rather sure we can't :)
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8242-8247
+ // For now we assume the epilogue cost EpiC = 0 for simplicity.
+ unsigned VF = SelectedVF.Width.getKnownMinValue();
+ double ScalarC = *SelectedVF.ScalarCost.getValue();
+ double VecC = double(*SelectedVF.Cost.getValue()) / VF;
+ double RtC = *Checks.getCost(CM).getValue();
+ double MinTC1 = RtC / (ScalarC - VecC);
----------------
I agree that here the math checks out, the sign is correct.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8245
+ double ScalarC = *SelectedVF.ScalarCost.getValue();
+ double VecC = double(*SelectedVF.Cost.getValue()) / VF;
+ double RtC = *Checks.getCost(CM).getValue();
----------------
This is not `VecC`, it's `VecC/VF`.
Not an error, but confusing.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8257-8261
+ // Now pick the larger minimum. If it is not a multiple of VF, choose the
+ // next closest multiple of VF. This should partly compensate for ignoring
+ // the epilogue cost.
+ uint64_t MinTC = std::ceil(std::max(MinTC1, MinTC2));
+ uint64_t MinTCDown = MinTC & ~(VF - 1);
----------------
So does this round up or down?
Use `alignTo()`/`alignDown()`?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109368/new/
https://reviews.llvm.org/D109368
More information about the llvm-commits
mailing list