[PATCH] D109296: [LV] Improve inclusivity of vectorization

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 5 12:13:55 PDT 2021


lebedev.ri created this revision.
lebedev.ri added reviewers: fhahn, ebrevnov, anemet, asbirlea, MaskRay, aeubanks, wenlei, davidxl, nikic, dmgreen.
lebedev.ri added a project: LLVM.
Herald added subscribers: ormris, bmahjour, steven_wu, hiraditya.
lebedev.ri requested review of this revision.

Right now, LoopVectorizer has a hard limit on the number of runtime memory checks.
The limit is currently `8`, and while it generally works reasonably well,
like any fixed threshold, it is arbitrary.

There are several problems with it:

1. It puts a hard cap on the complexity of the loop it will vectorize. Naturally, the more pointer arithmetic/"objects" you have, the more checks are needed.
2. The number of runtime memory checks doesn't actually correlate with the overhead incurred by them. I've checked locally, and a single check can have a cost from 4 to 25...
3. Why do we have this hard limit anyways? I guess because we want to avoid generating too many checks?
4. How did we come up with the current limit?

Therefore, i would like to propose completely changing the approach here,
and instead specifying the budget for said checks as a multiple of the cost
of a single iteration of the original scalar loop.

That is, if the cost of a single iteration of the original scalar loop is 10,
and the Multiple is 2, then the budget for the runtime checks is 10*2 = 20.
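
To make the arithmetic concrete, here is a minimal standalone sketch of the budget rule. The helper names `runtimeCheckBudget` and `checksFitBudget` are hypothetical and do not appear in the patch; the comparison only mirrors the `Checks.getCost(CM) > VectorizeMemoryCheckFactor * ScalarCost` condition visible in the diff, with plain integers standing in for LLVM's cost types.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): the budget for runtime checks,
// expressed as a multiple of the cost of one iteration of the scalar loop.
constexpr std::uint64_t runtimeCheckBudget(std::uint64_t ScalarIterCost,
                                           std::uint64_t Multiple) {
  return ScalarIterCost * Multiple;
}

// Hypothetical helper (not part of the patch): the checks are acceptable when
// their total cost does not exceed the budget, i.e. the inverse of the
// "cost > factor * scalar cost" rejection test.
constexpr bool checksFitBudget(std::uint64_t ChecksCost,
                               std::uint64_t ScalarIterCost,
                               std::uint64_t Multiple) {
  return ChecksCost <= runtimeCheckBudget(ScalarIterCost, Multiple);
}
```

With a scalar iteration cost of 10 and a Multiple of 2, checks costing 20 are accepted and checks costing 21 are rejected, regardless of how many individual checks that cost is spread over.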

So far i have looked for the optimal value of this threshold on RawSpeed and darktable,
and the results may be interesting:
https://docs.google.com/spreadsheets/d/1b3VPU1tPYGq0AO3XH3kBv3zdpKMby8aJzFl2cLSZ5AQ/edit?usp=sharing
Just to preserve all the existing vectorizations, we'd need to allow the cost of run-time checks
to be not greater than the cost of 6 iterations of the scalar loop.
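
One way such a threshold could be derived from per-loop measurements is sketched below. This is an illustration only: the `LoopCosts` struct and `smallestSufficientMultiple` are hypothetical names, and any numbers fed to it are made up rather than taken from the spreadsheet. It finds the smallest integer Multiple for which every loop's run-time check cost still fits in Multiple times its scalar iteration cost.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record of the two costs reported for one loop.
struct LoopCosts {
  std::uint64_t ChecksCost;     // total cost of the run-time checks
  std::uint64_t ScalarIterCost; // cost of one scalar loop iteration
};

// Returns the smallest integer Multiple such that, for every loop,
// ChecksCost <= Multiple * ScalarIterCost. Loops with a zero scalar cost
// are skipped, since no finite multiple is meaningful for them.
std::uint64_t smallestSufficientMultiple(const std::vector<LoopCosts> &Loops) {
  std::uint64_t Multiple = 0;
  for (const LoopCosts &L : Loops) {
    if (L.ScalarIterCost == 0)
      continue;
    // Ceiling division: the multiple this particular loop needs.
    std::uint64_t Needed =
        (L.ChecksCost + L.ScalarIterCost - 1) / L.ScalarIterCost;
    Multiple = std::max(Multiple, Needed);
  }
  return Multiple;
}
```

For example, with made-up loops of (checks cost, scalar cost) equal to (20, 10), (55, 10) and (4, 25), the loop that needs the most headroom is the second one, so the function returns 6.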

I know pretty much all of the code there should vectorize, because i (re)wrote most of it.
Originally, it was just manually vectorized with SSE2, but i've added plain fallbacks.

This is motivated by the bug report https://bugs.llvm.org/show_bug.cgi?id=44662 that i filed
almost two years ago now. The code is inspired by/based on the code by @fhahn in D75981 <https://reviews.llvm.org/D75981>,
but unfortunately that patch is rather stuck, and the vectorization area of llvm appears to be
a walled garden without many outside-of-the-club contributions, with those in the club being busy,
so i don't have much hope here :S

  diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  index 8a0999ddb98c..f4495cba57f5 100644
  --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  @@ -8186,6 +8186,12 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC,
     // Check if it is profitable to vectorize with runtime checks.
     if (SelectedVF.Width.getKnownMinValue() > 1 &&
         Requirements.getNumRuntimePointerChecks()) {
  +    errs() << "LV LAA num " << Requirements.getNumRuntimePointerChecks()
  +           << " RTCost " << Checks.getCost(CM) << " ScalarLoopCost "
  +           << SelectedVF.ScalarCost.getValue().getValue() << " fraction "
  +           << (double)Checks.getCost(CM) /
  +                  SelectedVF.ScalarCost.getValue().getValue()
  +           << "\n";
       if (Checks.getCost(CM) >
           VectorizeMemoryCheckFactor * (*SelectedVF.ScalarCost.getValue())) {
         ORE->emit([&]() {


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D109296

Files:
  llvm/include/llvm/Analysis/LoopAccessAnalysis.h
  llvm/lib/Analysis/LoopAccessAnalysis.cpp
  llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/LTO/X86/diagnostic-handler-remarks.ll
  llvm/test/Transforms/LoopVectorize/AArch64/runtime-check-size-based-threshold.ll
  llvm/test/Transforms/LoopVectorize/X86/runtime-limit.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D109296.370817.patch
Type: text/x-patch
Size: 20155 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210905/cb9ff3fb/attachment.bin>

