[PATCH] D101290: [LV] Try to sink and hoist inside candidate loops for vectorization.

Mon Apr 26 05:50:26 PDT 2021

fhahn created this revision.
fhahn added reviewers: Ayal, anemet, hfinkel, spatel, lebedev.ri.
Herald added subscribers: pengfei, rogfer01, bollu, hiraditya, kristof.beyls.
fhahn requested review of this revision.
Herald added a subscriber: vkmr.
Herald added a project: LLVM.

Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.
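
As a source-level illustration of the two points above (a minimal sketch,
not code from the patch; the patch itself works on LLVM IR, and the
function names below are made up for this example), consider a loop where
both arms of a branch load a[i] and store to out[i]. Hoisting the common
load and sinking the common store leaves only the arithmetic conditional:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example. Both functions assume a and c have at least
// out.size() elements.

// Before: the load of a[i] and the store to out[i] appear in both
// conditional blocks, so they are only executed under a predicate.
void selectBranchy(const std::vector<int> &a, const std::vector<int> &c,
                   std::vector<int> &out) {
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (c[i] > 0)
      out[i] = a[i] + 1;
    else
      out[i] = a[i] * 2;
  }
}

// After: the load is hoisted above the branch and the store is sunk
// below it. Both memory accesses are now unconditional, and only the
// choice between the two computed values depends on the condition.
void selectHoistedSunk(const std::vector<int> &a, const std::vector<int> &c,
                       std::vector<int> &out) {
  for (std::size_t i = 0; i < out.size(); ++i) {
    int x = a[i];                      // hoisted load
    int r = c[i] > 0 ? x + 1 : x * 2;  // remaining conditional work
    out[i] = r;                        // sunk store
  }
}
```

Both versions compute the same result. The rewrite is valid here only
because each arm accesses the same addresses, so executing the accesses
unconditionally does not introduce new loads or stores.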

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tries to recover the regression by running hoisting/sinking
inside each inner loop before vectorization. This is not ideal, because
we also hoist/sink in loops that won't be vectorized. But LV already
does similar transformations for all inner loops (e.g. LoopSimplify and
LCSSA construction). Alternatively, we could run a separate
loop-sink-hoist pass, but I am not sure that's worth the effort.

In the long term, the sinking/hoisting could and should be done in
VPlan, but that requires handling at least parts of legality and
cost-modeling in VPlan as well.

Details about the impact on compile-time can be found here:
http://llvm-compile-time-tracker.com/compare.php?from=3a71d0de397e3a15c943ca59a00243ba8b7154da&to=c4efd69f4733b46e5de8fc2fa6e4c2495750d339&stat=instructions

NewPM-O3: geomean +0.18%
NewPM-ReleaseThinLTO: geomean +0.17%
NewPM-ReleaseLTO-g: geomean +0.18%

In terms of the number of loops vectorized, we have the following changes
across MultiSource/SPEC2000/SPEC2006 on X86 with LTO:

 test-suite...000/186.crafty/186.crafty.test    20.00  22.00 10.0%
 test-suite...006/450.soplex/450.soplex.test    85.00  86.00  1.2%
 test-suite.../CINT2006/403.gcc/403.gcc.test   209.00 211.00  1.0%
 test-suite...6/464.h264ref/464.h264ref.test   156.00 157.00  0.6%
 test-suite...ications/JM/lencod/lencod.test   215.00 216.00  0.5%

And +0.5% more loops are vectorized in Geekbench on AArch64.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D101290

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/Transforms/LoopVectorize/AArch64/prepare-hoist-sink.ll
  llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
  llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll
  llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D101290.340497.patch
Type: text/x-patch
Size: 29130 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210426/6605052b/attachment.bin>