[LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable

Wed Jul 15 12:51:37 PDT 2015

Hi all,

I would like to propose an improvement of the “almost dead” block
elimination in Transforms/Local.cpp so that it will preserve the canonical
loop form for loops with a volatile iteration variable.

*** Problem statement
Nested loops in LCALS Subset B (https://codesign.llnl.gov/LCALS.php) are
not vectorized with LLVM -O3 because the LLVM loop vectorizer fails the
test whether the loop latch and exiting block of a loop is the same. The
loops are vectorizable, and get vectorized with LLVM -O2 and also with
other commercial compilers (icc, xlc).

*** Details
These loops ended up with different loop latch and exiting block after a
series of optimizations including loop unswitching, jump threading,
simplify-the-CFG, and loop simplify. The fundamental problem here is that
the above optimizations cannot recognize a loop with a volatile iteration
variable and do not preserve its canonical loop structure.

(1) Loop unswitching generates several empty placeholder BBs only with PHI
nodes after separating out a shorter path with no inner loop execution from
a standard path.

(2) Jump threading and simplify-the-CFG passes independently calls
TryToSimplifyUnconditionalBranchFromEmptyBlock() in
Transforms/Utils/Local.cpp to get rid of almost empty BBs.

(3) TryToSimplifyUnconditionalBranchFromEmtpyBlock() eliminates the
placeholder BBs after loop unswitching and merges them into subsequent
blocks including the header of the inner loop. Before eliminating the
blocks, the function checks if the block is a loop header by looking at its
PHI nodes so that it can be saved, but the test fails with the loops with a
volatile iteration variable. The outer loop is now collapsed into the inner
loop with multiple backedges.

(4) LoopSimplify checks if a loop with multiple backedges can be separated
to nested loops by looking at PHI nodes in the loop header. The test fails
with the loops with a volatile iteration variable.

(5) LoopSimplify then creates a unique backedge block for the loop, and the
loop now has a different loop latch (the unique backedge block created in
(3)) and exiting block (a block where the volatile outer loop variable is
incremented and tested).

*** Proposed solutions
(1) Make LoopInfo available in Jump Threading and Simplify-the-CFG passes
and use LoopInfo to test whether an almost empty BB is a loop header. If
yes, do not eliminate the BB.
      Pros: Leverages existing analysis results, small code changes / Cons:
Add pass dependencies

(2) Instead of using LoopInfo, iterate through BB’s to identify backedges
and loop headers in TryToSimplifyUnconditionalBranchFromEmptyBlock() and
use the results to test if a BB is a loop header. If yes, do not eliminate
the BB. Jump Threading has functions that do similar cheap loop analysis.
      Pros: No need to depend on external analysis results / Cons: More
lines to be added

*** Summary
I would like to propose an improvement of the “almost dead” block
elimination in Transforms/Local.cpp so that it will preserve the canonical
loop form for loops with a volatile iteration variable. On top of the
current algorithm relying on PHI nodes to recognize loops, actual loop
analysis to test if a BB belongs to a loop and etc. can be used.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/0578e09b/attachment.html>