[LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable

Thu Jul 16 07:18:27 PDT 2015

From:	Chandler Carruth <chandlerc at google.com>
To:	Hyojin Sung/Watson/IBM at IBMUS, llvmdev at cs.uiuc.edu
Date:	07/15/2015 08:35 PM
Subject:	Re: [LLVMdev] Improving loop vectorizer support for loops with
            a volatile iteration variable

On Wed, Jul 15, 2015 at 12:55 PM Hyojin Sung <hsung at us.ibm.com> wrote:
  Hi all,

  I would like to propose an improvement to the “almost dead” block
  elimination in Transforms/Utils/Local.cpp so that it preserves the
  canonical loop form for loops with a volatile iteration variable.

  *** Problem statement
  Nested loops in LCALS Subset B (https://codesign.llnl.gov/LCALS.php) are
  not vectorized with LLVM -O3 because the LLVM loop vectorizer's check
  that the loop latch and the exiting block of a loop are the same block
  fails. The loops are vectorizable, and get vectorized with LLVM -O2

I would be interested to know why -O2 succeeds here.

-O2 does not perform loop unswitching, which is what creates the artificial
empty placeholder blocks in the outer loop. As long as the increment and
test of the volatile iteration variable stay in the original BB, that block
does not get eliminated.

  and also with other commercial compilers (icc, xlc).
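For readers without the benchmark at hand, the kind of loop nest under discussion can be sketched as follows. This is a minimal illustration, not code taken from LCALS: the function name `scale_repeated`, the parameter `num_samples`, and the loop body are all hypothetical. The only property that matters is that the outer iteration variable is declared `volatile` while the inner loop is an ordinary vectorization candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel for illustration only; not taken from LCALS.
void scale_repeated(std::vector<double>& out, const std::vector<double>& in,
                    int num_samples) {
    assert(out.size() == in.size());
    // The outer iteration variable is volatile, so the compiler must perform
    // every read and write of `isamp` as an actual memory access.
    for (volatile int isamp = 0; isamp < num_samples; isamp = isamp + 1) {
        // Inner loop: the part one would expect the vectorizer to handle.
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = 2.0 * in[i];
        }
    }
}
```

The increment is written as `isamp = isamp + 1` rather than `++isamp` only because increment of a volatile object is deprecated in C++20; the shape of the loop is the same either way.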

  *** Details
  These loops end up with a loop latch that differs from the exiting block
  after a series of optimizations including loop unswitching, jump
  threading, simplify-the-CFG, and loop simplify. The fundamental problem
  here is that the above optimizations cannot recognize a loop with a
  volatile iteration variable and do not preserve its canonical loop
  structure.
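The latch/exiting-block condition mentioned above can be made concrete with a toy model. The sketch below does not use LLVM's data structures: the `CFG` alias, the helper names, and the block names are all hypothetical. It only encodes the standard definitions, namely that a latch is a block inside the loop that branches back to the header, and an exiting block is a block inside the loop with a successor outside it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG: block name -> successor names. Not LLVM's classes.
using CFG = std::map<std::string, std::vector<std::string>>;

// Blocks inside the loop that branch back to the header.
std::vector<std::string> latches(const CFG& cfg,
                                 const std::set<std::string>& loop,
                                 const std::string& header) {
    assert(loop.count(header) == 1);
    std::vector<std::string> result;
    for (const auto& name : loop) {
        auto it = cfg.find(name);
        if (it == cfg.end()) continue;
        for (const auto& succ : it->second) {
            if (succ == header) { result.push_back(name); break; }
        }
    }
    return result;
}

// Blocks inside the loop with at least one successor outside the loop.
std::vector<std::string> exitingBlocks(const CFG& cfg,
                                       const std::set<std::string>& loop) {
    std::vector<std::string> result;
    for (const auto& name : loop) {
        auto it = cfg.find(name);
        if (it == cfg.end()) continue;
        for (const auto& succ : it->second) {
            if (loop.count(succ) == 0) { result.push_back(name); break; }
        }
    }
    return result;
}

// True when the loop has exactly one latch, exactly one exiting block,
// and the two are the same block.
bool latchIsExitingBlock(const CFG& cfg, const std::set<std::string>& loop,
                         const std::string& header) {
    const auto l = latches(cfg, loop, header);
    const auto e = exitingBlocks(cfg, loop);
    return l.size() == 1 && e.size() == 1 && l[0] == e[0];
}
```

With a loop `{header, body, latch}` in which `latch` branches to both `header` and `exit`, the check holds. If the exit edge instead leaves from `header` while `latch` only branches back, the latch and the exiting block differ and the check fails. That second shape is an assumed example of a non-canonical form, not a reconstruction of the exact CFG produced for the LCALS loops.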

Ok, meta-level question first:

Why do we care about performance of loops with a volatile iteration
variable? That seems both counter-intuitive and unlikely to be a useful
goal. We simply don't optimize volatile operations well in *any* part of
the optimizer, and I'm not sure why we need to start trying to fix that.
This seems like an irreparably broken benchmark, but perhaps there is a
motivation I don't yet see.

Assuming that sufficient motivation arises to try to fix this, see my
comments below:

  (1) Loop unswitching generates several placeholder BBs that contain only
  PHI nodes after separating out a shorter path with no inner loop
  execution from a standard path.

  (2) The jump threading and simplify-the-CFG passes each call
  TryToSimplifyUnconditionalBranchFromEmptyBlock() in
  Transforms/Utils/Local.cpp to get rid of almost empty BBs.

  (3) TryToSimplifyUnconditionalBranchFromEmptyBlock() eliminates the
  placeholder BBs left by loop unswitching and merges them into subsequent
  blocks, including the header of the inner loop. Before eliminating a
  block, the function checks whether the block is a loop header by looking
  at its PHI nodes so that it can be saved, but the test fails for loops
  with a volatile iteration variable.

Why does this fail for a volatile iteration variable but not for a
non-volatile one? I think understanding that will be key to understanding
how it should be fixed.
