[PATCH] D13087: A fix for loop vectorizer with handling loops with volatile induction variables

Tue Sep 22 20:16:17 PDT 2015

hsung created this revision.
hsung added reviewers: hfinkel, chandlerc, carlo.bertolli, sfantao.
hsung added a subscriber: llvm-commits.
hsung set the repository for this revision to rL LLVM.
hsung changed the visibility of this Differential Revision from "Public (No Login Required)" to "All Users".

Hi,

I am proposing a patch to improve loop vectorizer performance on nested loops with volatile induction variables. The root of the problem is that LLVM identifies potential loops only by checking PHI nodes when deciding whether to apply an optimization. A volatile induction variable is kept in memory and accessed through volatile loads and stores, so no PHI node is created for it. As a result, LLVM fails to recognize such a loop, and it destroys vectorizable nested loops by collapsing them before the loop vectorizer runs. We can fix the problem either by (1) preventing loops from being collapsed in the first place or (2) updating the loop vectorizer or other transformation passes to handle collapsed loops.
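To make the pattern concrete, here is a hypothetical C++ loop nest of the kind described (the function `addRows` and its parameters are illustrative, not taken from the patch or its tests). Because the outer induction variable `i` is volatile, its loads and stores cannot be promoted to SSA values, so no PHI node for `i` appears in the outer loop, while the inner loop over `j` is an ordinary counted loop the vectorizer could handle. Whether a particular LLVM version actually collapses this exact nest has not been checked here.

```cpp
#include <cassert>

// Hypothetical example: outer loop with a volatile induction variable,
// inner loop with an ordinary (register-promotable) induction variable.
void addRows(float *a, const float *b, int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  // 'i' stays in memory: every test and update is a volatile load/store,
  // so the outer loop has no PHI node for it.
  for (volatile int i = 0; i < rows; i = i + 1) {
    // 'j' becomes a PHI in the inner loop header; this loop is the
    // vectorization candidate.
    for (int j = 0; j < cols; ++j) {
      a[i * cols + j] += b[i * cols + j];
    }
  }
}
```

The outer update is written as `i = i + 1` rather than `++i` because C++20 deprecates `++` and `--` on volatile objects.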
The attached patch implements (1). Jump Threading and SimplifyCFG eliminate "almost empty" basic blocks (BBs) that contain only PHI nodes besides their terminating branch. Before eliminating such a block, they inspect only its PHI nodes to decide whether it might be a loop header, and keep it if so. The patch strengthens this check: it also tests whether the BB is in a set of known loop headers and, if it is, does not eliminate it. Jump Threading already builds a set of loop headers by identifying backedges, so we can simply reuse that set. For SimplifyCFG, such a set of loop headers can be built per function in iterativelySimplifyCFG().
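The header check can be sketched on a toy CFG. This is a simplified, self-contained model, not LLVM code: `ToyCFG`, `findLoopHeaders` and `mayEliminateAlmostEmpty` are hypothetical names. Loop headers are taken to be the targets of backedges found by a depth-first search (an edge whose target is still on the DFS stack), which is the idea behind the backedge set Jump Threading builds (LLVM provides `FindFunctionBackedges()` for this). The guard then refuses to remove an almost-empty block that is in that set; the pre-existing PHI-based heuristic is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG (not the LLVM API): blocks are numbered 0..N-1.
struct ToyCFG {
  std::vector<std::vector<int>> succs;  // succs[b] = successor blocks of b
  std::vector<bool> almostEmpty;        // true if b holds only PHIs and a branch
};

// Returns the targets of DFS back edges (edges b -> s where s is still on
// the DFS stack); these blocks are treated as loop headers.
std::set<int> findLoopHeaders(const ToyCFG &cfg, int entry) {
  assert(entry >= 0 && static_cast<std::size_t>(entry) < cfg.succs.size());
  enum State { Unvisited, OnStack, Done };
  std::vector<State> state(cfg.succs.size(), Unvisited);
  std::vector<std::pair<int, std::size_t>> stack;  // (block, next successor index)
  std::set<int> headers;

  stack.push_back({entry, 0});
  state[entry] = OnStack;
  while (!stack.empty()) {
    int b = stack.back().first;
    std::size_t idx = stack.back().second;
    if (idx < cfg.succs[b].size()) {
      stack.back().second = idx + 1;  // advance before a push may reallocate
      int s = cfg.succs[b][idx];
      if (state[s] == OnStack) {
        headers.insert(s);            // back edge b -> s: s is a loop header
      } else if (state[s] == Unvisited) {
        state[s] = OnStack;
        stack.push_back({s, 0});
      }
    } else {
      state[b] = Done;                // all successors of b explored
      stack.pop_back();
    }
  }
  return headers;
}

// Patch-style guard: an almost-empty block may be folded away only if it is
// not a known loop header, regardless of what its PHI nodes look like.
bool mayEliminateAlmostEmpty(const ToyCFG &cfg, int b,
                             const std::set<int> &headers) {
  return cfg.almostEmpty[b] && headers.count(b) == 0;
}
```

For example, with edges 0 -> 5 -> 1 -> 2, a self-loop on 2, 2 -> 3, 3 -> 1 and 3 -> 4, the computed headers are {1, 2}. An almost-empty outer header 1 (as can happen with a volatile outer induction variable) is kept, while the almost-empty non-header block 5 may still be removed.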
With the patch applied, four tests fail: two in LoopUnswitch and two in SimplifyCFG. The two LoopUnswitch failures are due only to differences in BB names and are trivial to resolve. The two SimplifyCFG tests each contain two nested loops with an empty outer loop header; that header used to be merged into the inner loop, but after the patch it no longer is. I propose updating these tests as well, for two reasons. First, keeping these nested loops in canonical loop form offers greater benefit to later optimizations, including loop vectorization. Second, the empty blocks will still be eliminated when SimplifyCFG runs again later in the back-end, so the extra branches in the test IR will not affect the final code.

Repository:
  rL LLVM

http://reviews.llvm.org/D13087

Files:
  include/llvm/Transforms/Utils/Local.h
  lib/Transforms/Scalar/JumpThreading.cpp
  lib/Transforms/Scalar/SimplifyCFGPass.cpp
  lib/Transforms/Utils/SimplifyCFG.cpp
  test/Transforms/LoopUnswitch/2015-06-17-Metadata.ll
  test/Transforms/LoopUnswitch/infinite-loop.ll
  test/Transforms/SimplifyCFG/2008-05-16-PHIBlockMerge.ll
  test/Transforms/SimplifyCFG/EqualPHIEdgeBlockMerge.ll

Attachment: D13087.35457.patch (text/x-patch, 14149 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150923/f873fe6b/attachment.bin>