[PATCH] Allow PRE to duplicate loads in LICM like loop case

Mon Jan 19 16:37:46 PST 2015

Hi dberlin, resistor, nadav, bob.wilson, hfinkel, nicholas, chandlerc,

(This code is not final.  I'm posting it for discussion before cleaning it up because I suspect the approach may change based on discussion.)

The idea of this patch is to allow load PRE to 'work' in a fairly common case involving loops.  If we haven't pealed a loop, have a load in the header, one path through the loop which clobbers that location, and one or more path through the loop that does not, we can move the load into the preheader *and* the clobbering path.  (see example below)

The first point for discussion: are we okay with the duplication?  It's safe in that we haven't introduced a load along a dynamic path which previously didn't have one.  In the worst case, we add one *extra* load along a dynamic path for each time the loop runs.  (Not one per iteration, *one* in the loop preheader.)

Unfortunately, the IR presented to GVN/PRE does not make this easy.  The canonical loop form created by LoopSimplify merges all of the loop latches into a single latch, and thus single predecessor for the header block.  In order to treat certain paths differently, I had to add a mechanism to 'look through' some predecessors to get back to the original latches.  This is the 'merge block' complexity in the patch.

I'm reasonable sure at this point that the 'merge block' trick works.  It also has a side benefit in that it reduces the number of LoadPRE iterations required in some cases (e.g. you have a tree of inputs with all but one available), and thus might help compile time in some cases.

However, it does add a good amount of complexity to the LoadPRE code.  Do we think this is worthwhile?

It's worth mentioning that the 'merge block' concept is essentially a specialized form of jump threading.  In fact, running '-jump-threading' does help in the loop cases I've looked at.  However, running jump threading also inhibits some optimizations currently performed by LoadPRE.  If we have a merge point where both inputs are unavailable, PRE will currently insert a single load at the merge.  Running jump threading breaks this.  Keeping that case working as it does today is why we only look through merge blocks with a single unavailable predeccesor in the patch.  

I'm currently of the belief that the complexity is worthwhile.  I'm open to being convinced that either a) it's not, or b) there's a better approach.

Example (before, after loop simplify):
declare void @hold(i32) readonly
declare void @clobber()
define i32 @test1(i1 %cnd, i32* dereferenceable(400) %p) {
entry: 
  br label %header

header:
  %v1 = load i32* %p
  call void @hold(i32 %v1)
  br i1 %cnd, label %bb1, label %bb2

bb1:
  br label %merge

bb2:
  call void @clobber()
  br label %merge

merge:
  br label %header
}

Example (after):
define i32 @test1(i1 %cnd, i32* dereferenceable(400) %p) {
entry:
  %v1.pre = load i32* %p
  br label %header

header:                                           ; preds = %merge, %entry
  %v1 = phi i32 [ %v12, %merge ], [ %v1.pre, %entry ]
  call void @hold(i32 %v1)
  br i1 %cnd, label %bb1, label %bb2

bb1:                                              ; preds = %header
  br label %merge

bb2:                                              ; preds = %header
  call void @clobber()
  %v1.pre1 = load i32* %p
  br label %merge

merge:                                            ; preds = %bb2, %bb1
  %v12 = phi i32 [ %v1.pre1, %bb2 ], [ %v1, %bb1 ]
  br label %header
}

http://reviews.llvm.org/D7061

Files:
  lib/Transforms/Scalar/GVN.cpp

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7061.18405.patch
Type: text/x-patch
Size: 8849 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150120/874bfefb/attachment.bin>