[LLVMdev] RFC: Loop versioning for LICM

Philip Reames listmail at philipreames.com
Tue Mar 3 10:46:07 PST 2015


On 02/26/2015 02:31 AM, Nema, Ashutosh wrote:
>
> I like to propose a new loop multi versioning optimization for LICM.
>
> For now I kept this for LICM only, but it can be used in multiple places.
>
> The main motivation is to allow optimizations stuck because of memory
>
> alias dependencies. Most of the time when alias analysis is unsure about
>
> memory access and it says may-alias. This un surety from alias 
> analysis restrict
>
> some of the memory based optimizations to proceed further.
>
> We observed some cases with LICM, where things are beyond aliasing.
>
> In cases where alias analysis is unsure we like to use loop versioning 
> as an alternative.
>
I'm concerned about this approach.  LICM is deliberately somewhat 
conservative about aliasing to avoid compile time impact.  In practice, 
we defer some of the harder cases to GVN in the form of full redundancy 
and partial redundancy elimination.  It's not clear to me that 
versioning a loop in LICM is a good use of either compile time or code 
size.  In particular, I'm concerned about the code size increase that 
can result from recursive application of loop versioning.

I think this is an interesting idea, but the profitability heuristics 
will need to be incredibly carefully tuned.  It would be really really 
easy to have a net negative impact on the quality of generated code 
while doing something that seems like a good idea within LICM.  Can you 
spell out the profitability heuristics you're planning on using?  You 
might want to look at what Sanjoy has been doing with IRCE.  This is a 
specialized form of Index Set Splitting with the roughly the same 
problem to solve.

Have you looked at trying to make LICM more precise with regards to 
aliasing?  I was looking at this a while back and it seemed like there 
was plenty of room for cheap changes with payoff here.  (One example: a 
read only call in a loop collapses all alias sets. Oops.)

(It's not clear to me whether you're proposing a new pass, or extending 
LICM.  If the former, much of my concern disappears.  If the later, 
you'll need to really justify the complexity.)

> Loop Versioning will creates version of the loop with aggressive alias 
> and the other
>
> with conservative (default) alias. Aggressive alias version of loop 
> will have all the
>
> memory access marked as no-alias. These two version of loop will be 
> preceded by a
>
> memory runtime check. This runtime check consists of bound checks for 
> all unique memory
>
> accessed in loop, and it ensures aliasing of memory. Based on this 
> check result at runtime
>
> any of the loops gets executed, if memory is non aliased then 
> aggressive aliasing loop
>
> gets executed, else when memory is aliased then non aggressive aliased 
> version gets executed.
>
> By setting no-alias to memory accessed in aggressive alias version of 
> loop, enable other
>
> optimization to continue further.
>
> Following are the top level steps:
>
> 1) Perform loop do versioning feasibility check.
>
> 2) If loop is a candidate for versioning then create a memory bound 
> check, by considering
>
>      all the memory access in loop body.
>
> 3) Clone original loop and set all memory access as no-alias in new loop.
>
> 4) Set original loop & versioned loop as a branch target of runtime 
> check result.
>
> 5) Call LICM on aggressive alias versioned of loop(For now LICM is 
> scheduled later and not directly
>
>      called from LoopVersioning pass).
>
> Consider following test:
>
>      1  int foo(int * var1, int * var2, int * var3, unsigned itr) {
>
>      2    unsigned i = 0, j = 0;
>
>      3    for(; i < itr; i++) {
>
>      4      for(; j < itr; j++) {
>
>      5        var1[j] = itr + i;
>
>      6        var3[i] = var1[j] + var3[i];
>
>      7      }
>
>      8    }
>
>      9  }
>
> At line #6 store to var3 can be moved out by 
> LICM(promoteLoopAccessesToScalars)
>
> but because of alias analysis un surety about memory access it unable 
> to move it out.
>
> After Loop versioning IR:
>
> *<Versioned Loop>*
>
> for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, 
> %for.body3.loopVersion
>
>   %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, 
> %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
>
>   %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 
> %indvars.iv.loopVersion
>
>   store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, 
> !alias.scope !11, !noalias !11
>
>   %indvars.iv.next.loopVersion = add nuw nsw i64 
> %indvars.iv.loopVersion, 1
>
>   %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
>
>   %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
>
>   br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label 
> %for.body3.loopVersion
>
> *<Original Loop>*
>
> for.body3: ; preds = %for.body3.lr.ph, %for.body3
>
>   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, 
> %for.body3.lr.ph ]
>
>   %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
>
>   store i32 %add, i32* %arrayidx, align 4, !tbaa !1
>
>   %8 = load i32* %arrayidx7, align 4, !tbaa !1
>
>   %add8 = add nsw i32 %8, %add
>
> store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
>
>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>
>   %lftr.wideiv = trunc i64 %indvars.iv to i32
>
>   %exitcond = icmp eq i32 %lftr.wideiv, %0
>
>   br i1 %exitcond, label %for.inc11, label %for.body3
>
> In versioned loop difference is visible, 1 store has moved out.
>
> Following are some high level details about current implementation:
>
> -  LoopVersioning
>
> LoopVersioning is main class which holds multi versioning functionality.
>
> - LoopVersioning :: isVersioningBeneficial
>
> Its member to ‘LoopVersioning’
>
> Does feasibility check for loop versioning.
>
> a) Checks layout of loop.
>
> b) Instruction level check.
>
> c) memory checks.
>
> - LoopVersioning :: versionizeLoop
>
> a) Clone original loo
>
> b) Create a runtime memory check.
>
> c) Add both loops under runtime check results target.
>
> - RuntimeMemoryCheck
>
> This class take cares runtime memory check.
>
> - RuntimeMemoryCheck ::createRuntimeCheck
>
> It creates runtime memory check.
>
> In this patch used maximum loop nest threshold as 2, and maximum number
>
> of pointers in runtime memory check as 5.
>
> Later I like to make this as a utility so others can use it.
>
> Requesting to go through patch for detailed approach.
>
> Patch available at http://reviews.llvm.org/D7900
>
If you're expecting review, this needs to go to llvm-commits.

I have not really looked at it, but one high level comment.  You've got 
a couple of really long functions that need to be broken up into 
semantic pieces.  Reviewing in the current form would be hard.
>
> Suggestions are comments are welcome.
>
> Regards,
>
> Ashutosh
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150303/ce5bbe53/attachment.html>


More information about the llvm-dev mailing list