[LLVMdev] RFC: Loop versioning for LICM
Nema, Ashutosh
Ashutosh.Nema at amd.com
Tue Mar 3 21:28:23 PST 2015
Thanks Philip for looking into LoopVersioning.
> It's not clear to me whether you're proposing a new pass, or extending LICM
It's a new pass, and we are not changing existing LICM functionality.
LoopVersioning is a memory check based multi versioning optimization. It will creates version
of the loop with aggressive alias and the other with conservative (default) alias. Aggressive
alias version of loop will have all the memory access marked as no-alias. Because of aggressive
aliasing, alias analysis will be sure about memory accesses and it will mark no-alias for all memory
accesses in alias sets. This will help some of the optimization to proceed further. One case is LICM
PromoteAliasSets (recently refactored as promoteLoopAccessesToScalars).
Post multi versioning in same pass we are planning to call "promoteLoopAccessesToScalars"
on aggressive alias versioned loop.
> In particular, I'm concerned about the code size increase that can result from
> recursive application of loop versioning.
> I think this is an interesting idea, but the profitability heuristics will need to be
> incredibly carefully tuned.
We are also concerned about profitability heuristics, and we are trying to minimize the
negative impact of this pass. Code size increase is a concern, but we are planning to touch
the cases where gain by this optimization will be good enough to justify. For now we made
this pass optional.
Profitability analysis contains basic loop structure analysis, loop instructions analysis and
loop memory access analysis. We are also trying to ensure the new versioned loop will get
benefit by LICM. Apart from this there are few thresholds:
1) Maximum loop nest threshold allowed is 2(Planning to make this configurable).
2) Maximum number of pointers threshold in memcheck allowed is 5(Planning to make this configurable).
I understand profitability plays a vital role here, in current implementation checks added may not
be enough for a right profitability analysis. Accepting a possible scope of improvement, as suggested will
look into Sanjoy work for IRCE.
> I have not really looked at it, but one high level comment. You've got a couple of really long
> functions that need to be broken up into semantic pieces. Reviewing in the current form would be hard.
At this point I'm not expecting a formal review, just emphasizing on the functionality as a part of RFC.
Sure will split out long functions.
Regards,
Ashutosh
From: Philip Reames [mailto:listmail at philipreames.com]
Sent: Wednesday, March 04, 2015 12:16 AM
To: Nema, Ashutosh; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] RFC: Loop versioning for LICM
On 02/26/2015 02:31 AM, Nema, Ashutosh wrote:
I like to propose a new loop multi versioning optimization for LICM.
For now I kept this for LICM only, but it can be used in multiple places.
The main motivation is to allow optimizations stuck because of memory
alias dependencies. Most of the time when alias analysis is unsure about
memory access and it says may-alias. This un surety from alias analysis restrict
some of the memory based optimizations to proceed further.
We observed some cases with LICM, where things are beyond aliasing.
In cases where alias analysis is unsure we like to use loop versioning as an alternative.
I'm concerned about this approach. LICM is deliberately somewhat conservative about aliasing to avoid compile time impact. In practice, we defer some of the harder cases to GVN in the form of full redundancy and partial redundancy elimination. It's not clear to me that versioning a loop in LICM is a good use of either compile time or code size. In particular, I'm concerned about the code size increase that can result from recursive application of loop versioning.
I think this is an interesting idea, but the profitability heuristics will need to be incredibly carefully tuned. It would be really really easy to have a net negative impact on the quality of generated code while doing something that seems like a good idea within LICM. Can you spell out the profitability heuristics you're planning on using? You might want to look at what Sanjoy has been doing with IRCE. This is a specialized form of Index Set Splitting with the roughly the same problem to solve.
Have you looked at trying to make LICM more precise with regards to aliasing? I was looking at this a while back and it seemed like there was plenty of room for cheap changes with payoff here. (One example: a read only call in a loop collapses all alias sets. Oops.)
(It's not clear to me whether you're proposing a new pass, or extending LICM. If the former, much of my concern disappears. If the later, you'll need to really justify the complexity.)
Loop Versioning will creates version of the loop with aggressive alias and the other
with conservative (default) alias. Aggressive alias version of loop will have all the
memory access marked as no-alias. These two version of loop will be preceded by a
memory runtime check. This runtime check consists of bound checks for all unique memory
accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime
any of the loops gets executed, if memory is non aliased then aggressive aliasing loop
gets executed, else when memory is aliased then non aggressive aliased version gets executed.
By setting no-alias to memory accessed in aggressive alias version of loop, enable other
optimization to continue further.
Following are the top level steps:
1) Perform loop do versioning feasibility check.
2) If loop is a candidate for versioning then create a memory bound check, by considering
all the memory access in loop body.
3) Clone original loop and set all memory access as no-alias in new loop.
4) Set original loop & versioned loop as a branch target of runtime check result.
5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly
called from LoopVersioning pass).
Consider following test:
1 int foo(int * var1, int * var2, int * var3, unsigned itr) {
2 unsigned i = 0, j = 0;
3 for(; i < itr; i++) {
4 for(; j < itr; j++) {
5 var1[j] = itr + i;
6 var3[i] = var1[j] + var3[i];
7 }
8 }
9 }
At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars)
but because of alias analysis un surety about memory access it unable to move it out.
After Loop versioning IR:
<Versioned Loop>
for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion
%indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
%arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion
store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11
%indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1
%lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
%exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion
<Original Loop>
for.body3: ; preds = %for.body3.lr.ph, %for.body3
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
%arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
store i32 %add, i32* %arrayidx, align 4, !tbaa !1
%8 = load i32* %arrayidx7, align 4, !tbaa !1
%add8 = add nsw i32 %8, %add
store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv to i32
%exitcond = icmp eq i32 %lftr.wideiv, %0
br i1 %exitcond, label %for.inc11, label %for.body3
In versioned loop difference is visible, 1 store has moved out.
Following are some high level details about current implementation:
- LoopVersioning
LoopVersioning is main class which holds multi versioning functionality.
- LoopVersioning :: isVersioningBeneficial
Its member to 'LoopVersioning'
Does feasibility check for loop versioning.
a) Checks layout of loop.
b) Instruction level check.
c) memory checks.
- LoopVersioning :: versionizeLoop
a) Clone original loo
b) Create a runtime memory check.
c) Add both loops under runtime check results target.
- RuntimeMemoryCheck
This class take cares runtime memory check.
- RuntimeMemoryCheck ::createRuntimeCheck
It creates runtime memory check.
In this patch used maximum loop nest threshold as 2, and maximum number
of pointers in runtime memory check as 5.
Later I like to make this as a utility so others can use it.
Requesting to go through patch for detailed approach.
Patch available at http://reviews.llvm.org/D7900
If you're expecting review, this needs to go to llvm-commits.
I have not really looked at it, but one high level comment. You've got a couple of really long functions that need to be broken up into semantic pieces. Reviewing in the current form would be hard.
Suggestions are comments are welcome.
Regards,
Ashutosh
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150304/5cf17d6d/attachment.html>
More information about the llvm-dev
mailing list