[LLVMdev] RFC: Loop versioning for LICM

Nema, Ashutosh Ashutosh.Nema at amd.com
Mon May 11 20:49:00 PDT 2015


Thanks Hal, Philip and Adam for reviewing this RFC.

Based on suggestion I have made below changes.
Code change available at: http://reviews.llvm.org/D9151

1) Using LAA(Loop Access Analysis) for versioning feasibility & memcheck.
As the part of feasibility analysis started using LAA.
Earlier memcheck creation was implemented under Loop Versioning, now using
LAA for this.

2) Improved benefit analysis.
Introduced few checks to ensure the benefit of loop versioning.
- Checking number of possible invariant promotion vs total instructions of loop.
If this goes beyond certain threshold then only do loop versioning.
Kept this threshold as a hard value as 25%(total instruction of loop), also
provided option to control this threshold.

- Similar added check for invariant stores vs total stores in loop, kept this
threshold as a hard value 40%(total instruction of loop), also provided option
to control this threshold.

3) Introduced threshold command line parameter to control loop nest,
benefit threshold etc.

- loopver-invar-store-threshold
This option is to control invariant store vs total store threshold,
default 40% of total instruction.

- loopver-invar-threshold
This option is to control invariant load & store vs total instruction threshold,
default 25% of total instruction.

- loopver-max-depth-threshold
This option is to control maximum depth for loops getting considered,
default depth is 2.

- loopver-memcheck-ptr-threshold
This option is to control maximum pointer/address getting considered, default is 5.

4) Code refactoring

5) Added test cases.

If possible, please review this change.

Regards,
Ashutosh


From: Nema, Ashutosh
Sent: Thursday, February 26, 2015 4:01 PM
To: llvmdev at cs.uiuc.edu
Subject: RFC: Loop versioning for LICM

I like to propose a new loop multi versioning optimization for LICM.
For now I kept this for LICM only, but it can be used in multiple places.
The main motivation is to allow optimizations stuck because of memory
alias dependencies. Most of the time when alias analysis is unsure about
memory access and it says may-alias. This un surety from alias analysis restrict
some of the memory based optimizations to proceed further.
We observed some cases with LICM, where things are beyond aliasing.
In cases where alias analysis is unsure we like to use loop versioning as an alternative.

Loop Versioning will creates version of the loop with aggressive alias and the other
with conservative (default) alias. Aggressive alias version of loop will have all the
memory access marked as no-alias. These two version of loop will be preceded by a
memory runtime check. This runtime check consists of bound checks for all unique memory
accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime
any of the loops gets executed, if memory is non aliased then aggressive aliasing loop
gets executed, else when memory is aliased then non aggressive aliased version gets executed.

By setting no-alias to memory accessed in aggressive alias version of loop, enable other
optimization to continue further.

Following are the top level steps:

1) Perform loop do versioning feasibility check.
2) If loop is a candidate for versioning then create a memory bound check, by considering
     all the memory access in loop body.
3) Clone original loop and set all memory access as no-alias in new loop.
4) Set original loop & versioned loop as a branch target of runtime check result.
5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly
     called from LoopVersioning pass).

Consider following test:

     1  int foo(int * var1, int * var2, int * var3, unsigned itr) {
     2    unsigned i = 0, j = 0;
     3    for(; i < itr; i++) {
     4      for(; j < itr; j++) {
     5        var1[j] = itr + i;
     6        var3[i] = var1[j] + var3[i];
     7      }
     8    }
     9  }

At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars)
but because of alias analysis un surety about memory access it unable to move it out.

After Loop versioning IR:

<Versioned Loop>
for.body3.loopVersion:                            ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion
  %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
  %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion
  store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11
  %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1
  %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
  %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
  br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion

<Original Loop>
for.body3:                                        ; preds = %for.body3.lr.ph, %for.body3
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
  %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
  store i32 %add, i32* %arrayidx, align 4, !tbaa !1
  %8 = load i32* %arrayidx7, align 4, !tbaa !1
  %add8 = add nsw i32 %8, %add
  store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %0
  br i1 %exitcond, label %for.inc11, label %for.body3

In versioned loop difference is visible, 1 store has moved out.

Following are some high level details about current implementation:

-  LoopVersioning
LoopVersioning is main class which holds multi versioning functionality.

- LoopVersioning :: isVersioningBeneficial
Its member to 'LoopVersioning'
Does feasibility check for loop versioning.
a) Checks layout of loop.
b) Instruction level check.
c) memory checks.

- LoopVersioning :: versionizeLoop
a) Clone original loo
b) Create a runtime memory check.
c) Add both loops under runtime check results target.

- RuntimeMemoryCheck
This class take cares runtime memory check.

- RuntimeMemoryCheck ::createRuntimeCheck
It creates runtime memory check.

In this patch used maximum loop nest threshold as 2, and maximum number
of pointers in runtime memory check as 5.

Later I like to make this as a utility so others can use it.

Requesting to go through patch for detailed approach.
Patch available at http://reviews.llvm.org/D7900

Suggestions are comments are welcome.

Regards,
Ashutosh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150512/d03d4010/attachment.html>


More information about the llvm-dev mailing list