<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Ashutosh Nema" <Ashutosh.Nema@amd.com><br><b>To: </b>llvmdev@cs.uiuc.edu<br><b>Sent: </b>Thursday, February 26, 2015 4:31:18 AM<br><b>Subject: </b>[LLVMdev] RFC: Loop versioning for LICM<br><br>
<style><!--
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
@list l0
{mso-list-id:1066536356;
mso-list-type:hybrid;
mso-list-template-ids:651046860 67698711 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-number-format:alpha-lower;
mso-level-text:"%1\)";
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
<div class="WordSection1">
<p class="MsoNormal">I like to propose a new loop multi versioning optimization for LICM.
</p>
<p class="MsoNormal">For now I kept this for LICM only, but it can be used in multiple places.</p>
<p class="MsoNormal">The main motivation is to allow optimizations stuck because of memory
</p>
<p class="MsoNormal">alias dependencies. Most of the time when alias analysis is unsure about
</p>
<p class="MsoNormal">memory access and it says may-alias. This un surety from alias analysis restrict
</p>
<p class="MsoNormal">some of the memory based optimizations to proceed further.</p>
<p class="MsoNormal">We observed some cases with LICM, where things are beyond aliasing.
</p>
<p class="MsoNormal">In cases where alias analysis is unsure we like to use loop versioning as an alternative.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Loop Versioning will creates version of the loop with aggressive alias and the other
</p>
<p class="MsoNormal">with conservative (default) alias. Aggressive alias version of loop will have all the
</p>
<p class="MsoNormal">memory access marked as no-alias. These two version of loop will be preceded by a
</p>
<p class="MsoNormal">memory runtime check. This runtime check consists of bound checks for all unique memory
</p>
<p class="MsoNormal">accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime
</p>
<p class="MsoNormal">any of the loops gets executed, if memory is non aliased then aggressive aliasing loop
</p>
<p id="DWT1967" class="MsoNormal">gets executed, else when memory is aliased then non aggressive aliased version gets executed.</p></div></blockquote><br>Hi <span style="font-size: 12pt; font-family: "Times New Roman",serif;">Ashutosh,<br>
<br>
I think this is a really interesting idea, and I'd like to encourage you to continue working on it. Regarding profitability, I think you'll want to check that you'll be able to hoist/sink a significant number of the memory accesses inside the new "versioned" loop. If a loop has a large number of accesses, and the new loop body only differs by the hosting/sinking of a few, then you're unlikely to see the difference.<br><br> -Hal</span><br><br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div class="WordSection1"><p class="MsoNormal"></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">By setting no-alias to memory accessed in aggressive alias version of loop, enable other
</p>
<p class="MsoNormal">optimization to continue further.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Following are the top level steps:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">1) Perform loop do versioning feasibility check.</p>
<p class="MsoNormal">2) If loop is a candidate for versioning then create a memory bound check, by considering
</p>
<p class="MsoNormal"> all the memory access in loop body.</p>
<p class="MsoNormal">3) Clone original loop and set all memory access as no-alias in new loop.</p>
<p class="MsoNormal">4) Set original loop & versioned loop as a branch target of runtime check result.</p>
<p class="MsoNormal">5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly
</p>
<p class="MsoNormal"> called from LoopVersioning pass).</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Consider following test:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> 1 int foo(int * var1, int * var2, int * var3, unsigned itr) {</p>
<p class="MsoNormal"> 2 unsigned i = 0, j = 0;</p>
<p class="MsoNormal"> 3 for(; i < itr; i++) {</p>
<p class="MsoNormal"> 4 for(; j < itr; j++) {</p>
<p class="MsoNormal"> 5 var1[j] = itr + i;</p>
<p class="MsoNormal"> 6 var3[i] = var1[j] + var3[i];</p>
<p class="MsoNormal"> 7 }</p>
<p class="MsoNormal"> 8 }</p>
<p class="MsoNormal"> 9 }</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars)
</p>
<p class="MsoNormal">but because of alias analysis un surety about memory access it unable to move it out.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">After Loop versioning IR:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b><Versioned Loop></b></p>
<p class="MsoNormal">for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion</p>
<p class="MsoNormal"> %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]</p>
<p class="MsoNormal"> %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion</p>
<p class="MsoNormal"> store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11</p>
<p class="MsoNormal"> %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1</p>
<p class="MsoNormal"> %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32</p>
<p class="MsoNormal"> %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0</p>
<p class="MsoNormal"> br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b><Original Loop></b></p>
<p class="MsoNormal">for.body3: ; preds = %for.body3.lr.ph, %for.body3</p>
<p class="MsoNormal"> %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]</p>
<p class="MsoNormal"> %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv</p>
<p class="MsoNormal"> store i32 %add, i32* %arrayidx, align 4, !tbaa !1</p>
<p class="MsoNormal"> %8 = load i32* %arrayidx7, align 4, !tbaa !1</p>
<p class="MsoNormal"> %add8 = add nsw i32 %8, %add</p>
<p class="MsoNormal"> <span style="background: none repeat scroll 0% 0% silver;">store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1</span></p>
<p class="MsoNormal"> %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1</p>
<p class="MsoNormal"> %lftr.wideiv = trunc i64 %indvars.iv to i32</p>
<p class="MsoNormal"> %exitcond = icmp eq i32 %lftr.wideiv, %0</p>
<p class="MsoNormal"> br i1 %exitcond, label %for.inc11, label %for.body3</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">In versioned loop difference is visible, 1 store has moved out.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Following are some high level details about current implementation:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">- LoopVersioning</p>
<p class="MsoNormal">LoopVersioning is main class which holds multi versioning functionality.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">- LoopVersioning :: isVersioningBeneficial </p>
<p class="MsoNormal">Its member to ‘LoopVersioning’</p>
<p class="MsoNormal">Does feasibility check for loop versioning. </p>
<p class="MsoNormal">a) Checks layout of loop.</p>
<p class="MsoNormal">b) Instruction level check.</p>
<p class="MsoNormal">c) memory checks.</p>
<p class="MsoNormal" style="margin-left: 0.25in;"> </p>
<p class="MsoNormal">- LoopVersioning :: versionizeLoop</p>
<p class="MsoNormal">a) Clone original loo</p>
<p class="MsoNormal">b) Create a runtime memory check.</p>
<p class="MsoNormal">c) Add both loops under runtime check results target.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">- RuntimeMemoryCheck</p>
<p class="MsoNormal">This class take cares runtime memory check.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">- RuntimeMemoryCheck ::createRuntimeCheck</p>
<p class="MsoNormal">It creates runtime memory check.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">In this patch used maximum loop nest threshold as 2, and maximum number
</p>
<p class="MsoNormal">of pointers in runtime memory check as 5.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Later I like to make this as a utility so others can use it.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Requesting to go through patch for detailed approach.</p>
<p class="MsoNormal">Patch available at <a href="http://reviews.llvm.org/D7900" target="_blank">http://reviews.llvm.org/D7900</a></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Suggestions are comments are welcome.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Regards,</p>
<p class="MsoNormal">Ashutosh</p>
</div>
<br>_______________________________________________<br>LLVM Developers mailing list<br>LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu<br>http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev<br></blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>