<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Book Antiqua";
panose-1:2 4 6 2 5 3 5 3 3 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Book Antiqua",serif;
color:#943634;
font-weight:normal;
font-style:normal;
text-decoration:none none;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-IE link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-family:"Book Antiqua",serif;color:black;mso-fareast-language:EN-US'>Regarding your proposed solution (2) - could this be further abstracted from just the volatile iteration variables so that all conditional dependencies between BBs could be tracked? For VLIW architectures with predication support, it is often advantageous to either clone a BB predecessor (with appropriate predication) to each of the successor BBs, or to move a dependency from successor BBs into the predecessor BBs (with appropriate predication). The approach you propose might help with this kind of ILP problem. Currently LLVM does not seem to do a great job at tracking this kind of ILP data edge dependency across BBs. I can achieve this now, but only by using a large amount of custom coding and I would prefer to use a more generalised abstraction.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Book Antiqua",serif;color:black;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Book Antiqua",serif;color:black;mso-fareast-language:EN-US'> MartinO<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Book Antiqua",serif;color:#943634;mso-fareast-language:EN-US'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri",sans-serif'>From:</span></b><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri",sans-serif'> llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] <b>On Behalf Of </b>Hyojin Sung<br><b>Sent:</b> 15 July 2015 20:52<br><b>To:</b> llvmdev@cs.uiuc.edu<br><b>Subject:</b> [LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><p><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>Hi all,</span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>I would like to propose an improvement of the “almost dead” block elimination in Transforms/Local.cpp so that it will preserve the canonical loop form for loops with a volatile iteration variable.</span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>*** Problem statement</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>Nested loops in LCALS Subset B (</span><a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__codesign.llnl.gov_LCALS.php&d=AwMGaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=aWKfvN4c8lvUSvVn8J0Z2ajTctlBJf0198Au28epBr0&s=4d9dt5ODcDWHHatSrwu5ZYT9ebgVzNEtpOlIR87izCM&e="><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>https://codesign.llnl.gov/LCALS.php</span></a><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>) are not vectorized with LLVM -O3 because the LLVM loop vectorizer fails the test whether the loop latch and exiting block of a loop is the same. The loops are vectorizable, and get vectorized with LLVM -O2 and also with other commercial compilers (icc, xlc). </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>*** Details</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>These loops ended up with different loop latch and exiting block after a series of optimizations including loop unswitching, jump threading, simplify-the-CFG, and loop simplify. The fundamental problem here is that the above optimizations cannot recognize a loop with a volatile iteration variable and do not preserve its canonical loop structure.</span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(1) Loop unswitching generates several empty placeholder BBs only with PHI nodes after separating out a shorter path with no inner loop execution from a standard path. </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(2) Jump threading and simplify-the-CFG passes independently calls TryToSimplifyUnconditionalBranchFromEmptyBlock() in Transforms/Utils/Local.cpp to get rid of almost empty BBs. </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(3) TryToSimplifyUnconditionalBranchFromEmtpyBlock() eliminates the placeholder BBs after loop unswitching and merges them into subsequent blocks including the header of the inner loop. Before eliminating the blocks, the function checks if the block is a loop header by looking at its PHI nodes so that it can be saved, but the test fails with the loops with a volatile iteration variable. The outer loop is now collapsed into the inner loop with multiple backedges.</span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(4) LoopSimplify checks if a loop with multiple backedges can be separated to nested loops by looking at PHI nodes in the loop header. The test fails with the loops with a volatile iteration variable. </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(5) LoopSimplify then creates a unique backedge block for the loop, and the loop now has a different loop latch (the unique backedge block created in (3)) and exiting block (a block where the volatile outer loop variable is incremented and tested). </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>*** Proposed solutions</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(1) Make LoopInfo available in Jump Threading and Simplify-the-CFG passes and use LoopInfo to test whether an almost empty BB is a loop header. If yes, do not eliminate the BB.</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'> Pros: Leverages existing analysis results, small code changes / Cons: Add pass dependencies </span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>(2) Instead of using LoopInfo, iterate through BB’s to identify backedges and loop headers in TryToSimplifyUnconditionalBranchFromEmptyBlock() and use the results to test if a BB is a loop header. If yes, do not eliminate the BB. Jump Threading has functions that do similar cheap loop analysis.</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'> Pros: No need to depend on external analysis results / Cons: More lines to be added</span><br><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>*** Summary</span><br><span style='font-size:10.0pt;font-family:"Helvetica",sans-serif'>I would like to propose an improvement of the “almost dead” block elimination in Transforms/Local.cpp so that it will preserve the canonical loop form for loops with a volatile iteration variable. On top of the current algorithm relying on PHI nodes to recognize loops, actual loop analysis to test if a BB belongs to a loop and etc. can be used. </span><o:p></o:p></p></div></body></html>