<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
p.m-8565914691181255471m2339803851957541712msolistparagraph, li.m-8565914691181255471m2339803851957541712msolistparagraph, div.m-8565914691181255471m2339803851957541712msolistparagraph
        {mso-style-name:m_-8565914691181255471m_2339803851957541712msolistparagraph;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;}
span.EmailStyle20
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:1344477072;
        mso-list-type:hybrid;
        mso-list-template-ids:-173931108 67698711 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
        {mso-level-number-format:alpha-lower;
        mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Michael Kuperstein [mailto:mkuper@google.com]
<br>
<b>Sent:</b> Tuesday, March 14, 2017 10:29 PM<br>
<b>To:</b> Nema, Ashutosh <Ashutosh.Nema@amd.com><br>
<b>Cc:</b> Adam Nemet <anemet@apple.com>; Hal Finkel <hfinkel@anl.gov>; Renato Golin <renato.golin@linaro.org>; llvm-dev <llvm-dev@lists.llvm.org>; Mehdi Amini <mehdi.amini@apple.com>; Daniel Berlin <dberlin@dberlin.org>; Zaks, Ayal <ayal.zaks@intel.com><br>
<b>Subject:</b> RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal" style="margin-left:.5in">I'm still not sure about this, for a few reasons:<o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">1) I'd like to try to treat epilogue loops the same way regardless of whether the main loop was vectorized by hand or automatically. So if someone hand-wrote an avx-512 16-wide loop, with alias checks, and we decide
 it's profitable to vectorize the epilogue loop by 4 and re-use the checks, it ought to be done the same way. I realize this may be a pipe-dream, though.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Ideally it should be like this, but introduction of alias checks comes with its own challenges.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">2) I'm still somewhat worried about "tiny loops". As I wrote before, we explicitly refuse to vectorize loops we know have a trip-count less than 16, because our profitability heuristic for such loops is probably
 bad. IIUC the only reason we don't bail due to the threshold is because we use the same loop for "failed min iters check" and "failed alias check". So, because it's reachable through the alias-check path, the max trip count isn't actually known, even though
 the typical trip count is probably small. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">It's true that you currently don't try to vectorize the epilogue if the original VF is below 16, but this is a somewhat different condition. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Prerequisite for epilog vectorization is the original loop should get vectorize, for tiny loops if vectorizer refuse to vectorize then epilog version will not
 be generated.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Once we will have the proper costing for checks (i.e. alias, min-itr) then we can make more accurate decision to vectorize epilog loop by considering checks cost.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">3) Technically speaking, constructing a new InnerLoopVectorizer to vectorize this one loop sounds weird. We already have a worklist in the vectorizer that's currently running.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Adding epilog loop to the loop list comes with following challenges:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">a)<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If we like to add the epilog loop to the list then I’m not sure how we will handle the “alias check result”.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">For epilog vector loop after minimum iteration check it should check the result of already computed “alias check result” if the result asserts ‘alias’
 then execute scalar epilog loop else execute epilog vector loop. As the execution of the epilog vector loop is dependent of already computed “alias check result”, not sure how we will check this fact by adding loop to the list.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Probably loop versioning based on the “alias check result” condition followed by adding the no-alias version to the list can help here. i.e.<o:p></o:p></span></p>
<p class="MsoListParagraph"><img width="533" height="241" id="Picture_x0020_1" src="cid:image001.jpg@01D29D8C.F47BC4E0"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If “Scalar LoopVersion1” asserts no alias property then add it to the loop list.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">This versioning design looks weird, it’s just used to show the “alias check result” fact.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Any other thoughts handling “alias check result” without versioning ?<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">b)<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Loop Vectorizer anyway creates instance of “InnerLoopVectorizer” for vectorizing each loop, the only difference is we are creating instance of “InnerLoopVectorizer”
 after generating first vector version within the processing of same loop to cater epilog loop vectorization. The intent is when vectorizer is processing a loop from the list it should process it completely by generating both original and epilog vector version.
<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">c)<span style="font:7.0pt "Times New Roman"">      
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">In the proposed patch, instead of creating a new instance of “InnerLoopVectorizer”, we can clear the state of existing “InnerLoopVectorizer” object
 and use it for epilog loop vectorization.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">            <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">I don't think (1) is a blocker, and (3) should be easy to fix, but I'm not sure whether the way this is going to handle (2) is sufficient.  If I'm the only one that this bothers, I won't stand in the way, but I'd
 like to at least make sure we've fully considered this.<span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Ashutosh<o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</body>
</html>