<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle21
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle22
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle24
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:395780924;
        mso-list-type:hybrid;
        mso-list-template-ids:1076887890 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
        {mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1
        {mso-list-id:711031633;
        mso-list-type:hybrid;
        mso-list-template-ids:-1063088596 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
        {mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-left:.5in"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Zaks, Ayal [mailto:ayal.zaks@intel.com]
<br>
<b>Sent:</b> Wednesday, March 15, 2017 4:39 AM<br>
<b>To:</b> Nema, Ashutosh <Ashutosh.Nema@amd.com>; anemet@apple.com; Hal Finkel <hfinkel@anl.gov>; Renato Golin <renato.golin@linaro.org>; mkuper@google.com; Mehdi Amini <mehdi.amini@apple.com>; Daniel Berlin <dberlin@dberlin.org><br>
<b>Cc:</b> llvm-dev <llvm-dev@lists.llvm.org><br>
<b>Subject:</b> RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:1.0in">
<a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Nema, Ashutosh [<a href="mailto:Ashutosh.Nema@amd.com">mailto:Ashutosh.Nema@amd.com</a>]
</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">Summarizing the discussion on the implementation approaches.<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">Discussed about two approaches, first running ‘InnerLoopVectorizer’ again on the epilog loop immediately after vectorizing the original loop within the same vectorization pass, the second approach where re-running
 vectorization pass and limiting vectorization factor of epilog loop by metadata.<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><Approach-2><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">Challenges with re-running the vectorizer pass:<o:p></o:p></p>
<p class="MsoListParagraph" style="margin-left:1.5in;text-indent:-.25in;mso-list:l0 level1 lfo2">
<![if !supportLists]><span style="mso-list:Ignore">1)<span style="font:7.0pt "Times New Roman"">     
</span></span><![endif]>Reusing alias check result: <o:p></o:p></p>
<p class="MsoListParagraph" style="margin-left:1.5in">When vectorizer pass runs again it finds the epilog loop as a new loop and it may generates alias check, this new alias check may overkill the gains of epilog vectorization.<o:p></o:p></p>
<p class="MsoListParagraph" style="margin-left:1.5in">We should use the already computed alias check result instead of re computing again.<o:p></o:p></p>
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Right, can this challenge be addressed – can we record the “simple” fact that the epilog loop is vectorizable with trip count at-most VF*UF when reached from the vectorized loop? This is akin to passing
 similar information from the front-end when supplied by, e.g., OpenMP pragmas, with the additional path-sensitive context attached.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal">I did not get this point completely. Yes, we can record the maximum width for epilog vectorization but what you meant by “path-sensitive context attached”.
<o:p></o:p></p>
<p class="MsoNormal">Please elaborate more on this and how does it help in reusing alias check result ?<o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Agreed, if each loop is handled independently, the multiple minimum-trip-count tests should be revisited to optimize for smallest trip-count first.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">If the main loop was vectorized by VF and unrolled by UF>1, it may be reasonable to vectorize the remainder loop with the same VF (w/o unrolling).
<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">And then possibly vectorize the remainder of that with a smaller, say, VF/2. In addition, situations having small types and large vectors may result in large VF, again leaving room for possibly repeated
 epilog vectorizations with decreasing VF’s. At some point it would be good to try the alternative of a (final) masked vector epilog.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="font-size:12.0pt;font-family:"Times New Roman",serif">Each vector version incurs extra cost by adding extra checks, considering this fact I have limit the patch to only generate one epilog vector
 version.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="font-size:12.0pt;font-family:"Times New Roman",serif">We can generate multiple epilog versions but we have to understand the tradeoff of generating them. Once we have the proper costing of checks
 we can make more precise decisions. I like to defer this for later enhancements.
<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="font-size:12.0pt;font-family:"Times New Roman",serif">Masked instructions are available is AVX512 and of course it’s better solution then this. But architectures which does not have masked instruction
 support epilog vector version is one of the technique to vectorize epilog iterations.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span style="font-size:12.0pt;font-family:"Times New Roman",serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Ayal.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
</div>
</body>
</html>