<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 03/15/2017 05:55 AM, Nema, Ashutosh
      wrote:<br>
    </div>
    <blockquote
cite="mid:CY4PR12MB1799BE010E973ABF11C7E8BAFB270@CY4PR12MB1799.namprd12.prod.outlook.com"
      type="cite">
      <div class="WordSection1">
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0in 0in 0in">
            <p class="MsoNormal" style="margin-left:.5in"><b><span
                  style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">
                Zaks, Ayal [<a class="moz-txt-link-freetext" href="mailto:ayal.zaks@intel.com">mailto:ayal.zaks@intel.com</a>]
                <br>
                <b>Sent:</b> Wednesday, March 15, 2017 4:39 AM<br>
                <b>To:</b> Nema, Ashutosh <a class="moz-txt-link-rfc2396E" href="mailto:Ashutosh.Nema@amd.com">&lt;Ashutosh.Nema@amd.com&gt;</a>;
                <a class="moz-txt-link-abbreviated" href="mailto:anemet@apple.com">anemet@apple.com</a>; Hal Finkel <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a>;
                Renato Golin <a class="moz-txt-link-rfc2396E" href="mailto:renato.golin@linaro.org">&lt;renato.golin@linaro.org&gt;</a>;
                <a class="moz-txt-link-abbreviated" href="mailto:mkuper@google.com">mkuper@google.com</a>; Mehdi Amini
                <a class="moz-txt-link-rfc2396E" href="mailto:mehdi.amini@apple.com">&lt;mehdi.amini@apple.com&gt;</a>; Daniel Berlin
                <a class="moz-txt-link-rfc2396E" href="mailto:dberlin@dberlin.org">&lt;dberlin@dberlin.org&gt;</a><br>
                <b>Cc:</b> llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org">&lt;llvm-dev@lists.llvm.org&gt;</a><br>
                <b>Subject:</b> RE: [llvm-dev] [Proposal][RFC] Epilog
                loop vectorization<o:p></o:p></span></p>
          </div>
        </div>
        <p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
        <p class="MsoNormal"
style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:1.0in"><a
            moz-do-not-send="true" name="_____replyseparator"></a><b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">
            Nema, Ashutosh [<a moz-do-not-send="true"
              href="mailto:Ashutosh.Nema@amd.com">mailto:Ashutosh.Nema@amd.com</a>]
          </span><o:p></o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in">Summarizing the
          discussion on the implementation approaches.<o:p></o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in">Two approaches
          were discussed. The first runs ‘InnerLoopVectorizer’ again on
          the epilog loop immediately after vectorizing the original
          loop, within the same vectorization pass. The second re-runs
          the vectorization pass and limits the vectorization factor of
          the epilog loop through metadata.<o:p></o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in">&lt;Approach-2&gt;<o:p></o:p></p>
        <p class="MsoNormal" style="margin-left:1.0in">Challenges with
          re-running the vectorizer pass:<o:p></o:p></p>
        <p class="MsoListParagraph"
          style="margin-left:1.5in;text-indent:-.25in;mso-list:l0 level1
          lfo2">
          <!--[if !supportLists]--><span style="mso-list:Ignore">1)<span
              style="font:7.0pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
            </span></span><!--[endif]-->Reusing alias check result: <o:p></o:p></p>
        <p class="MsoListParagraph" style="margin-left:1.5in">When the
          vectorizer pass runs again, it sees the epilog loop as a new
          loop and may generate a new alias check; the cost of this
          check may outweigh the gains of epilog vectorization.<o:p></o:p></p>
        <p class="MsoListParagraph" style="margin-left:1.5in">We should
          reuse the already computed alias check result instead of
          recomputing it.<o:p></o:p></p>
        <p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D">Right,
            can this challenge be addressed: can we record the “simple”
            fact that the epilog loop is vectorizable with trip count
            at most VF*UF when reached from the vectorized loop? This is
            akin to passing similar information from the front-end when
            supplied by, e.g., OpenMP pragmas, with the additional
            path-sensitive context attached.<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal">I did not fully understand this point. Yes,
          we can record the maximum width for epilog vectorization, but
          what do you mean by “path-sensitive context attached”?
          <o:p></o:p></p>
        <p class="MsoNormal">Please elaborate on this and on how it
          helps in reusing the alias check result.<o:p></o:p></p>
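        <p class="MsoNormal">A minimal sketch of the trip-count fact
          under discussion, not LLVM code. It assumes the main vector
          loop consumes as many full blocks of VF*UF scalar iterations
          as possible; the function name <code>epilog_trip_count</code>
          is hypothetical and used only for illustration.</p>
<pre>
```python
def epilog_trip_count(n, vf, uf):
    """Scalar iterations left for the epilog after the main vector loop.

    Assumption: the main loop runs floor(n / (vf * uf)) vector iterations,
    each covering vf * uf scalar iterations.
    """
    step = vf * uf
    return n % step


# Whatever the original trip count, the remainder stays below VF*UF.
for n in range(0, 200):
    assert epilog_trip_count(n, vf=8, uf=4) < 8 * 4

print(epilog_trip_count(100, vf=8, uf=4))  # prints 4
```
</pre>
        <p class="MsoNormal">Under that assumption the bound depends
          only on VF and UF, so it could be recorded once when the main
          loop is vectorized rather than rediscovered by a later pass.</p>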
        <p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D">Agreed,
            if each loop is handled independently, the multiple
            minimum-trip-count tests should be revisited to optimize for
            smallest trip-count first.<o:p></o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D">If the
            main loop was vectorized by VF and unrolled by UF&gt;1, it
            may be reasonable to vectorize the remainder loop with the
            same VF (without unrolling).
            <o:p></o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D">And then
            possibly vectorize the remainder of that with a smaller VF,
            say, VF/2. In addition, situations having small types and
            large vectors may result in a large VF, again leaving room for
            possibly repeated epilog vectorizations with decreasing
            VFs. At some point it would be good to try the alternative
            of a (final) masked vector epilog.<o:p></o:p></span></p>
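        <p class="MsoListParagraph"><span style="color:#1F497D">The
            cascade described above can be illustrated with a small
            arithmetic sketch. This is not the vectorizer's
            implementation: the function name
            <code>split_trip_count</code>, the <code>min_vf</code>
            parameter and the halving schedule are assumptions made for
            the example, and UF is assumed to be greater than 1.</span></p>
<pre>
```python
def split_trip_count(n, vf, uf, min_vf=2):
    """Distribute n scalar iterations over a cascade of loops.

    Returns (step, count) pairs, where step is the number of scalar
    iterations covered by one iteration of that loop: the main loop with
    step vf * uf, then remainder loops with steps vf, vf / 2, ... down to
    min_vf, and finally a scalar loop with step 1.
    """
    steps = [vf * uf]
    width = vf
    while width >= min_vf:
        steps.append(width)
        width //= 2
    steps.append(1)

    plan = []
    remaining = n
    for step in steps:
        count, remaining = divmod(remaining, step)
        plan.append((step, count))
    return plan


print(split_trip_count(127, vf=16, uf=4))
# prints [(64, 1), (16, 3), (8, 1), (4, 1), (2, 1), (1, 1)]
```
</pre>
        <p class="MsoListParagraph"><span style="color:#1F497D">In this
            example 63 iterations remain after the main loop; the
            cascade leaves only one of them to the scalar loop, at the
            price of one extra loop (and its trip-count test) per
            stage.</span></p>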
        <p class="MsoListParagraph" style="margin-left:0in"><span
            style="color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoListParagraph" style="margin-left:0in"><span
            style="font-size:12.0pt;font-family:'Times New
            Roman',serif">Each vector version incurs extra cost by
            adding extra checks; considering this, I have limited the
            patch to generate only one epilog vector version.<o:p></o:p></span></p>
        <p class="MsoListParagraph" style="margin-left:0in"><span
            style="font-size:12.0pt;font-family:'Times New
            Roman',serif">We can generate multiple epilog versions,
            but we have to understand the tradeoff of generating them.
            Once we have proper costing of the checks, we can make more
            precise decisions. I would like to defer this to later
            enhancements.
            </span></p>
      </div>
    </blockquote>
    <br>
    If we model the costs of the extra checks and branches, then we can
    ask: will the savings from executing even one iteration of the
    vectorized epilogue loop be greater than the cost of the checks? For
    really small loops, this might not be obvious.<br>
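    <br>
    As a rough illustration of that question (a sketch only, with
    made-up cost units; the function name
    <code>epilog_vectorization_pays_off</code> and all numbers are
    hypothetical and not taken from any real cost model):<br>
<pre>
```python
def epilog_vectorization_pays_off(scalar_iter_cost, vector_iter_cost,
                                  epilog_vf, check_cost):
    """True if one vectorized epilog iteration saves more than the checks cost.

    All costs are abstract units on the same scale, supplied by the caller.
    One vector iteration replaces epilog_vf scalar iterations.
    """
    saving = epilog_vf * scalar_iter_cost - vector_iter_cost
    return saving > check_cost


# Very small loop body: the saving (4 * 1 - 2 = 2) does not cover the checks.
print(epilog_vectorization_pays_off(1, 2, epilog_vf=4, check_cost=3))  # prints False

# Heavier loop body: the saving (4 * 5 - 6 = 14) easily covers the checks.
print(epilog_vectorization_pays_off(5, 6, epilog_vf=4, check_cost=3))  # prints True
```
</pre>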
    <br>
     -Hal<br>
    <br>
    <blockquote
cite="mid:CY4PR12MB1799BE010E973ABF11C7E8BAFB270@CY4PR12MB1799.namprd12.prod.outlook.com"
      type="cite">
      <div class="WordSection1">
        <p class="MsoListParagraph" style="margin-left:0in"><span
            style="font-size:12.0pt;font-family:'Times New
            Roman',serif">Masked instructions are available in
            AVX512, and of course that is a better solution than this.
            But for architectures that do not have masked instruction
            support, an epilog vector version is one technique for
            vectorizing the epilog iterations.<o:p></o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D">Ayal.<o:p></o:p></span></p>
        <p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>