<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 03/15/2017 05:55 AM, Nema, Ashutosh
wrote:<br>
</div>
<blockquote
cite="mid:CY4PR12MB1799BE010E973ABF11C7E8BAFB270@CY4PR12MB1799.namprd12.prod.outlook.com"
type="cite">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle22
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle24
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:395780924;
mso-list-type:hybrid;
mso-list-template-ids:1076887890 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-text:"%1\)";
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:711031633;
mso-list-type:hybrid;
mso-list-template-ids:-1063088596 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
{mso-level-text:"%1\)";
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
<div class="WordSection1">
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-left:.5in"><b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">
Zaks, Ayal [<a class="moz-txt-link-freetext" href="mailto:ayal.zaks@intel.com">mailto:ayal.zaks@intel.com</a>]
<br>
<b>Sent:</b> Wednesday, March 15, 2017 4:39 AM<br>
<b>To:</b> Nema, Ashutosh <a class="moz-txt-link-rfc2396E" href="mailto:Ashutosh.Nema@amd.com">&lt;Ashutosh.Nema@amd.com&gt;</a>;
<a class="moz-txt-link-abbreviated" href="mailto:anemet@apple.com">anemet@apple.com</a>; Hal Finkel <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a>;
Renato Golin <a class="moz-txt-link-rfc2396E" href="mailto:renato.golin@linaro.org">&lt;renato.golin@linaro.org&gt;</a>;
<a class="moz-txt-link-abbreviated" href="mailto:mkuper@google.com">mkuper@google.com</a>; Mehdi Amini
<a class="moz-txt-link-rfc2396E" href="mailto:mehdi.amini@apple.com">&lt;mehdi.amini@apple.com&gt;</a>; Daniel Berlin
<a class="moz-txt-link-rfc2396E" href="mailto:dberlin@dberlin.org">&lt;dberlin@dberlin.org&gt;</a><br>
<b>Cc:</b> llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org">&lt;llvm-dev@lists.llvm.org&gt;</a><br>
<b>Subject:</b> RE: [llvm-dev] [Proposal][RFC] Epilog
loop vectorization<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
<p class="MsoNormal"
style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:1.0in"><a
moz-do-not-send="true" name="_____replyseparator"></a><b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">
Nema, Ashutosh [<a moz-do-not-send="true"
href="mailto:Ashutosh.Nema@amd.com">mailto:Ashutosh.Nema@amd.com</a>]
</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">Summarizing the
discussion on the implementation approaches.<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">We discussed two
approaches. The first runs ‘InnerLoopVectorizer’ again on the
epilog loop immediately after vectorizing the original loop,
within the same vectorization pass. The second re-runs the
vectorization pass and limits the vectorization factor of the
epilog loop via metadata.<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">&lt;Approach-2&gt;<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">Challenges with
re-running the vectorizer pass:<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:1.5in;text-indent:-.25in;mso-list:l0 level1
lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">1)<span
style="font:7.0pt 'Times New Roman'">
</span></span><!--[endif]-->Reusing the alias check result: <o:p></o:p></p>
<p class="MsoListParagraph" style="margin-left:1.5in">When the
vectorizer pass runs again, it sees the epilog loop as a new
loop and may generate a new alias check; this new check may
outweigh the gains of epilog vectorization.<o:p></o:p></p>
<p class="MsoListParagraph" style="margin-left:1.5in">We should
reuse the already computed alias check result instead of
recomputing it.<o:p></o:p></p>
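<p class="MsoListParagraph" style="margin-left:1.5in">A minimal Python model of the reuse idea, not LLVM code (<code>overlaps</code>, <code>vector_step</code>, <code>vectorized_add</code> and <code>scalar_add</code> are hypothetical names): one runtime overlap check guards both the main vector loop (VF*UF elements per iteration) and a same-VF vector epilog, so the epilog does not emit a second check. A vector iteration is modeled as loading every lane before storing any; when the check fails, the whole loop falls back to scalar code.</p>

```python
def overlaps(a_start, b_start, length):
    """Runtime alias check: do [a, a+length) and [b, b+length) intersect?"""
    return a_start < b_start + length and b_start < a_start + length


def vector_step(mem, dst, src, i, width):
    # Model one vector iteration: load every lane, then store every lane.
    lanes = [mem[dst + i + k] + mem[src + i + k] for k in range(width)]
    mem[dst + i:dst + i + width] = lanes


def vectorized_add(mem, dst, src, n, vf=4, uf=2):
    """mem[dst+i] += mem[src+i] for i in [0, n), with one shared alias check."""
    no_alias = not overlaps(dst, src, n)  # computed once, before the main loop
    i = 0
    if no_alias:
        # Main vector loop: VF * UF elements per iteration.
        while i + vf * uf <= n:
            for u in range(uf):
                vector_step(mem, dst, src, i + u * vf, vf)
            i += vf * uf
        # Vector epilog: same VF, no unrolling. It reuses no_alias instead of
        # emitting a second check, and runs at most UF - 1 times.
        while i + vf <= n:
            vector_step(mem, dst, src, i, vf)
            i += vf
    # Scalar remainder; also the whole loop when the alias check fails.
    while i < n:
        mem[dst + i] += mem[src + i]
        i += 1
    return mem


def scalar_add(mem, dst, src, n):
    # Reference semantics of the original scalar loop.
    for i in range(n):
        mem[dst + i] += mem[src + i]
    return mem
```

<p class="MsoListParagraph" style="margin-left:1.5in">With n = 21, VF = 4 and UF = 2, the main loop covers 16 elements, the epilog one more vector of 4, and the scalar loop the last element.</p>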
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Right.
Can this challenge be addressed by recording the “simple”
fact that the epilog loop is vectorizable, with a trip count
of at most VF*UF, when reached from the vectorized loop? This
is akin to passing similar information from the front-end
when it is supplied by, e.g., OpenMP pragmas, with the
additional path-sensitive context attached.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal">I did not completely follow this point. Yes,
we can record the maximum width for epilog vectorization, but
what do you mean by “path-sensitive context attached”?
<o:p></o:p></p>
<p class="MsoNormal">Please elaborate on this, and on how it
helps in reusing the alias check result.<o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Agreed;
if each loop is handled independently, the multiple
minimum-trip-count tests should be revisited to optimize for
the smallest trip count first.<o:p></o:p></span></p>
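<p class="MsoListParagraph">One possible reading of this point, as a Python sketch with hypothetical names (<code>select_entry</code>, <code>select_entry_largest_first</code>): when the main loop and the vector epilog each keep their own minimum-trip-count guard, a short trip count pays for the main loop's guard before reaching the epilog's. Testing the smallest threshold first sends short trip counts to their loop after a single comparison, at the cost of one extra comparison for long ones.</p>

```python
def select_entry(n, vf, uf, epilog_vf):
    """Pick the first loop to enter for trip count n, testing the smallest
    minimum-trip-count threshold first. Returns (target, guards_evaluated)."""
    guards = [
        (epilog_vf, "scalar"),       # too short even for the vector epilog
        (vf * uf, "vector_epilog"),  # too short for the main vector loop
    ]
    evaluated = 0
    for threshold, target in guards:
        evaluated += 1
        if n < threshold:
            return target, evaluated
    return "main_vector", evaluated


def select_entry_largest_first(n, vf, uf, epilog_vf):
    """Guards in the order they arise when each loop is handled independently:
    the main loop's guard first, then the epilog's guard."""
    if n >= vf * uf:
        return "main_vector", 1
    if n >= epilog_vf:
        return "vector_epilog", 2
    return "scalar", 2
```

<p class="MsoListParagraph">Both orders pick the same loop; they differ only in how many guards each trip count pays for.</p>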
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">If the
main loop was vectorized by VF and unrolled by UF>1, it
may be reasonable to vectorize the remainder loop with the
same VF (without unrolling).
<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">And then
possibly vectorize the remainder of that with a smaller VF,
say, VF/2. In addition, small element types combined with
wide vectors may result in a large VF, again leaving room for
possibly repeated epilog vectorizations with decreasing
VFs. At some point it would be good to try the alternative
of a (final) masked vector epilog.<o:p></o:p></span></p>
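<p class="MsoListParagraph">The cascade of decreasing VFs can be sketched in Python (hypothetical <code>split_iterations</code>; VF is assumed to be a power of two and <code>min_vf</code> is an assumed cutoff). It reports how many elements each stage handles. Since the main loop leaves fewer than VF*UF elements, the same-VF epilog runs at most UF-1 times, and every later epilog, at half the previous width, runs at most once.</p>

```python
def split_iterations(n, vf, uf, min_vf=2):
    """Elements handled by each stage: the main loop (VF*UF per iteration),
    vector epilogs at VF, VF/2, ..., min_vf, and finally the scalar loop."""
    stages = [("main", vf * uf)]
    width = vf
    while width >= min_vf:
        stages.append((f"epilog VF={width}", width))
        width //= 2
    remaining = n
    plan = []
    for name, step in stages:
        covered = (remaining // step) * step  # whole vector iterations only
        plan.append((name, covered))
        remaining -= covered
    plan.append(("scalar", remaining))
    return plan
```

<p class="MsoListParagraph">For 37 iterations with VF = 8 and UF = 2, this gives 32 elements in the main loop, 4 in the VF/2 epilog and 1 in the scalar loop.</p>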
<p class="MsoListParagraph" style="margin-left:0in"><span
style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif">Each vector version incurs extra cost by
adding extra checks; for this reason, I have limited the
patch to generating only one epilog vector version.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif">We can generate multiple epilog versions,
but we have to understand the tradeoff of generating them.
Once we have proper costing of the checks, we can make more
precise decisions. I would like to defer this to later
enhancements.</span></p>
</div>
</blockquote>
<br>
If we model the costs of the extra checks and branches, then we can
ask: will the savings from executing even one iteration of the
vectorized epilogue loop be greater than the cost of the checks? For
really small loops, this might not be obvious.<br>
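<p>A sketch of that comparison in Python, using made-up abstract costs rather than any real cost model. <code>one_iteration_pays_off</code> asks the question directly; <code>expected_epilog_gain</code> averages over remainders, assuming (loudly) that the remainder left by the main loop is uniformly distributed over [0, VF*UF) and that the epilog guard is evaluated on every exit from the main loop. Both function names are hypothetical.</p>

```python
from fractions import Fraction


def one_iteration_pays_off(scalar_cost, vector_cost, vf, check_cost):
    """Does a single vector-epilog iteration (replacing vf scalar iterations)
    save more than the added checks and branches cost?"""
    return vf * scalar_cost - vector_cost > check_cost


def expected_epilog_gain(vf, uf, epilog_vf, scalar_cost, vector_cost, check_cost):
    """Average gain of one vector epilog. ASSUMPTION: the remainder r left by
    the main loop is uniform over [0, vf*uf), and the guard is always paid."""
    total = Fraction(0)
    for r in range(vf * uf):
        iters = r // epilog_vf  # vector epilog iterations for this remainder
        total += iters * (epilog_vf * scalar_cost - vector_cost) - check_cost
    return total / (vf * uf)
```

<p>With VF = 4, UF = 2, a scalar iteration costing 1, a vector iteration costing 2 and checks costing 1, one epilog iteration saves 2 units, yet the expected gain is 0 because half of the remainders are too short to reach the epilog body.</p>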
<br>
-Hal<br>
<br>
<blockquote
cite="mid:CY4PR12MB1799BE010E973ABF11C7E8BAFB270@CY4PR12MB1799.namprd12.prod.outlook.com"
type="cite">
<div class="WordSection1">
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif"><o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif">Masked instructions are available in
AVX-512, and of course they are a better solution than this.
But on architectures that do not support masked instructions,
an epilog vector version is one technique for vectorizing the
epilog iterations.<o:p></o:p></span></p>
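<p class="MsoListParagraph" style="margin-left:0in">For contrast, a Python model of a single masked vector epilog (hypothetical names; it does not use AVX-512 intrinsics and only mimics masked load/store semantics): lanes past the trip count are disabled in both the load and the store, so the remainder is finished in one vector iteration with no scalar loop and no out-of-bounds access.</p>

```python
def masked_load(mem, base, mask):
    # Inactive lanes do not touch memory and yield a placeholder 0.
    return [mem[base + k] if on else 0 for k, on in enumerate(mask)]


def masked_store(mem, base, values, mask):
    # Only active lanes write back to memory.
    for k, (value, on) in enumerate(zip(values, mask)):
        if on:
            mem[base + k] = value


def add_with_masked_epilog(dst, src, n, vf=8):
    """dst[i] += src[i] for i in [0, n); returns the number of vector iterations."""
    i, iterations = 0, 0
    while i + vf <= n:  # unmasked main vector loop
        dst[i:i + vf] = [d + s for d, s in zip(dst[i:i + vf], src[i:i + vf])]
        i += vf
        iterations += 1
    if i < n:  # one masked iteration replaces the scalar remainder loop
        mask = [i + k < n for k in range(vf)]
        a = masked_load(dst, i, mask)
        b = masked_load(src, i, mask)
        masked_store(dst, i, [x + y for x, y in zip(a, b)], mask)
        iterations += 1
    return iterations
```

<p class="MsoListParagraph" style="margin-left:0in">With 13 elements and VF = 8, this runs one unmasked and one masked iteration.</p>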
<p class="MsoListParagraph" style="margin-left:0in"><span
style="font-size:12.0pt;font-family:'Times New Roman',serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D">Ayal.<o:p></o:p></span></p>
<p class="MsoListParagraph"><span style="color:#1F497D"><o:p> </o:p></span></p>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>