<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<div class="moz-cite-prefix">On 1/31/19 5:41 PM, Saito, Hideki
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:899F03F2C73A55449C51631866B887498439DD49@FMSMSX109.amr.corp.intel.com">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I
think you and I are talking about two different things.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">As
far as Intel’s vector function ABI is concerned, unless the
programmer specifically says otherwise, given an OpenMP
declare simd function, the compiler will<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">deduce
the VF from HW vector register size and other function
signatures. Of course, there can be different vector
function ABIs for different targets. Intel<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">compiler
cost model uses vector function VF as part of loop
vectorization VF determination. So, it’s tightly coupled.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">A
hypothetical vector target may vectorize such a vector
function for 4096b vector, with an explicit VF parameter 20
also passed to it, to execute only the lower<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">20
elements of the whole thing.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I
think this scenario answers Philip’s question on why
separate mask and VF parameters and why VF can’t be
conservatively deduced from the mask/mask compute.</span></p>
</div>
</blockquote>
<p>I think this does come close, yes. There's still the question of
just how common a short vectorized function of this form is in
practice after inlining, but I can understand why being able to
represent this cleanly/concisely would be useful. My scheme would
require the mask->length computation code be inserted as
essentially part of the prolog, and doing so might be reasonably
expensive. <br>
</p>
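<p>To make the mask->length idea concrete, here is a minimal sketch,
assuming the mask is modeled as a plain integer with bit i standing
for lane i. The name length_from_mask is hypothetical and only for
illustration. It also shows why the result is merely conservative: a
callee that sees only the mask cannot tell a 20-element request from
a 3-element one.</p>
<pre>
```python
def length_from_mask(mask: int) -> int:
    """Smallest vlen such that every set bit of `mask` lies below it.

    Assumption for this sketch: bit i of `mask` is the predicate for lane i.
    """
    return mask.bit_length()


# Lanes 0 and 2 are active; the caller may have intended vlen = 20,
# but only 3 can be recovered from the mask alone.
mask = 0b0101
print(length_from_mask(mask))
```
</pre>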
<p>On the other hand, if the vector length is already part of the
ABI - which it sounds like it is in this case - inserting a bit of
dummy code which enforces that the predicate mask only has bits set
below VLen could be done w/a simple shift/dec/and sequence. While
the sequence itself would be dynamically useless, it would make the
vlen for the function obvious even if it hadn't been
expressed in the IR. <br>
</p>
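<p>A rough sketch of that shift/dec/and sequence, again assuming an
integer mask with bit i for lane i (clamp_mask_to_vlen is a
hypothetical name, not anything in LLVM):</p>
<pre>
```python
def clamp_mask_to_vlen(mask: int, vlen: int) -> int:
    """Clear every predicate bit at position vlen or above."""
    low_bits = (1 << vlen) - 1  # shift, then decrement: vlen low bits set
    return mask & low_bits      # and: keep only lanes below vlen


# If the ABI already guarantees no bits are set at or above vlen,
# the sequence leaves the mask unchanged (dynamically useless).
print(bin(clamp_mask_to_vlen(0b0101, 20)))
# Otherwise it strips the high lanes.
print(bin(clamp_mask_to_vlen(0b11111111, 3)))
```
</pre>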
<p>Or alternatively, we could use the calling convention ABI detail
to *assume* (and thus insert during SelectionDAG) the relation
between the VLEN parameter and the vector mask parameter. <br>
</p>
<p>My point in the above is not that this is obviously the right
answer - it's not - simply that it probably could be made to
work. As such, I don't think we should be automatically assuming
we have to match the IR definition precisely to the hardware.
Doing so is a recipe for over-fitting and a hard-to-maintain
long-term design. <br>
</p>
<p>It's worth pointing out that including the vlen parameter in the
intrinsic definitions creates exactly the opposite problem on a
SIMD platform. (i.e. we have to mask out the predicate based on
the length when generating code.)</p>
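<p>As an illustration of that cost, here is a toy emulation of a
predicated add with an explicit length on a fixed-width SIMD target.
Everything here is an assumption for the sketch: predicated_add is a
made-up name, lanes are Python lists, and disabled lanes take a
passthru value purely so the result is easy to read.</p>
<pre>
```python
def predicated_add(a, b, mask, vlen, passthru):
    """Emulate a length-and-mask predicated add on fixed-width lanes."""
    # The target has no length register, so the length must first be
    # folded into the predicate: lane i is live only if i is below vlen.
    effective = [m and (i < vlen) for i, m in enumerate(mask)]
    return [x + y if on else p
            for x, y, on, p in zip(a, b, effective, passthru)]


result = predicated_add([1, 2, 3, 4], [10, 20, 30, 40],
                        [True, True, False, True], 2, [0, 0, 0, 0])
print(result)
```
</pre>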
<p>Philip</p>
<p>p.s. Reminder, just playing devil's advocate. No strong opinions
actually held. :)<br>
</p>
<br>
<blockquote type="cite"
cite="mid:899F03F2C73A55449C51631866B887498439DD49@FMSMSX109.amr.corp.intel.com">
<div class="WordSection1">
<p class="MsoNormal"><a name="_____replyseparator"
moz-do-not-send="true"></a><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Bruce Hoult [<a class="moz-txt-link-freetext" href="mailto:bruce@hoult.org">mailto:bruce@hoult.org</a>]
<br>
<b>Sent:</b> Thursday, January 31, 2019 5:13 PM<br>
<b>To:</b> Saito, Hideki <a class="moz-txt-link-rfc2396E" href="mailto:hideki.saito@intel.com"><hideki.saito@intel.com></a><br>
<b>Cc:</b> Philip Reames <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a>;
Robin Kruppe <a class="moz-txt-link-rfc2396E" href="mailto:robin.kruppe@gmail.com"><robin.kruppe@gmail.com></a>; David Greene
<a class="moz-txt-link-rfc2396E" href="mailto:dag@cray.com"><dag@cray.com></a>; via llvm-dev
<a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a>; Maslov, Sergey V
<a class="moz-txt-link-rfc2396E" href="mailto:sergey.v.maslov@intel.com"><sergey.v.maslov@intel.com></a>; Topper, Craig
<a class="moz-txt-link-rfc2396E" href="mailto:craig.topper@intel.com"><craig.topper@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] [RFC] Vector Predication<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal"><span
style="font-family:"Arial",sans-serif">On
Thu, Jan 31, 2019 at 4:31 PM Saito, Hideki via
llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:</span><o:p></o:p></p>
</div>
</div>
<div>
<blockquote style="border:none;border-left:solid #CCCCCC
1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">>when
we have a mask loaded from an external source
(memory, function call boundary, etc...) and a short
sequence of vector ops<o:p></o:p></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Mask
value from function call parameter is common.
OpenMP declare simd function does exactly that for
the masked cases.</span><o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Such a mask is at the application
level, not at the vector strip-mining loop level.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">As well as possibly being many times
longer than the masks the hardware works with, it's
likely not even in the format the hardware uses:
different library APIs might pack a mask into bits, or
one mask element per byte, short, or int.<o:p></o:p></p>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>