<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 1/31/19 5:41 PM, Saito, Hideki
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:899F03F2C73A55449C51631866B887498439DD49@FMSMSX109.amr.corp.intel.com">
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I
            think you and I are talking about two different things.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">As
            far as Intel’s vector function ABI is concerned, unless the
            programmer specifically says otherwise, given an OpenMP
            declare simd function, the compiler will deduce the VF from
            the HW vector register size and other function signatures.
            Of course, there can be different vector function ABIs for
            different targets. The Intel compiler’s cost model uses the
            vector function VF as part of loop vectorization VF
            determination. So, the two are tightly coupled.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">A
            hypothetical vector target may vectorize such a vector
            function for a 4096-bit vector, with an explicit VF parameter
            of 20 also passed to it, so that only the lower 20 elements
            of the whole vector are executed.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I
            think this scenario answers Philip’s question on why the mask
            and VF parameters are separate and why VF can’t be
            conservatively deduced from the mask or the mask computation.</span></p>
      </div>
    </blockquote>
    <p>I think this does come close, yes.  There's still the question of
      just how common a short vectorized function of this form is in
      practice after inlining, but I can understand why being able to
      represent this cleanly/concisely would be useful.  My scheme would
      require the mask->length computation code to be inserted as
      essentially part of the prolog, and doing so might be reasonably
      expensive.  <br>
    </p>
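    <p>To make the mask->length computation concrete, here is a minimal
      sketch in Python. It assumes, purely for illustration, that the
      mask is a plain non-negative integer bitmask in which bit i is the
      predicate for lane i; the helper name conservative_vlen is
      hypothetical and not part of the RFC or of any ABI. It returns the
      smallest length that still covers every active lane.</p>
    <pre>
```python
def conservative_vlen(mask: int) -> int:
    """Smallest vector length that covers every set lane of `mask`.

    Hypothetical helper: `mask` is assumed to be a non-negative integer
    bitmask in which bit i is the predicate for lane i.
    """
    if mask < 0:
        raise ValueError("mask must be non-negative")
    # int.bit_length() is the index of the highest set bit plus one,
    # and 0 for an all-false mask.
    return mask.bit_length()
```
    </pre>
    <p>Real hardware would do this with a count-leading-zeros style
      operation on the mask register, and that is the prolog cost in
      question.</p>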
    <p>On the other hand, if the vector length is already part of the
      ABI - which it sounds like it is in this case - inserting a bit of
      dummy code which enforces that the predicate mask only has bits set
      below VLEN could be done with a simple shift/dec/and sequence. 
      While the sequence itself would be dynamically useless, it would
      make the VLEN for the function obvious even if it hadn't been
      expressed in the IR.  <br>
    </p>
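    <p>A minimal sketch of such a shift/dec/and sequence, written in
      Python only to show the arithmetic. The function name
      clamp_mask_to_vlen and the integer-bitmask representation (bit i
      is lane i) are assumptions for illustration, not anything the RFC
      or an ABI defines.</p>
    <pre>
```python
def clamp_mask_to_vlen(mask: int, vlen: int) -> int:
    """Clear every predicate bit at lane index `vlen` or above.

    Hypothetical helper: `mask` is assumed to be an integer bitmask
    in which bit i is the predicate for lane i.
    """
    if vlen < 0:
        raise ValueError("vlen must be non-negative")
    low_lanes = (1 << vlen) - 1  # shift, then decrement: vlen low bits set
    return mask & low_lanes      # and: drop lanes at or beyond vlen
```
    </pre>
    <p>If the caller already honours the ABI, the result equals the
      input, which is why the sequence is dynamically useless but still
      tells the optimizer which lanes can never be active.</p>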
    <p>Or alternatively, we could use the calling convention ABI detail
      to *assume* (and thus insert during SelectionDAG) the relation
      between the VLEN parameter and the vector mask parameter.  <br>
    </p>
    <p>My point in the above is not that this is obviously the right
      answer - it's not - but simply that it probably could be made to
      work.  As such, I don't think we should automatically assume
      we have to match the IR definition precisely to the hardware. 
      Doing so is a recipe for over-fitting and a hard-to-maintain
      long-term design.  <br>
    </p>
    <p>It's worth pointing out that including the vlen parameter in the
      intrinsic definitions creates exactly the opposite problem on a
      SIMD platform.  (i.e. we have to mask out the predicate based on
      the length when generating code.)</p>
    <p>Philip</p>
    <p>p.s. Reminder, just playing devil's advocate.  No strong opinions
      actually held.  :)<br>
    </p>
    <br>
    <blockquote type="cite"
cite="mid:899F03F2C73A55449C51631866B887498439DD49@FMSMSX109.amr.corp.intel.com">
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><a name="_MailEndCompose"
            moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>
        <p class="MsoNormal"><a name="_____replyseparator"
            moz-do-not-send="true"></a><b><span
              style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
            Bruce Hoult [<a class="moz-txt-link-freetext" href="mailto:bruce@hoult.org">mailto:bruce@hoult.org</a>]
            <br>
            <b>Sent:</b> Thursday, January 31, 2019 5:13 PM<br>
            <b>To:</b> Saito, Hideki <a class="moz-txt-link-rfc2396E" href="mailto:hideki.saito@intel.com"><hideki.saito@intel.com></a><br>
            <b>Cc:</b> Philip Reames <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a>;
            Robin Kruppe <a class="moz-txt-link-rfc2396E" href="mailto:robin.kruppe@gmail.com"><robin.kruppe@gmail.com></a>; David Greene
            <a class="moz-txt-link-rfc2396E" href="mailto:dag@cray.com"><dag@cray.com></a>; via llvm-dev
            <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a>; Maslov, Sergey V
            <a class="moz-txt-link-rfc2396E" href="mailto:sergey.v.maslov@intel.com"><sergey.v.maslov@intel.com></a>; Topper, Craig
            <a class="moz-txt-link-rfc2396E" href="mailto:craig.topper@intel.com"><craig.topper@intel.com></a><br>
            <b>Subject:</b> Re: [llvm-dev] [RFC] Vector Predication<o:p></o:p></span></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <div>
            <div>
              <p class="MsoNormal"><span
                  style="font-family:"Arial",sans-serif">On
                  Thu, Jan 31, 2019 at 4:31 PM Saito, Hideki via
                  llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org"
                    moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
                  wrote:</span><o:p></o:p></p>
            </div>
          </div>
          <div>
            <blockquote style="border:none;border-left:solid #CCCCCC
              1.0pt;padding:0in 0in 0in
              6.0pt;margin-left:4.8pt;margin-right:0in">
              <div>
                <div>
                  <p class="MsoNormal"
                    style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
                  <p class="MsoNormal"
                    style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">>when
                    we have a mask loaded from an external source
                    (memory, function call boundary, etc...) and a short
                    sequence of vector ops<o:p></o:p></p>
                  <p class="MsoNormal"
                    style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
                  <p class="MsoNormal"
                    style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">A
                      mask value from a function call parameter is common.
                      An OpenMP declare simd function does exactly that for
                      the masked cases.</span><o:p></o:p></p>
                </div>
              </div>
            </blockquote>
            <div>
              <p class="MsoNormal"><o:p> </o:p></p>
            </div>
            <div>
              <p class="MsoNormal">Such a mask is at the application
                level, not at the vector strip-mining loop level.<o:p></o:p></p>
            </div>
            <div>
              <p class="MsoNormal"><o:p> </o:p></p>
            </div>
            <div>
              <p class="MsoNormal">As well as possibly being many times
                longer than the masks the hardware works with, it's
                likely not even to be in the format the hardware uses:
                different library APIs might pack a mask into bits, or
                use one mask element per byte, short, or int.<o:p></o:p></p>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>