<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 07/03/2017 10:41 PM, Sean Silva

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAHnXoanmARac=WH6DH80m8BPbVZvzm3D-8WghX7T-uC2-dUVog@mail.gmail.com"

      type="cite">


      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Mon, Jul 3, 2017 at 4:43 PM, Hal

            Finkel via llvm-dev <span dir="ltr"><<a

                moz-do-not-send="true"

                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex"><span class="gmail-"><br>

                On 06/30/2017 03:02 AM, Chandler Carruth wrote:<br>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px 0.8ex;border-left:1px solid

                  rgb(204,204,204);padding-left:1ex">

                  I have hit a fairly isolated practical issue deploying

                  the new PM, but it does point to a latent theoretical

                  issue as well. I see various ways to address it, but

                  want feedback from others before moving forward.<br>

                  <br>

                  The issue is that we can introduce out-of-thin-air

                  calls to known library functions (`SimplifyLibCalls`,

                  etc). These can be introduced in function passes

                  (`InstCombine` in particular) and that seems highly

                  desirable.<br>

                  <br>

                  These all look like one of these cases:<br>

                  1a) Introducing a new call to an LLVM intrinsic<br>

                  1b) Replacing an existing call with a call to an LLVM

                  intrinsic<br>

                  2a) Introducing a new call to a declared library

                  function (but not defined)<br>

                  2b) Replacing an existing call with a call to a

                  declared library function<br>

                  3a) Introducing a new call to a defined library

                  function<br>

                  3b) Replacing an existing call with a call to a

                  defined library function<br>

                  <br>

                  Both #1 and #2 are easy to handle in reality.

                  Intrinsics and declared functions don't impact the

                  PM's call graph because there is no need to order the

                  walk over them. But #3 is a real issue.<br>

                  <br>

                  The only case I have found that actually hits #3 at

                  all hits #3b when building FORTIFY code with the new

                  pass manager because after inlining we do a lot of

                  (really nice) optimizations on library calls to remove

                  unnecessary FORTIFY checks. But this is in *theory* a

                  problem when LTO-ing with libc. More likely it could

                  be a problem when LTO-ing with a vector math library.<br>

                </blockquote>

                <br>

              </span>

              This latter case concerns me most. When the vectorizer

              creates the vectorized version of a loop, that's new code

              (the original code for the loop stays in place as a fall

              back). Further, the vectorizer can (today) create calls to

              vector math library functions, and a setup where we LTO

              with the definitions of those functions is certainly

              possible (and desirable). As a result, this issue does not

              seem all that theoretical to me. </blockquote>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <br>

              Moreover, once we have support for OpenMP simd functions,

              we'll end up in exactly this situation on a regular basis,

              and we can't have intrinsics for all of the possible

              user-defined functions</blockquote>

            <div><br>

            </div>

            <div>Can you clarify how OpenMP simd would require

              introducing out-of-thin-air references to an open-ended

              set of user-defined functions?</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Because OpenMP simd functions (i.e., the "declare simd"
    functionality) allow the user to specify that vectorized versions
    of a given scalar function are to be made available, meaning either
    generated by the compiler or provided externally. For example,
    let's say we have this:<br>

    <br>

    #pragma omp declare simd notinbranch<br>
    float min (float a, float b) { return a < b ? a : b; }<br>
    <br>
    void minner (float *a, float *b, float *c) {<br>
      #pragma omp parallel for simd<br>
      for (int i = 0; i < N; i++)<br>
        c[i] = min(a[i], b[i]);<br>
    }<br>

    <br>

    And, assume for a moment that min() was not inlined before
    vectorization. The pragma says that we'll generate some vectorized
    version of the min function taking vector arguments (*), and then
    the vectorizer, when generating the vectorized loop body, will
    insert a call to said vectorized min function at the appropriate
    place. That call targets a function defined in the module that
    minner did not reference before vectorization, which is the same
    situation as #3a, only with a user-defined function. There are
    already patches floating around Phabricator to implement this
    (although I'm not sure of their status), and it is certainly
    important functionality.<br>

    <br>

    (*) Intel, at least, has a well-defined ABI for these functions:

    <a class="moz-txt-link-freetext" href="https://software.intel.com/en-us/articles/vector-simd-function-abi">https://software.intel.com/en-us/articles/vector-simd-function-abi</a><br>
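    <br>
    To make the transformation concrete, here is a rough sketch of what
    the vectorized loop looks like conceptually. This is not what Clang
    or the vectorizer actually emits: the float4 struct just stands in
    for a 4-lane vector register, and min_scalar, min_v4, and
    minner_vectorized are names I made up for illustration (a real
    variant name would follow the vector ABI's mangling). The vector
    width of 4 and the explicit length parameter n are also
    assumptions.<br>
    <pre>
```c
/* Sketch only: a plain struct stands in for a 4-lane SIMD register. */
typedef struct { float lane[4]; } float4;

/* The scalar function the user wrote (renamed to keep the sketch unambiguous). */
float min_scalar(float a, float b) { return a < b ? a : b; }

/* Hypothetical vector variant that "declare simd" asks the compiler to provide. */
float4 min_v4(float4 a, float4 b) {
  float4 r;
  for (int l = 0; l < 4; l++)
    r.lane[l] = min_scalar(a.lane[l], b.lane[l]);
  return r;
}

/* Conceptual result of vectorizing minner with a vector width of 4. */
void minner_vectorized(const float *a, const float *b, float *c, int n) {
  int i = 0;
  /* Vector body: processes 4 elements per iteration via the vector variant. */
  for (; i + 4 <= n; i += 4) {
    float4 va, vb, vc;
    for (int l = 0; l < 4; l++) {
      va.lane[l] = a[i + l];
      vb.lane[l] = b[i + l];
    }
    vc = min_v4(va, vb); /* this call did not exist before vectorization */
    for (int l = 0; l < 4; l++)
      c[i + l] = vc.lane[l];
  }
  /* Scalar remainder: the original loop body stays around as a fallback. */
  for (; i < n; i++)
    c[i] = min_scalar(a[i], b[i]);
}
```
    </pre>
    The point is that the call to min_v4 is new: nothing in the scalar
    minner referenced that function before the vectorizer ran.<br>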

    <br>

    <blockquote

cite="mid:CAHnXoanmARac=WH6DH80m8BPbVZvzm3D-8WghX7T-uC2-dUVog@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex"> (unless the intrinsic

              just takes a function pointer and we clean it up

              afterwards somehow).</blockquote>

            <div><br>

            </div>

            <div>Note that simply referencing a function pointer

              out-of-thin-air would still run afoul of the same issue.

              It would constitute a ref edge and have similar

              implications as a direct call, at least as far as the

              fundamental problem here is concerned (guaranteeing

              bottom-up iteration order). So an intrinsic taking a

              function pointer wouldn't really circumvent the issue (if

              I understand correctly what you're saying).</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Good point.<br>
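    <br>
    For what it's worth, the ordering constraint can be shown with a toy
    model. This is only a sketch and not LLVM's LazyCallGraph: functions
    are small integers, the edge matrix and order_is_bottom_up are
    hypothetical, and cycles (SCCs) are ignored.<br>
    <pre>
```c
/* Toy model: edge[u][v] != 0 means function u references function v,
   either by calling it directly or by taking its address. */
enum { NFUNC = 3 }; /* 0 = minner, 1 = min, 2 = vector variant of min */

/* pos[f] is the step at which the walk visited f. A bottom-up walk needs
   every referenced function to be visited before its referrer. */
int order_is_bottom_up(int edge[NFUNC][NFUNC], const int pos[NFUNC]) {
  for (int u = 0; u < NFUNC; u++)
    for (int v = 0; v < NFUNC; v++)
      if (edge[u][v] != 0 && pos[v] > pos[u])
        return 0; /* u was visited before something it references */
  return 1;
}
```
    </pre>
    If the walk visits min, then minner, then the vector variant, that
    order passes the check while minner only references min. Once minner
    gains a reference to the variant, whether through a direct call or
    by passing its address to an intrinsic, the same check fails.<br>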

    <br>

    Thanks again,<br>

    Hal<br>

    <br>

    <blockquote

cite="mid:CAHnXoanmARac=WH6DH80m8BPbVZvzm3D-8WghX7T-uC2-dUVog@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>-- Sean Silva</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex"> In short, I think we

              do need to correctly handle this situation.<br>

              <br>

              FWIW, I can also see this situation come up in other

              instrumentation cases as well. There are plenty of cases

              where it is useful to LTO with a runtime library.<br>

              <br>

              Thanks again,<br>

              Hal<span class="gmail-im gmail-HOEnZb"><br>

                <br>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px 0.8ex;border-left:1px solid

                  rgb(204,204,204);padding-left:1ex">

                  <br>

                  So what do we do?<br>

                  <br>

                  My initial idea: find all *defined* library functions

                  in the module, and every time we create a ref edge to

                  one of them, synthesize a ref edge to all of them.

                  This should completely solve #3b above. But it doesn't

                  really address #3a at all.<br>

                  <br>

                  Is that OK? It would be very convenient to say that if

                  we want to introduce truly novel and new calls to

                  library functions, we should have an LLVM intrinsic to

                  model those routines.<br>

                  <br>

                  But we actually have an example (I think) of #3a,

                  introducing a call to a library function out of the

                  blue: memset_pattern. =/<br>

                  <br>

                  The only way I see to reasonably handle #3a is to have

                  *every* function implicitly contain a reference edge

                  to every defined library function in the module. This

                  is, needless to say, amazingly wasteful. Hence my

                  email. How important is this?<br>

                  <br>

                  If we need to correctly handle this, I think I would

                  probably implement this by actually changing the

                  *iteration* of reference edges in the graph to just

                  implicitly walk the list of defined library functions

                  so that we didn't burn any space on this. But it will

                  make iteration of reference edges slower and add a

                  reasonable amount of complexity. So I'd like to hear

                  some other opinions before going down either of these

                  roads.<br>

                  <br>

                  <br>

                  Thanks,<br>

                  -Chandler<br>

                </blockquote>

                <br>

              </span><span class="gmail-HOEnZb"><font color="#888888">

                  -- <br>

                  Hal Finkel<br>

                  Lead, Compiler Technology and Programming Languages<br>

                  Leadership Computing Facility<br>

                  Argonne National Laboratory</font></span>

              <div class="gmail-HOEnZb">

                <div class="gmail-h5"><br>

                  <br>


                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>