[llvm-dev] An issue with new PM's requirements on call graph changes

Thu Jul 6 20:11:10 PDT 2017

On 07/03/2017 10:41 PM, Sean Silva wrote:
>
>
> On Mon, Jul 3, 2017 at 4:43 PM, Hal Finkel via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>
>     On 06/30/2017 03:02 AM, Chandler Carruth wrote:
>
>         I have hit a fairly isolated practical issue deploying the new
>         PM, but it does point to a latent theoretical issues as well.
>         I see various ways to address it, but want feedback from
>         others before moving forward.
>
>         The issue is that we can introduce out-of-thin-air calls to
>         known library functions (`SimplifyLibCalls`, etc). These can
>         be introduced in function passes (`InstCombine` in particular)
>         and that seems highly desirable.
>
>         These all look like one of these cases:
>         1a) Introducing a new call to an LLVM intrinsic
>         1b) Replacing an existing call with a call to an LLVM intrinsic
>         2a) Introducing a new call to a declared library function (but
>         not defined)
>         2b) Replacing an existing call with a call to a declared
>         library function
>         3a) Introducing a new call to a defined library function
>         3b) Replacing an existing call with a call to a defined
>         library function
>
>         Both #1 and #2 are easy to handle in reality. Intrinsics and
>         declared functions don't impact the PM's call graph because
>         there is no need to order the walk over them. But #3 is a real
>         issue.
>
>         The only case I have found that actually hits #3 at all hits
>         #3b when building FORTIFY code with the new pass manager
>         because after inlining we do a lot of (really nice)
>         optimizations on library calls to remove unnecessary FORTIFY
>         checks. But this is in *theory* a problem when LTO-ing with
>         libc. More likely it could be a problem when LTO-ing with a
>         vector math library.
>
>
>     This latter case concerns me most. When the vectorizer creates the
>     vectorized version of a loop, that's new code (the original code
>     for the loop stays in place as a fall back). Further, the
>     vectorizer can (today) create calls to vector math library
>     functions, and a setup where we LTO with the definitions of those
>     functions is certainly possible (and desirable). As a result, this
>     issue does not seem all that theoretical to me. 
>
>
>     Moreover, once we have support for OpenMP simd functions, we'll
>     end up in exactly this situation on a regular basis, and we can't
>     have intrinsics for all of the possible user-defined functions
>
>
> Can you clarify how OpenMP simd would require introducing 
> out-of-thin-air references to an open-ended set of user-defined functions?

Because OpenMP simd functions (i.e. the "declare simd" functionality) 
allows the user to specify that vectorized versions of a given scalar 
function are to be made available, meaning generated, or are externally 
available. For example, let's say we have this:

#pragma omp declare simd notinbranch
float min (float a, float b) { Return a < b ? a : b; }

void minner (float *a, float *b, float *c) {
   #pragma omp parallel for simd
   for (i=0; i<N; i++)
   c[i] = min(a[i], b[i], c[i]);
}

And, assume for a moment that min() was not inlined before 
vectorization. The pragma says that we'll generate some vectorized 
version of the min function taking vector arguments (*), and then the 
vectorizer, when generating the vectorized loop body, will insert a call 
to said vectorized min function at the appropriate place. There are 
already patches floating around phabricator to implement this (although 
I'm not sure of their status), and it certainly is an important 
functionality.

(*) Intel, at least, has a well-defined ABI for these functions: 
https://software.intel.com/en-us/articles/vector-simd-function-abi

>
>     (unless the intrinsic just takes a function pointer and we clean
>     it up afterwards somehow).
>
>
> Note that simply referencing a function pointer out-of-thin-air would 
> still run afoul of the same issue. It would constitute a ref edge and 
> have similar implications as a direct call, at least as far as the 
> fundamental problem here is concerned (guaranteeing bottom-up 
> iteration order). So an intrinsic taking a function pointer wouldn't 
> really circumvent the issue (if I understand correctly what you're 
> saying).

Good point.

Thanks again,
Hal

>
> -- Sean Silva
>
>     In short, I think we do need to correctly handle this situation.
>
>     FWIW, I can also see this situation come up in other
>     instrumentation cases as well. There are plenty of cases where it
>     is useful to LTO with a runtime library.
>
>     Thanks again,
>     Hal
>
>
>         So what do we do?
>
>         My initial idea: find all *defined* library functions in the
>         module, and every time we create a ref edge to one of them,
>         synthesize a ref edge to all of them. This should completely
>         solve #3b above. But it doesn't really address #3a at all.
>
>         Is that OK? It would be very convenient to say that if we want
>         to introduce truly novel and new calls to library functions,
>         we should have an LLVM intrinsic to model those routines.
>
>         But we actually have an example (I think) of #3a, introducing
>         a call to a library function out of the blue: memset_pattern. =/
>
>         The only way I see to reasonably handle #3a is to have *every*
>         function implicitly contain a reference edge to every defined
>         library function in the module. This is, needless to say,
>         amazingly wasteful. Hence my email. How important is this?
>
>         If we need to correctly handle this, I think I would probably
>         implement this by actually changing the *iteration* of
>         reference edges in the graph to just implicitly walk the list
>         of defined library functions so that we didn't burn any space
>         on this. But it will make iteration of reference edges slower
>         and add a reasonable amount of complexity. So I'd like to hear
>         some other opinions before going down either of these roads.
>
>
>         Thanks,
>         -Chandler
>
>
>     -- 
>     Hal Finkel
>     Lead, Compiler Technology and Programming Languages
>     Leadership Computing Facility
>     Argonne National Laboratory
>
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/b80e12b1/attachment.html>