[PATCH] Module: Use existing library functions

Mon Mar 3 15:01:28 PST 2014

On Mar 3, 2014, at 1:15 PM, Justin Bogner <mail at justinbogner.com> wrote:

> "Duncan P. N. Exon Smith" <dexonsmith at apple.com> writes:
>> During LTO, user-supplied definitions of C library functions often
>> exist.  -instcombine uses Module::getOrInsertFunction() to get a handle
>> on library functions (e.g., @puts, when optimizing @printf).
>> 
>> Previously, Module::getOrInsertFunction() would rename any matching
>> functions with local linkage, and create a new declaration.  In LTO,
>> this is the opposite of desired behaviour, as it skips by the
>> user-supplied version of the library function and creates a new
>> undefined reference which the linker often cannot resolve.
> 
> Why do the user-supplied versions have local linkage? That seems a bit
> odd to me, but I'm not all that familiar with the details of LTO.

Functions with external linkage (that the linker doesn’t need) are given
local linkage (specifically, “internal”) during -internalize.  This is
the first step of LTO, and is vital to enabling other optimizations.

User-supplied C library functions that fall into this category need
special treatment, since -instcombine (at least) inserts new calls to
them.  Effectively, the library functions may have unknown uses that
aren’t yet in the module.  To model these unknown uses, such functions
are added to @llvm.compiler.used.

If a global is in @llvm.compiler.used, no optimization is allowed to
change the meaning of it.  Either no optimization should call
getOrInsertFunction() without checking @llvm.compiler.used, or
getOrInsertFunction() should avoid changing the names of such functions.

Why internalize these functions at all, if they can’t be optimized away?
Once the optimization pipeline is complete, @llvm.compiler.used is
discarded.  In particular, it’s not passed on to the linker.  Giving
these functions internal linkage allows them to be dead-stripped.

>> Removing that logic did not cause any tests to fail, and I couldn't
>> think of why it's *good* behaviour.  Maybe no one was using it?
> 
> As far as I can tell there are three use cases for getOrInsertFunction:
> 
> 1. Lazily creating a function that we want to manipulate - the first
>   call of getOrInsertFunction inserts, and subsequent calls get. The
>   "insert" logic creates a function with ExternalLinkage, so the
>   removed code path shouldn't occur for this usage. On the other hand,
>   if someone is attempting to do this and a function with local linkage
>   already exists with that name, the result will be completely broken,
>   since we'll get a large number of copies of the function we intended
>   to manipulate.

Actually, I think this use case is well-supported by the current
behaviour.  The first call will move the function with local linkage and
insert a function with external linkage.  Subsequent calls will get the
function with external linkage.

The new behaviour, on the other hand, could cause a problem with this
use case.

> 2. Accessing a function with a well known name, which expect will
>   already exist. Creating a function here doesn't make much sense, so
>   the renaming behaviour also seems broken, but it's surprising to me
>   that such a function would have local linkage, as above, so I don't
>   see why we hit this.

We usually only hit this in LTO.

Another time we could hit this is if someone has defined a version
of, e.g., puts() in a local file and declared it static.  On inserting
a call to puts(), the old behaviour would move the static method out of
the way, while the new behaviour would use the static method.

This could be a problem.  Comments?

> 3. Creating a unique function. In this came I guess the name is only
>   there to help with debugging, so the behaviour we currently have kind
>   of makes sense. This is a bit silly though, since the "get" part of
>   getOrInsertFunction would never be wanted. I'd expect callers that
>   need this (if there are any) to just do Function::Create
> 
> So basically, I can't think of how this would be good behaviour either,
> and there are at least two ways in which it's bad.

Anyone else have an opinion?