[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM

Thu Oct 26 20:14:08 PDT 2017

I've gotten a fantastic bug report. Consider the LLVM IR:

target triple = "x86_64-unknown-linux-gnu"

define internal i8* @access({ i8* }* %arg, i64) {
  ret i8* undef
}

define i8* @g({ i8* }* %arg) {
bb:
  %tmp = alloca { i8* }*, align 8
  store { i8* }* %arg, { i8* }** %tmp, align 8
  br i1 undef, label %bb4, label %bb1

bb1:
  %tmp2 = load { i8* }*, { i8* }** %tmp, align 8
  %tmp3 = call i8* @access({ i8* }* %tmp2, i64 undef)
  br label %bb4

bb4:
  ret i8* undef
}

This IR, if compiled with `opt -passes='cgscc(inline,argpromotion)'
-disable-output` hits a bunch of asserts in the LazyCallGraph.

The problem here is that `argpromotion` turns a normal looking function
`i8* @access({ i8* }* %arg, i64)` and turn it into a magical function `i8*
@access(i8* %arg, i64)`. This latter signature is the POSIX `access`
function that LLVM's `TargetLibraryInfo` knows magical things about.

Because *some* library functions known to `TargetLibraryInfo` can have
*calls* to them introduced at arbitrary points of optimization (consider
vectorized variants of math functions), the new pass manager and its graph
to provide ordering over the module get Very Unhappy when you *introduce* a
definition of a library function in the middle of the compilation pipeline.

And really, we do *not* want `argpromotion` to do this. We don't want it to
turn some random function by the name of `@free` into the actual `@free`
function and suddenly change how LLVM handles it.

So what do we do?

One option is to make `argpromotion` and every other pass that mutates a
function's signature rename the function (or add a `nobuiltin` attribute to
it). However, this seems brittle and somewhat complicated.

My proposal is that we admit that certain names of functions are reserved
in LLVM's IR. For these names, in some cases *any* function with that name
will be treated specially by the optimizer. We can still check the
signatures when transforming code based on LLVM's semantic understanding of
that function, but this avoids any need to check things when mutating the
signature of the function.

This would require frontends to avoid emitting functions by these names
unless they should have these special semantics. However, even if they do,
everything should remain conservatively correct. But I'll send an email to
cfe-dev suggesting that Clang start "mangling" internal functions that
collide with target names. I think this is important as I've found a quite
surprising number of cases where this happens in real code.

There is no need to auto-upgrade here, because again, LLVM's handling will
remain conservatively correct.

Does this seem reasonable? If so, I'll send patches to update the LangRef
with these restrictions. I'll also take a quick stab at generating some
example tables of such names from the .td files used by `TargetLibraryInfo`
already. These can't be authoritative because of the platform-specific
nature of it, but should help people understand how this area works.

One alternative that seems appealing but doesn't actually help would be to
make `TargetLibraryInfo` ignore internal functions. That is how the C++
spec seems to handle this for example (C library function names are
reserved only when they have linkage). But this doesn't work well for LLVM
because we want to be able to LTO an internalized C library. So I think we
need the rule for LLVM function names to not rely on linkage here.

Thanks,
-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171027/cb865c00/attachment.html>