[llvm-commits] [patch][gold plugin] Don't internalize symbols in objects that we will use as pass-through

Tue Jun 22 16:22:38 PDT 2010

> The last remaining issue that I know of is that gold can ask us to
> load a file because it is used only in the IR. We then internalize the
> symbols in that file and run codegen. Codegen can then create new
> undefined references to those files, which forces gold to fetch them
> again (from the native code this time).

I'm not sure I understand the scenario here. As I understand it, you
have libgcc and libc in archive form, and full of IR objects. You have
a reference to a symbol in one of those libraries in your IR code, so
gold is loading a .o from the archive, and your plugin is claiming it.
During LTO, you determine you don't actually need the symbol (because
you ended up inlining it, perhaps, and it was IRONLY), so you drop the
definition only to have later codegen generate a call to that routine,
assuming that it's a standard library routine that can be found in
object form. Is this about right?

> There are a number of problems with this:
> *) Gold doesn't implement this all that well. It still has in its
> symbol table that it loaded a file defining that symbol.

What's still in the symbol table is a placeholder symbol, though. You
mean that it won't load a definition for that symbol from an archive
of real objects? That could probably be fixed, although it seems like
the compiler really ought to be responsible for providing the
definition.

> *) We would end up with multiple copies of some symbols.

How? If you have references to A and B in the IR, and you define A in
a replacement file, but then search a real archive where A and B are
both defined in the same object, yes, you could get a multiple
definition. If this is the case, you should be breaking up the archive
library into finer-grained objects.
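
To illustrate the A/B case, here is a minimal sketch of whole-member
loading from an archive. It is a toy model under my own assumptions, not
gold's actual algorithm; the function name, the object names, and the
data layout are all hypothetical.

```python
# Toy model of archive-member loading. This is NOT gold's actual algorithm
# or data structures; all names here are hypothetical.

def resolve(defined, undefined, archive):
    """Resolve undefined symbols by pulling in whole archive members.

    defined   -- dict mapping symbol -> object that already defines it
    undefined -- iterable of symbols still needing a definition
    archive   -- dict mapping member name -> set of symbols it defines
    Returns (defined, clashes), where clashes lists (symbol, first, second).
    """
    defined = dict(defined)
    clashes = []
    for sym in sorted(undefined):
        if sym in defined:
            continue
        member = next((m for m in sorted(archive) if sym in archive[m]), None)
        if member is None:
            raise LookupError("undefined reference to " + sym)
        # A linker loads the whole member, not just the symbol it wanted.
        for s in sorted(archive[member]):
            if s in defined:
                clashes.append((s, defined[s], member))
            else:
                defined[s] = member
    return defined, clashes


# A is defined in the replacement file; B is still undefined.
already = {"A": "replacement.o"}
coarse = {"ab.o": {"A", "B"}}          # A and B share one member
fine = {"a.o": {"A"}, "b.o": {"B"}}    # one symbol per member

_, coarse_clashes = resolve(already, ["B"], coarse)
_, fine_clashes = resolve(already, ["B"], fine)
```

With the coarse archive, fetching B drags in a second definition of A;
with one symbol per member, only B is loaded and nothing clashes.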

> *) LLVM can use smaller chunks of files than gold. Consider the case
> of a file defining functions foo and bar. Function foo is used from
> ELF, and so we don't internalize it. Function bar is not used at that
> time and we drop it. Now codegen introduces an undefined reference to
> bar. What should gold do? Bringing in that file will fail because we
> will have two visible definitions of foo. Not doing so will fail
> because there is nowhere else to find bar.

Right. What should gold do? Ideally, LTO would be able to anticipate
any low-level routines needed by the late stages of codegen, and
provide definitions of those routines in the replacement files it adds
to the link. Adding a low-level library as a catch-all replacement
file at the end of the link is just a hack, but to make that work, you
need to have such symbols each in their own object file, so gold can
load just what it needs without being forced to load symbols it
doesn't need.

> The best solution I could find is to disable internalize for any
> functions defined in a library that is passed through. The attached
> patch does this. It can be optimized, but I am not sure if that is
> worth it, since normally there are only two libraries being passed
> through (libgcc and libc).

By "pass through," I assume you mean those low-level libraries that
provide the functions that might be called by code generated late
(after LTO analysis). Does this mean that gold first sees these
libraries as IR files, then the plugin turns around and adds the
"real" ELF equivalents as replacement files in order to catch these
references introduced by late codegen? If you decline to internalize
these symbols during LTO analysis, what's the point of providing them
as IR files in the first place? (Maybe I'm not clear on what
"internalize" means.)
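
To make the question concrete, here is a minimal sketch of what I take
"internalize" to mean: symbols not on a preserved list get internal
linkage, and internal symbols with no remaining uses can then be
deleted. This is my assumption about the behavior, not the plugin's or
LLVM's actual code, and every name in it is hypothetical.

```python
# Toy model of "internalize" followed by dead-symbol removal. Hypothetical
# names throughout; this is not the LLVM pass or the plugin's code.

def internalize(symbols, preserved):
    """Give every symbol not in `preserved` internal linkage."""
    return {s: ("external" if s in preserved else "internal") for s in symbols}


def drop_unused(linkage, used_in_ir):
    """Keep external symbols, plus internal ones that the IR still uses."""
    return {s for s, kind in linkage.items()
            if kind == "external" or s in used_in_ir}


def missing_after_codegen(kept, late_refs):
    """Symbols that late codegen references but LTO no longer defines."""
    return sorted(set(late_refs) - kept)


module = {"foo", "bar"}       # both defined in one IR file
used_from_elf = {"foo"}       # foo is referenced from a native object
late_refs = {"bar"}           # codegen emits a call to bar after LTO

# Default: only symbols referenced from outside the IR are preserved.
kept = drop_unused(internalize(module, used_from_elf), used_in_ir=set())
default_missing = missing_after_codegen(kept, late_refs)

# Proposed: never internalize symbols from a pass-through library.
kept = drop_unused(internalize(module, used_from_elf | module), used_in_ir=set())
patched_missing = missing_after_codegen(kept, late_refs)
```

In this model the default run loses bar before codegen asks for it,
while exempting the whole file keeps both definitions, at the cost of
giving up any benefit from internalizing that file.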

-cary