[patch] First step to fix pr11866 during LTO

Nick Kledzik kledzik at apple.com
Wed Oct 2 18:59:11 PDT 2013

On Oct 2, 2013, at 6:24 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

>> Yes, I think it would work. The semantics would be
>> * If lto_codegen_preserve_global_symbols was not called, there is no
>> symbol table to worry about (executable and no -export-dynamic). LLVM
>> should keep only the symbols passed to
>> lto_codegen_add_must_preserve_symbol and internalize everything else.
>> * If lto_codegen_preserve_global_symbols is called, LLVM knows that
>> the globals are going to a symbol table. It must preserve everything
>> passed to lto_codegen_add_must_preserve_symbol as before. It can
>> internalize globals only if knows they can be dropped from the symbol
>> table.
> Hi Nick,
> Sorry for the long delay. I went on vacations and it took me some time
> to catch up with code review.
> An awesome development while I was out is that Tom Roeder added a test
> driver for LTO, so now the patch has a testcase which should make it
> more clear what is being implemented.
> Thinking about your idea of adding lto_codegen_preserve_global_symbols
> I decided that it would be better to get some numbers. I took the
> largest "shared library" I had at hand: clang with -export_dynamic. On
> linux gold currently asks for 22922 symbols to be preserved. I applied
> the attached hack-timeit.patch to test how long it takes for us to get
> the internalized combined module. On my machine (2010 iMac),  the time
> is 0.08s, so despite large number of calls and the
> std::map<std::string>, internalize in general is not too slow. For
> scale, the verifier takes about 1s. Parsing the bitcode file takes 5s
> (the file is 118 MB).
> With this patch, the only symbol that gold doesn't mark as "symtab
> only" is main. This lets llvm drop 445 symbols. This is not a lot, but
> the current patch doesn't even check if an address is taken (it just
> uses the unnabed_addr attribute).


Ok, the performance of all those calls is in the noise.  

When a new API is added to llvm-c/lto.h, please bump LTO_API_VERSION
so that the linker can configure itself to use the new function. 

But, is there a better name than “lto_codegen_add_symtab_symbol()”.  That
name has no obvious meaning.  All symbols are symbol table symbols… 
I had to go back and re-read the email thread to figure out what it was 
supposed to do again.  

How about something like (existing plus new):

 * Tells LTO optimization passes that this symbol must be preserved 
 * because it is referenced by native code or a command line option.
void lto_codegen_add_must_preserve_symbol(lto_code_gen_t cg, const char* symbol);

 * Tells LTO optimization passes that a dynamic shared library is being 
 * built which exports all global symbols by default.  And that unless C++
 * semantics allow the symbol to be completely optimized away, it should
 * remain so it can be exported by the shared library.
void lto_codegen_add_dso_symbol(lto_code_gen_t cg, const char *symbol);


More information about the llvm-commits mailing list