[LLVMdev] Compiling zlib to static bytecode archive

Maarten ter Huurne maarten at treewalker.org
Thu Sep 27 19:08:29 PDT 2007


On Thursday 27 September 2007, Chris Lattner wrote:

> >> Sure, this would also work.  Is there any reason not to merge them
> >> together?
> >
> > Ease of maintenance, mainly. Having it in a separate file makes it
> > easier to migrate the code to new GCC releases. Also, collect2.c is
> > already 2658 lines, which is more than I typically like to have in a
> > single source file.
>
> My impression is that collect2 doesn't change very much.  In any case,
> the idea here would be that collect2 only has minimally invasive hooks to
> call into liblto.  It seems like this would be much simpler than handling
> all the command line argument swizzling needed for forking subprocesses,
> and having the LTO app have to read all the .o files and analyze them
> (which collect2 is already doing).

After studying collect2.c a bit more, I see that quite a lot of it is for 
option parsing and signal handling, so maybe merging is better indeed.

As far as I can see, collect2.c does not read the object files though: it 
only runs "nm" on them, which is not what we need to determine which files 
are bitcode files.

One thing I'm wondering is how to merge the C code of collect2 with the C++ 
code that uses liblto:
- convert collect2.c to collect2.cpp?
- put the C++ code in a separate source file and link the C object file and 
the C++ object file together into a single collect2 executable?
- expose more functionality from include/llvm-c/LinkTimeOptimizer.h? 
(meaning the code using liblto would be C, not C++)


I currently have something that links the example without errors. It is not 
pretty though: a Python script intercepts the invocation of collect2, 
splits the list of object files into bitcode and native, calls a process I 
named "precollect" to link the bitcode objects into a single native object 
and then calls the real collect2 with only native objects. The precollect 
tool is based on the llvm-ld source.
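
As an illustration of the splitting step, here is a minimal Python sketch. It is not the actual wrapper script. It assumes that a bitcode object is a raw LLVM bitcode file and can therefore be recognised by its first four bytes ('B', 'C', 0xC0, 0xDE). The names is_bitcode and split_objects are made up for this example, and any argument that is not an existing file is simply passed through untouched.

```python
import os

# First four bytes of a raw LLVM bitcode file: 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = b"BC\xc0\xde"


def is_bitcode(path):
    """Return True if the file at `path` starts with the bitcode magic."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == BITCODE_MAGIC
    except OSError:
        # Unreadable or missing files are treated as "not bitcode".
        return False


def split_objects(args):
    """Split linker arguments into (bitcode_objects, remaining_args).

    Options (anything starting with '-') and arguments that are not
    existing files are left in remaining_args, in their original order.
    """
    bitcode, rest = [], []
    for arg in args:
        if not arg.startswith("-") and os.path.isfile(arg) and is_bitcode(arg):
            bitcode.append(arg)
        else:
            rest.append(arg)
    return bitcode, rest
```

A wrapper could then hand the first list to the bitcode linking step and append the resulting native object to the second list before invoking the real collect2.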

What does not work yet is the actual optimization: precollect does not take 
advantage of the fact that this is the final link step that will produce an 
executable, so any symbol that nothing references is unused. Therefore the 
dead code elimination from the example is not performed. To make that possible, 
precollect would have to know about all object files, including the native 
ones, to determine which symbols are unused. Also, I should figure out how 
to tell liblto "there are no symbol references that you do not know about"; 
I assume that option already exists, but I didn't look for it yet.
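
To make the idea concrete, here is a rough Python sketch of how the set of symbols that must survive could be computed. It is only an illustration under several assumptions: the native objects are inspected through the default (BSD-style) text output of "nm", only entries marked "U" are considered (weak references and shared libraries are ignored), and the entry point is called "main". The function names are invented for this example and have nothing to do with liblto.

```python
def undefined_symbols(nm_output):
    """Collect the names marked 'U' in default (BSD-style) nm output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column, so they split into
        # exactly two fields: the type letter 'U' and the symbol name.
        if len(fields) == 2 and fields[0] == "U":
            names.add(fields[1])
    return names


def symbols_to_preserve(native_nm_output, bitcode_defined, entry="main"):
    """Return the bitcode-defined symbols that native code may reference.

    Everything defined in bitcode but absent from the result is a
    candidate for internalizing and dead code elimination.
    """
    needed = undefined_symbols(native_nm_output) | {entry}
    return needed & set(bitcode_defined)
```

For example, if the native objects only reference "main" and "helper", a third bitcode function that nothing else calls would not be in the returned set and could be dropped.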

Bye,
		Maarten