[LLVMdev] Compiling zlib to static bytecode archive

Chris Lattner clattner at apple.com
Tue Sep 25 17:17:55 PDT 2007


On Sep 23, 2007, at 3:27 AM, Maarten ter Huurne wrote:

> On Friday 21 September 2007, Chris Lattner wrote:
>> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
>>> However, it is not possible to let the zlib Makefile issue that
>>> command without patching the Makefile, because the fragment that
>>> does the linking is hardcoded to use the compiler command for
>>> linking:
>>>
>>>   example$(EXE): example.o $(LIBS)
>>>           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>>
>> Right, unfortunately the current Link Time Optimization model
>> requires the linker to "know" about LLVM.
>> http://llvm.org/docs/LinkTimeOptimization.html
>
> That's the reason I want to try and build a bytecode lib: to see if
> link time optimization of executable + libs has any effect on
> performance and on code size.

Right.

> My guess is that performance won't improve much, since there aren't
> that many calls per second which cross the app-lib boundary. But code
> size could improve if unused optional features can be eliminated as
> dead code because a function is only called in one particular way.

Makes sense!

> By the way, the example from that document does not work with the
> current llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:
>
> $ llvm-gcc a.o main.o -o main
> a.o: file not recognized: File format not recognized
> collect2: ld returned 1 exit status

Again, this is because your native linker doesn't support liblto.

> Linking with llvm-ld does work:
>
> $ llvm-ld a.o main.o -native -o main
> $ ./main
> $ echo $?
> 42
>
> The link step combines one or more input files into one output file.
> The input files can be all bytecode, all native or mixed. The output
> file can be bytecode or native. Since it is only possible to convert
> from bytecode to native and not vice versa, bytecode output requires
> all bytecode input. So the combinations are:
>
> bytecode input, bytecode output:
> Can be handled by llvm-ld without invoking system compiler/linker.

Yes, but note that this only works if you limit yourself to linker  
options known by llvm-ld.  If you use funky stuff, llvm-ld won't be  
able to handle it.  Also, llvm-ld may or may not handle archive  
resolution correctly (I don't remember).

> native input, native output:
> Handled by system compiler/linker.
>
> bytecode or mixed input, native output:
> According to the llvm-ld man page, llvm-ld will generate native code
> from the bytecode files and invoke the system compiler to do the
> actual linking.

Yes.
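
The three cases above can be written down as a small dispatch function.
This is only a sketch: pick_link_tool and its labels are made-up names
for illustration, not part of llvm-ld or llvm-gcc. The one real flag it
mentions is -native, taken from the llvm-ld command earlier in this
mail.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not an LLVM tool).
# Usage: pick_link_tool INPUT_KIND OUTPUT_KIND
#   INPUT_KIND:  bytecode | native | mixed
#   OUTPUT_KIND: bytecode | native
pick_link_tool() {
    case "$1:$2" in
        # All-bytecode link: llvm-ld can do this on its own.
        bytecode:bytecode)
            echo "llvm-ld" ;;
        # Nothing LLVM-specific involved: the system toolchain handles it.
        native:native)
            echo "system linker" ;;
        # llvm-ld generates native code, then hands off to the system compiler.
        bytecode:native|mixed:native)
            echo "llvm-ld -native" ;;
        # Bytecode output needs all-bytecode input; there is no
        # native -> bytecode conversion.
        *)
            echo "unsupported"
            return 1 ;;
    esac
}
```

For example, "pick_link_tool mixed native" selects the llvm-ld -native
path, while "pick_link_tool native bytecode" is rejected.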

>>> Would it be possible to make llvm-gcc call llvm-ld instead of the
>>> systemwide ld? I tried setting the environment variables
>>> COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that had
>>> no effect.
>>
>> I see two solutions to this.  One is to have llvm-gcc call llvm-ld
>> when it has some option passed to it. Another would be to enhance
>> 'collect2' to know about LLVM files.  'collect2' is a GCC utility
>> invoked at link time, so it would be the perfect place to add hooks.
>
> I found the documentation of collect2 here:
>   http://gcc.gnu.org/onlinedocs/gccint/Collect2.html
>
> Its purpose seems to be to act like ld and insert calls to
> initialization routines (and exit routines) before calling the real
> ld. The comment at the top of the source file describes it like this:
>
>    Collect static initialization info into data structures that can be
>    traversed by C++ initialization and finalization routines.

Right, that is its intended purpose.  It seems fairly straightforward
to abuse it for our devious plans though :)
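
The first thing such a hook would need is a way to tell LLVM files apart
from native objects, which is exactly what the native ld could not do in
the "File format not recognized" error above. A minimal sketch in shell,
under two assumptions: LLVM bitcode files start with the bytes 'B' 'C'
0xC0 0xDE, and native objects are ELF (0x7F 'E' 'L' 'F'). The function
name classify_object is hypothetical, and other native formats and
archives are not handled.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): classify a file by its first
# four bytes. Prints "llvm", "elf" or "unknown".
classify_object() {
    # Dump the first 4 bytes as hex and strip whitespace, e.g. "4243c0de".
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case "$magic" in
        4243c0de) echo "llvm" ;;     # 'B' 'C' 0xC0 0xDE (assumed bitcode magic)
        7f454c46) echo "elf" ;;      # 0x7F 'E' 'L' 'F'
        *)        echo "unknown" ;;  # anything else is left to the real ld
    esac
}
```

An LLVM-aware collect2 could run a check like this over its inputs and
pass only the "llvm" ones to the optimizer, leaving everything else for
the real ld untouched.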

> Originally I was under the impression that llvm-ld was just an
> LLVM-aware version of ld, but that is not the case. For example, when
> creating an output file in native format, it runs the system compiler
> on the generated native code and that compiler automatically picks up
> libraries such as libc, which must be specified explicitly to ld.
> Also, although llvm-ld accepts many of the options accepted by ld, GCC
> uses some ld options that llvm-ld does not accept.

Right.

> Going back to the two options you mentioned, they would lead to the
> following invocation chains. Let's use the "mixed input, native
> output" scenario: if we can support that, we can support the rest as
> well.
>
> llvm-gcc calling llvm-ld:
>   llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
>
> enhance collect2:
>   llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

I'd rather enhance collect2 like this:

llvm-gcc -> llvm-collect2(liblto) -> ld

Here llvm-collect2 is just collect2 that dlopens liblto to do the
optimization work. This makes it work much more naturally than adding
a whole new set of steps.  Depending on llvm-ld will never get you to
a world where LTO is transparent, because llvm-ld doesn't support a
lot of options and features that native linkers do.

> To summarize:
> - llvm-ld (currently) does not accept all flags that GCC passes to
>   collect2
> - an LLVM-aware collect2 would never perform the core function of
>   collect2, which is generating init/exit code and data
>
> Therefore, I think the scenario of llvm-gcc calling llvm-ld directly
> is preferable.

Ah, but if the llvm-collect2 version was enhanced to do everything it  
does now, and additionally interface with liblto, then everyone wins :)

>> The thing we're missing most right now is a volunteer to tackle this
>> project :)
>
> Since this is all new terrain for me, I might get stuck before
> producing anything useful. But I'm willing to try.

Yay!  Many people will appreciate this!

-Chris


