[LLVMdev] Compiling zlib to static bytecode archive

Maarten ter Huurne maarten at treewalker.org
Sun Sep 23 03:27:58 PDT 2007


On Friday 21 September 2007, Chris Lattner wrote:
> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> > However, it is not possible to let the zlib Makefile issue that command
> > without patching the Makefile, because the fragment that does the linking
> > is hardcoded to use the compiler command for linking:
> >
> >   example$(EXE): example.o $(LIBS)
> >           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>
> Right, unfortunately the current Link Time Optimization model
> requires the linker to "know" about LLVM.
> http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try and build a bytecode lib: to see if link time 
optimization of executable + libs has any effect on performance and on code 
size. My guess is that performance won't improve much, since there aren't 
that many calls per second which cross the app-lib boundary. But code size 
could improve if unused optional features can be eliminated as dead code, 
for example because a function is only ever called in one particular way.

By the way, the example from that document does not work with the current 
llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status

Linking with llvm-ld does work:

$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42

The link step combines one or more input files into one output file. The input 
files can be all bytecode, all native or mixed. The output file can be 
bytecode or native. Since it is only possible to convert from bytecode to 
native and not vice versa, bytecode output requires all bytecode input. So 
the combinations are:

bytecode input, bytecode output:
Can be handled by llvm-ld without invoking system compiler/linker.

native input, native output:
Handled by system compiler/linker.

bytecode or mixed input, native output:
According to the llvm-ld man page, llvm-ld will generate native code from the 
bytecode files and invoke the system compiler to do the actual linking.
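
As a rough sketch, the combinations above can be written down as a small shell 
function. The name link_handler and the strings it prints are made up for 
illustration; this is not an existing LLVM or GCC tool, only a restatement of 
the three cases, with bytecode output from native or mixed input rejected:

```shell
# Hypothetical helper (not part of LLVM or GCC): report which tool would
# handle a link step, following the three cases listed above.
# Usage: link_handler INPUT_KIND OUTPUT_KIND
#   INPUT_KIND:  bytecode | native | mixed
#   OUTPUT_KIND: bytecode | native
link_handler() {
    case "$1:$2" in
        bytecode:bytecode)
            # All bytecode in, bytecode out: no system tools needed.
            echo "llvm-ld" ;;
        native:native)
            # Nothing LLVM-specific involved.
            echo "system compiler/linker" ;;
        bytecode:native|mixed:native)
            # Native code is generated from the bytecode first, then the
            # system compiler does the actual linking.
            echo "llvm-ld, then system compiler" ;;
        *)
            # Native code cannot be converted back to bytecode.
            echo "unsupported"
            return 1 ;;
    esac
}
```

For example, "link_handler mixed native" prints "llvm-ld, then system 
compiler", while "link_handler native bytecode" prints "unsupported" and 
returns a non-zero status.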

> > Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
> > ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
> > and GCC_EXEC_PREFIX=llvm- but that had no effect.
>
> I see two solutions to this.  One is to have llvm-gcc call llvm-ld
> when it has some option passed to it. Another would be to enhance 
> 'collect2' to know about LLVM files.  'collect2' is a GCC utility
> invoked at link time, it would be the perfect place to add hooks.

I found the documentation of collect2 here:
  http://gcc.gnu.org/onlinedocs/gccint/Collect2.html

Its purpose seems to be to act like ld and insert calls to initialization 
routines (and exit routines) before calling the real ld. The comment at the 
top of the source file describes it like this:

   Collect static initialization info into data structures that can be
   traversed by C++ initialization and finalization routines.

According to this comment in the collect2 source, having collect2 accept 
options that ld does not accept will cause trouble:

  /* !!! When GCC calls collect2,
     it does not know whether it is calling collect2 or ld.
     So collect2 cannot meaningfully understand any options
     except those ld understands.
     If you propose to make GCC pass some other option,
     just imagine what will happen if ld is really ld!!!  */

Originally I was under the impression that llvm-ld was just an LLVM-aware 
version of ld, but that is not the case. For example, when creating an output 
file in native format, it runs the system compiler on the generated native 
code and that compiler automatically picks up libraries such as libc, which 
must be specified explicitly to ld. Also, although llvm-ld accepts many of 
the options accepted by ld, GCC uses some ld options that llvm-ld does not 
accept.

Going back to the two options you mentioned, they would lead to the following 
invocation chains. Let's use the "mixed input, native output" scenario: if we 
can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

llvm-collect2 is the enhanced collect2, while plain collect2 is the one that 
belongs to the system compiler. Note that this assumes the system compiler is 
GCC; otherwise the "gcc -> collect2 -> ld" chain will be something else that 
performs the same function.

Since llvm-ld invokes the system compiler to do the actual linking, the 
executable it produces will already have the proper init/exit sequences. So 
llvm-collect2 would not have anything to do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2,
  which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is 
preferable.

> The thing we're missing most right now is a volunteer to tackle this
> project :)

Since this is all new terrain for me, I might get stuck before producing 
anything useful. But I'm willing to try.

Bye,
		Maarten