[LLVMdev] Compiling zlib to static bytecode archive
Maarten ter Huurne
maarten at treewalker.org
Sun Sep 23 03:27:58 PDT 2007
On Friday 21 September 2007, Chris Lattner wrote:
> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> > However, it is not possible to let the zlib Makefile issue that command
> > without patching the Makefile, because the fragment that does the
> > linking is hardcoded to use the compiler command for linking:
> >
> > example$(EXE): example.o $(LIBS)
> > $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>
> Right, unfortunately the current Link Time Optimization model
> requires the linker to "know" about LLVM.
> http://llvm.org/docs/LinkTimeOptimization.html
That's the reason I want to try and build a bytecode lib: to see if link time
optimization of executable + libs has any effect on performance and on code
size. My guess is that performance won't improve much, since there aren't
that many calls per second which cross the app-lib boundary. But code size
could improve if unused optional features can be eliminated as dead code
because a function is only called in one particular way.
By the way, the example from that document does not work with the current
llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:
$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status
Linking with llvm-ld does work:
$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42
The link step combines one or more input files into one output file. The input
files can be all bytecode, all native or mixed. The output file can be
bytecode or native. Since it is only possible to convert from bytecode to
native and not vice versa, bytecode output requires all bytecode input. So
the combinations are:
bytecode input, bytecode output:
  Can be handled by llvm-ld without invoking system compiler/linker.
native input, native output:
  Handled by system compiler/linker.
bytecode or mixed input, native output:
  According to the llvm-ld man page, llvm-ld will generate native code from
  the bytecode files and invoke the system compiler to do the actual linking.
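The combinations above can be written down as a small decision function. This is only a sketch to make the cases explicit: the names Kind and plan_link are made up for illustration and do not correspond to anything in llvm-ld or llvm-gcc, and the returned strings just paraphrase the descriptions above.

```python
from enum import Enum


class Kind(Enum):
    """File format of a link input or output (hypothetical helper type)."""
    BYTECODE = "bytecode"
    NATIVE = "native"


def plan_link(inputs, output):
    """Say which tool would handle a link step, per the cases listed above.

    inputs: list of Kind, one per input file; output: Kind of the result.
    """
    if not inputs:
        raise ValueError("at least one input file is required")

    has_bytecode = Kind.BYTECODE in inputs
    has_native = Kind.NATIVE in inputs

    if output is Kind.BYTECODE:
        # Native code cannot be converted back to bytecode.
        if has_native:
            raise ValueError("bytecode output requires all-bytecode input")
        return "llvm-ld alone"

    # Native output from here on.
    if not has_bytecode:
        return "system compiler/linker"
    # All-bytecode or mixed input.
    return "llvm-ld generates native code, then the system compiler links"
```

For example, plan_link([Kind.BYTECODE, Kind.NATIVE], Kind.NATIVE) falls into the third case, which is the one that matters for linking a bytecode zlib into a native executable.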
> > Would it be possible to make llvm-gcc call llvm-ld instead of the
> > systemwide ld? I tried setting the environment variables
> > COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that had
> > no effect.
>
> I see two solutions to this. One is to have llvm-gcc call llvm-ld
> when it has some option passed to it. Another would be to enhance
> 'collect2' to know about LLVM files. 'collect2' is a GCC utility
> invoked at link time, it would be the perfect place to add hooks.
I found the documentation of collect2 here:
http://gcc.gnu.org/onlinedocs/gccint/Collect2.html
Its purpose seems to be to act like ld and insert calls to initialization
routines (and exit routines) before calling the real ld. The comment at the
top of the source file describes it like this:
  Collect static initialization info into data structures that can be
  traversed by C++ initialization and finalization routines.
According to this comment in the collect2 source, having collect2 accept
options that ld does not accept will cause trouble:
/* !!! When GCC calls collect2,
it does not know whether it is calling collect2 or ld.
So collect2 cannot meaningfully understand any options
except those ld understands.
If you propose to make GCC pass some other option,
just imagine what will happen if ld is really ld!!! */
Originally I was under the impression that llvm-ld was just an LLVM-aware
version of ld, but that is not the case. For example, when creating an output
file in native format, it runs the system compiler on the generated native
code and that compiler automatically picks up libraries such as libc, which
must be specified explicitly to ld. Also, although llvm-ld accepts many of
the options accepted by ld, GCC uses some ld options that llvm-ld does not
accept.
Going back to the two options you mentioned, they would lead to the following
invocation chains. Let's use the "mixed input, native output" scenario: if we
can support that, we can support the rest as well.
llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld
llvm-collect2 is the enhanced collect2, while plain collect2 is the one that
belongs to the system compiler. Note that this assumes the system compiler is
GCC; with another compiler, the "gcc -> collect2 -> ld" part of the chain
will look different but will perform the same function.
Since llvm-ld invokes the system compiler to do the actual linking, the
executable it produces will already have the proper init/exit sequences. So
llvm-collect2 would not have anything to do.
To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2,
which is generating init/exit code and data
Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is
preferable.
> The thing we're missing most right now is a volunteer to tackle this
> project :)
Since this is all new terrain for me, I might get stuck before producing
anything useful. But I'm willing to try.
Bye,
Maarten