[LLVMdev] Distribution in assembler format

Tom Prince tom.prince at ualberta.net
Sat Jan 30 12:40:05 PST 2010


On Fri, Jan 29, 2010 at 12:26:45PM -0800, Dan Gohman wrote:
> 
> On Jan 29, 2010, at 12:09 PM, Samuel Crow wrote:
> 
> > Hello Russell,
> > 
> > Major pitfall #1:
> > LLVM-GCC does certain optimizations even if all of the optimizations are turned off.  These include endian-specific optimizations so to use LLVM as a cross-architecture bitcode, you'll need to wait until Clang supports C++ fully or just stick to C programs for now.
> > 
> > I've been looking forward to the day that LLVM can be used for cross-architecture development, myself.
> 
> FYI,
> 
> http://llvm.org/docs/FAQ.html#platformindependent
> 
> applies to clang just as much as llvm-gcc.
> 
> Dan

I have seen the claim on this list numerous times, by people probably much
more knowledgeable than me, that C/C++ can't be compiled to platform
independent code.

I think this misses some subtleties.


One issue is calling conventions. There are two related issues I have seen
come up on the list. The first is that LLVM doesn't implement all of the ABI
(in particular, aggregate returns). The reason given for this is the second
issue: LLVM doesn't have enough information to implement the ABI correctly.
The only example I can recall seeing is that C complex values have different
calling conventions from structures.
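
To make the complex-versus-struct point concrete, here is a small C sketch.
The names are made up for illustration, and the register assignments in the
comments are what I believe the x86-64 System V psABI specifies; other targets
classify these types differently. Both types have the same size and layout,
yet they come back in different places, so if a frontend lowered both to the
same IR struct type the backend could not tell which convention to use.

```c
#include <complex.h>

/* Hypothetical struct with the same size and layout as long double _Complex. */
struct ld_pair { long double re, im; };

/* x86-64 System V: a long double _Complex return value has class COMPLEX_X87,
   so the real part is returned in %st0 and the imaginary part in %st1. */
long double _Complex make_complex(long double re, long double im) {
    return re + im * I;
}

/* x86-64 System V: this 32-byte struct classifies as X87/X87UP eightbytes,
   which the merge rules turn into MEMORY, so it is returned through a hidden
   pointer supplied by the caller. */
struct ld_pair make_pair(long double re, long double im) {
    struct ld_pair p = { re, im };
    return p;
}
```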

Now, most code doesn't use the complex data type, so the second issue
doesn't affect it. I'd argue that for code that doesn't use complex types,
LLVM IR can be generated that isn't platform dependent (with regard to calling
conventions), even though LLVM won't "correctly" compile it currently.


Another major issue is data type size. I'll ignore interfacing with external
code for the moment. The C data types char, short, int and long are all
platform dependent. There are also platform-independent data types,
(u)intN_t, which should clearly compile to platform-independent code. All
that the C standard guarantees about the basic types is:
    char <= short <= int <= long   (each range contains the previous one)
    char >= 8 bits, short >= 16 bits, int >= 16 bits, long >= 32 bits
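
For reference, those guarantees can be written down as compile-time checks
against <limits.h> (using C11 _Static_assert). long_bits is a made-up helper;
its result is exactly the part the standard leaves to the target.

```c
#include <limits.h>

/* Minimum ranges required by the C standard (<limits.h>, C99 5.2.4.2.1). */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(SHRT_MAX >= 32767, "short has at least 16 bits");
_Static_assert(INT_MAX >= 32767, "int has at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long has at least 32 bits");
_Static_assert(SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX,
               "each range contains the previous one");

/* Hypothetical helper: storage width of long on the target being compiled
   for. Typically 32 on ILP32 and LLP64 (Win64), 64 on LP64 (most 64-bit Unix). */
int long_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}
```
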
Now, if one were to fix the sizes of these types, the resulting LLVM code
would be platform independent (although not able to call external interfaces
properly). Alternatively, if LLVM IR were extended with platform-dependent
integer types, the generated IR would be platform independent and still able
to interface with external libraries. Using the textual representation, C
code could be compiled to platform-independent IR that uses undefined named
types, whose platform-dependent definitions could be supplied later. The same
issue crops up for size_t and ptrdiff_t, which could be solved by the
ptrint/intptr proposal.
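
A minimal sketch of the "fix the sizes" option, assuming a hypothetical set
of typedefs (fixed_short, fixed_int, fixed_long) that every target agrees on.
Code written only against them should lower to the same LLVM integer types
everywhere, although pointers themselves remain target-sized.

```c
#include <stdint.h>

/* Hypothetical fixed mapping of the C basic types: if every target agreed on
   these widths, the integer types in the generated IR would be identical. */
typedef int16_t fixed_short;
typedef int32_t fixed_int;
typedef int64_t fixed_long;

/* Sums n ints into a 64-bit accumulator. The count is a fixed_long rather
   than size_t, because size_t follows the target's pointer width. */
fixed_long sum_ints(const fixed_int *xs, fixed_long n) {
    fixed_long total = 0;
    for (fixed_long i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```

The price is the one noted above: fixed_long would no longer match the
platform's long, so calls into external libraries declared with long could
not be made correctly.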


The final issue is interfacing with external code. For some people this
doesn't matter, as they can present a platform-independent interface to the
C code they want to compile to LLVM.

If one does care, then this is a much trickier issue, for which there are no
generic solutions. But I think that in a lot of cases, this isn't an
insurmountable issue.

For example, libjpeg uses the preprocessor to pick the types to use, and so it
generates non-portable code. There are at least two ways around this. One
could configure libjpeg to use platform-independent types; code compiled
against it would then generate LLVM IR portable to any machine with libjpeg
similarly configured. Alternatively, libjpeg by default uses the same
fundamental C types on any sane modern system, so if LLVM IR could represent
those types, the IR would again be portable.
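
The two ways around this could look roughly like the following. This is not
libjpeg's actual header; CONFIG_PORTABLE_TYPES, cfg_wide_int and scale_sample
are hypothetical names used only to show the shape of the choice.

```c
#include <limits.h>
#include <stdint.h>

#ifdef CONFIG_PORTABLE_TYPES
/* Option 1: configure with an exact-width type. The IR uses i32 on every
   target, so it is portable to any machine with the same configuration. */
typedef int32_t cfg_wide_int;
#else
/* Option 2 (default): use a fundamental type. Today this becomes i32 or i64
   depending on the target; if IR could name "C long" symbolically, the same
   IR would work everywhere. */
typedef long cfg_wide_int;
#endif

/* Example routine whose IR signature depends on the choice above. */
cfg_wide_int scale_sample(cfg_wide_int sample, cfg_wide_int factor) {
    return sample * factor;
}
```
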

I am not intimately familiar with the Linux Standard Base, but I get the
impression that the binary part of the interface is almost completely
specified in terms of the fundamental types, size_t and ptrdiff_t. That is,
if those types had a portable representation, then almost any program
compiled against the LSB would be portable.
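
Put differently, the per-target information such an interface depends on is
small. A sketch that collects it (the struct and function names are
hypothetical):

```c
#include <limits.h>
#include <stddef.h>

/* The per-target widths an interface written only in terms of the
   fundamental types, size_t and ptrdiff_t depends on. */
struct abi_widths {
    int short_bits, int_bits, long_bits;
    int size_bits, ptrdiff_bits, pointer_bits;
};

/* Reports the storage widths on the target this is compiled for. */
struct abi_widths current_abi(void) {
    struct abi_widths w = {
        (int)(sizeof(short) * CHAR_BIT),
        (int)(sizeof(int) * CHAR_BIT),
        (int)(sizeof(long) * CHAR_BIT),
        (int)(sizeof(size_t) * CHAR_BIT),
        (int)(sizeof(ptrdiff_t) * CHAR_BIT),
        (int)(sizeof(void *) * CHAR_BIT),
    };
    return w;
}
```

On x86-64 Linux (LP64) I'd expect 16/32/64/64/64/64, and on 32-bit x86 Linux
(ILP32) 16/32/32/32/32/32; portable IR would need to leave exactly these
values open.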


My summary would be that most C/C++ code can't be compiled to portable IR,
but that, with care, it is possible to write C/C++ code that can be compiled
to portable IR. And if LLVM were extended with a few controversial
extensions, one could carefully write C/C++ that compiles to portable IR and
still correctly interfaces with most libraries.

  Tom Prince


