[cfe-dev] architecture endianness and preprocessor defines

Wed May 7 08:58:31 PDT 2008

everyone--

One of the things I've long disliked about how GCC works is that its  
developers have still not really sorted out how to handle  
architectures that can operate in either big- or little-endian mode.   
I'd like to know if the LLVM CFE developers have any thoughts on how  
to improve matters here.

Here's what GCC does today, and how that situation produces  
consequences downstream:

+ The various architecture configurations define built-in preprocessor  
definitions like __BIG_ENDIAN__ and __LITTLE_ENDIAN__.

+ These are hard-coded for architectures that don't have any choice,
e.g. IA32, but on architectures that can be configured to run in
either mode they're switched by the -mbig-endian and -mlittle-endian
options.

+ These built-in definitions aren't consistently defined across all  
the architectures either, so on some architectures you get  
__BIG_ENDIAN and on others you get __BIG_ENDIAN__.  Isn't that  
wonderful?
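Because the spelling varies by architecture, portable C code ends up
normalizing it by hand.  The block below is a minimal sketch of that
chore, assuming a hypothetical project-level macro MY_BYTE_ORDER and
hypothetical helpers runtime_byte_order() and byte_order(); none of
these come from GCC or any library.  Each spelling is trusted only
when its opposite is absent, because some C library headers (glibc's
<endian.h>, for one) define both __BIG_ENDIAN and __LITTLE_ENDIAN as
plain numeric constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical project-level macro: 1 = little-endian, 2 = big-endian,
   0 = the compiler did not say.  A spelling counts only when its
   opposite is absent, since some libc headers define both as constants. */
#if defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
#  define MY_BYTE_ORDER 2
#elif defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
#  define MY_BYTE_ORDER 1
#elif defined(__BIG_ENDIAN) && !defined(__LITTLE_ENDIAN)
#  define MY_BYTE_ORDER 2
#elif defined(__LITTLE_ENDIAN) && !defined(__BIG_ENDIAN)
#  define MY_BYTE_ORDER 1
#else
#  define MY_BYTE_ORDER 0
#endif

/* Runtime fallback: inspect the first byte of a known 32-bit pattern. */
static int runtime_byte_order(void) {
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    if (bytes[0] == 0x04) return 1;  /* least significant byte first */
    if (bytes[0] == 0x01) return 2;  /* most significant byte first */
    return 0;                        /* mixed-endian or unknown */
}

/* Prefer the compile-time answer; fall back to probing at runtime. */
static int byte_order(void) {
    return MY_BYTE_ORDER != 0 ? MY_BYTE_ORDER : runtime_byte_order();
}

/* Sanity check: any order the compiler advertised must match reality. */
static void check_byte_order(void) {
    assert(MY_BYTE_ORDER == 0 || MY_BYTE_ORDER == runtime_byte_order());
}
```

Every project that cares about byte order carries some variant of
this ladder, which is exactly the duplication a consistent set of
built-ins would remove.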

One of the additional hassles with GCC is that its "multilib" feature
doesn't consistently build the C runtime environment, i.e. crtstuff.c,
for both big- and little-endian modes.  This is why there are all
those GCC target triples that look like "armeb-netbsd-elf",
"mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script.
Notice that the suffixes aren't even used consistently across
operating system platforms.

The suffix on the architecture name ends up being translated into
the endianness of the C runtime environment modules used by the linker
(except when -nostdlib is used... sigh).  If it weren't for this,
you'd be able to build GCC for ARM or MIPS or whatever without adding
that suffix to the architecture part of the triple, and the
-mbig-endian and -mlittle-endian switches would select the proper C
runtime environment.  Sadly, that doesn't happen as it should.

I'm not sure how much Clang needs to know about the C runtime
environment that will eventually be linked into the final executable,
but it would be nice if you didn't have to apply this horrible
corruption to the architecture part of the target triple.
I'd rather the command driver were responsible for sorting out which  
runtime environments to link into what executables, and it should be  
able to do the right thing with just the command line switches.
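To make the driver-side idea concrete, here is a minimal sketch of
how a command driver might pick the C runtime objects from the
switches alone.  Everything in it is hypothetical: the enum, the
helpers endian_from_args() and crt_dir(), and the "lib/crt/be" and
"lib/crt/le" directory layout are illustrations, not any existing
driver's code.

```c
#include <string.h>

enum endian { ENDIAN_DEFAULT, ENDIAN_LITTLE, ENDIAN_BIG };

/* Hypothetical driver helper: scan the command line for the endianness
   switches; if both appear, the last one wins. */
static enum endian endian_from_args(int argc, char **argv) {
    enum endian e = ENDIAN_DEFAULT;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-mbig-endian") == 0)
            e = ENDIAN_BIG;
        else if (strcmp(argv[i], "-mlittle-endian") == 0)
            e = ENDIAN_LITTLE;
    }
    return e;
}

/* Hypothetical layout: one set of C runtime objects per byte order.
   target_default is the architecture's default order (ENDIAN_LITTLE or
   ENDIAN_BIG); an explicit switch overrides it when present. */
static const char *crt_dir(enum endian requested, enum endian target_default) {
    enum endian e = (requested == ENDIAN_DEFAULT) ? target_default : requested;
    return (e == ENDIAN_BIG) ? "lib/crt/be" : "lib/crt/le";
}
```

With this arrangement the architecture field of the triple never
needs an endianness suffix; the switch alone decides which runtime
objects reach the linker.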

That still leaves the C preprocessor built-ins, which are clearly in  
Clang's domain to manage.  Here's what I propose: Clang should define  
a small set of general preprocessor built-ins that identify the CPU  
architecture family specified in the target triple, e.g. __ia32__,
__x86_64__, __arm__, __powerpc__, __mips__, etc.; it should also define
__LITTLE_ENDIAN__ and __BIG_ENDIAN__ as appropriate, and it should  
offer the -mbig-endian and -mlittle-endian switches for explicitly  
specifying the endianness on architectures that can execute in either  
mode.  The command driver can then do the right thing (or the wrong  
thing) as necessary.
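Code that consumes the proposed built-ins could then be as simple as
the sketch below.  It assumes only the __BIG_ENDIAN__ and
__LITTLE_ENDIAN__ names from the proposal; store_be32() is a
hypothetical helper, and it trusts a macro only when exactly one of
the pair is defined, falling back to shifts (which produce big-endian
bytes on any host) when the compiler says nothing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v into buf in big-endian (network) order. */
static void store_be32(unsigned char buf[4], uint32_t v) {
#if defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
    /* Big-endian target: the native layout already matches. */
    memcpy(buf, &v, 4);
#elif defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
    /* Little-endian target: reverse the byte order, then copy. */
    v = ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
        ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
    memcpy(buf, &v, 4);
#else
    /* Endianness not advertised: shifts are correct on any byte order. */
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
#endif
}
```

On a bi-endian target, -mbig-endian or -mlittle-endian would flip
which branch is compiled without any change to the source.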

I'd like to know if the Clang developers are interested in resisting  
the endianness suffixes on the architecture parts of the target triple  
specification.  I hope the answer is yes.

—
j h woodyatt <jhw at conjury.org>
http://jhw.vox.com/
