[cfe-dev] architecture endianness and preprocessor defines

Wed May 7 10:44:58 PDT 2008

On May 7, 2008, at 8:58 AM, james woodyatt wrote:
> One of the things I've long disliked about how GCC works is that its
> developers have still not really sorted out how to handle
> architectures that can operate in either big- or little-endian mode.
> I'd like to know if the LLVM CFE developers have any thoughts on how
> to improve matters here.

You bring up a lot of interesting issues.  Some meta answers :)

>
> Here's what GCC does today, and how that situation produces
> consequences downstream:
>
> + The various architecture configurations define built-in preprocessor
> definitions like __BIG_ENDIAN__ and __LITTLE_ENDIAN__.

We aim to be GCC-compatible in our predefined preprocessor macros.
This is important for compatibility with existing code.
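
As an illustration of the kind of existing code that relies on these defines, here is a minimal sketch. It assumes only the __BIG_ENDIAN__ and __LITTLE_ENDIAN__ names mentioned above; the EXAMPLE_BIG_ENDIAN macro and the example_* functions are hypothetical names invented for this sketch. Not every GCC target defines the two macros, so the sketch treats "neither defined" as unknown and cross-checks against a runtime probe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time view, using only the GCC-style macros named in the thread.
   EXAMPLE_BIG_ENDIAN is a hypothetical helper: 1 = big, 0 = little,
   -1 = the compiler defined neither macro for this target. */
#if defined(__BIG_ENDIAN__)
#  define EXAMPLE_BIG_ENDIAN 1
#elif defined(__LITTLE_ENDIAN__)
#  define EXAMPLE_BIG_ENDIAN 0
#else
#  define EXAMPLE_BIG_ENDIAN (-1)
#endif

/* Runtime view: look at the first byte of a known 32-bit pattern. */
int example_runtime_is_big_endian(void) {
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Aborts via assert if the macro and the runtime probe disagree.
   Does nothing when the compiler gave us no macro to compare against. */
void example_check_endianness(void) {
#if EXAMPLE_BIG_ENDIAN >= 0
    assert(example_runtime_is_big_endian() == EXAMPLE_BIG_ENDIAN);
#endif
}
```

Code written this way keeps building unchanged as long as the compiler goes on providing the same macro names.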

> One of the additional hassles with GCC is that its "multilib" feature
> doesn't consistently build the C runtime environment, i.e. crtstuff.c,
> for both big- and little-endian modes.  This is why there are all
> those GCC target triples that look like "armeb-netbsd-elf" and
> "mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script.  Notice
> that the suffixes aren't used consistently across operating system
> platforms?

I agree that this is irritating.  Two issues: 1) we will support the  
GCC target triples, at least when/if people contribute support for  
them.  2) clang is explicitly designed to support building a single  
tool chain in place that supports multiple targets.  The ultimate
goal is that you should be able to configure clang with
"--targets='armeb-netbsd-elf mipsel-wrs-vxworks armle-linux-gnu'" and get
support in the toolchain for all of them.  We already have support for  
handling this (-arch option and friends).  When we bring up the  
"libgcc" runtime library stuff, we'll make sure it can be built for  
multiple targets.

> The suffix on the architecture name ends up getting translated into
> the endianness of the C runtime environment modules used by the linker
> (except when -nostdlib is used... sigh).  If it weren't for this,
> you'd be able to build GCC for ARM or MIPS or whatever, without adding
> that suffix to the architecture part of the triple, and the
> -mbig-endian and -mlittle-endian switches would select the proper C runtime
> environment.  Sadly, that doesn't happen like it should.

Supporting the existing GCC target triples (again, when/if people
contribute support for them) doesn't mean we can't support simplified
triples as well.

> That still leaves the C preprocessor built-ins, which are clearly in
> Clang's domain to manage.  Here's what I propose: Clang should define
> a small set of general preprocessor built-ins that identify the CPU
> architecture family specified in the target triple, e.g. __ia32__,
> __x86_64__, __arm__, __powerpc__, __mips__, etc; it should also define
> __LITTLE_ENDIAN__ and __BIG_ENDIAN__ as appropriate, and it should
> offer the -mbig-endian and -mlittle-endian switches for explicitly
> specifying the endianness on architectures that can execute in either
> mode.  The command driver can then do the right thing (or the wrong
> thing) as necessary.

We have to support the existing macros.  Requiring people to 'port'
their code from GCC to clang is not desirable.
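
For a rough idea of what already depends on the existing names, here is a sketch of an architecture-family check. It uses macros GCC defines today (__x86_64__, __i386__, __arm__, __powerpc__, __mips__). Note that GCC spells the 32-bit x86 macro __i386__; the __ia32__ name above is part of the proposal, not something compilers currently define. The function name example_arch_family is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns a short family name based on GCC-style predefined macros,
   or "unknown" when none of the tested macros is defined. */
const char *example_arch_family(void) {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "ia32";      /* GCC's macro is __i386__, not __ia32__ */
#elif defined(__arm__)
    return "arm";
#elif defined(__powerpc__)
    return "powerpc";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}
```

If clang dropped or renamed any of these, every such #if chain would fall through to the "unknown" branch, which is the porting burden we want to avoid.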

That said, we *can* support nicer and cleaner interfaces as well for  
feature queries.  Over time, we can encourage people (who don't care  
about writing portable code (?)) to use these and/or try to get the  
GCC folks to adopt similar features.

-Chris