[cfe-dev] architecture endianness and preprocessor defines
james woodyatt
jhwoodyatt at mac.com
Wed May 7 08:58:31 PDT 2008
everyone--
One of the things I've long disliked about how GCC works is that its
developers have still not really sorted out how to handle
architectures that can operate in either big- or little-endian mode.
I'd like to know if the LLVM CFE developers have any thoughts on how
to improve matters here.
Here's what GCC does today, and how that situation produces
consequences downstream:
+ The various architecture configurations define built-in preprocessor
definitions like __BIG_ENDIAN__ and __LITTLE_ENDIAN__.
+ These are hard-coded for architectures that don't have any choice,
e.g. IA32, but they're switched by the -mbig-endian and -mlittle-endian
options on architectures that can be configured to run in either mode.
+ These built-in definitions aren't consistently defined across all
the architectures either, so on some architectures you get
__BIG_ENDIAN and on others you get __BIG_ENDIAN__. Isn't that
wonderful?
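
To make the downstream consequence concrete, here is a minimal sketch of the defensive check that portable code ends up carrying. It trusts only the double-underscore spellings named above and falls back to a runtime probe when neither (or both) is defined. The EX_COMPILE_TIME_ENDIAN macro and the ex_* function names are made up for this illustration and are not part of any compiler or library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro: 1 = big, 0 = little, -1 = not known at
 * compile time. Only the __BIG_ENDIAN__ / __LITTLE_ENDIAN__ spellings
 * are consulted; the single-trailing-underscore variants are ignored
 * here because they are not defined consistently across targets. */
#if defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
#  define EX_COMPILE_TIME_ENDIAN 1
#elif defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
#  define EX_COMPILE_TIME_ENDIAN 0
#else
#  define EX_COMPILE_TIME_ENDIAN (-1)
#endif

/* Runtime probe: look at the first byte of a known 32-bit pattern. */
static int ex_runtime_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Prefer the compiler's answer, fall back to the probe otherwise. */
static int ex_is_big_endian(void)
{
#if EX_COMPILE_TIME_ENDIAN >= 0
    return EX_COMPILE_TIME_ENDIAN;
#else
    return ex_runtime_is_big_endian();
#endif
}
```

If the built-ins were defined uniformly, the fallback branch and the probe would be unnecessary.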
One of the additional hassles with GCC is that its "multilib" feature
doesn't consistently build the C runtime environment, i.e. crtstuff.c,
for both big- and little-endian modes. This is why there are all
those GCC target triples that look like "armeb-netbsd-elf" and
"mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script. Notice
that the suffixes aren't used consistently across operating system
platforms?
The suffix on the architecture name ends up getting translated into
the endianness of the C runtime environment modules used by the linker
(except when -nostdlib is used... sigh). If it weren't for this,
you'd be able to build GCC for ARM or MIPS or whatever, without adding
that suffix to the architecture part of the triple, and the
-mbig-endian and -mlittle-endian switches would select the proper C
runtime environment. Sadly, that doesn't happen like it should.
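
As a rough illustration of what that suffix translation amounts to, the sketch below splits the architecture part of a triple into a base name and an endianness. It recognizes only the three suffixes that appear in the example triples above ("eb", "el", "le"). The tri_* names are hypothetical, and this is not how GCC's configure script is actually written.

```c
#include <stddef.h>
#include <string.h>

enum tri_endian { TRI_UNSPECIFIED, TRI_LITTLE, TRI_BIG };

/* Copies the architecture part of `triple`, minus any recognized
 * endianness suffix, into `base` and returns the endianness the suffix
 * implied. Naive on purpose: an architecture whose real name happened
 * to end in one of these suffixes would be misparsed. */
enum tri_endian tri_split_arch(const char *triple, char *base, size_t base_size)
{
    static const struct {
        const char *suffix;
        enum tri_endian endian;
    } suffixes[] = {
        { "eb", TRI_BIG },
        { "el", TRI_LITTLE },
        { "le", TRI_LITTLE },
    };
    const char *dash = strchr(triple, '-');
    size_t arch_len = dash ? (size_t)(dash - triple) : strlen(triple);
    enum tri_endian endian = TRI_UNSPECIFIED;
    size_t i;

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t n = strlen(suffixes[i].suffix);

        if (arch_len > n &&
            memcmp(triple + arch_len - n, suffixes[i].suffix, n) == 0) {
            endian = suffixes[i].endian;
            arch_len -= n;   /* drop the suffix from the base name */
            break;
        }
    }

    if (base_size == 0)
        return endian;
    if (arch_len >= base_size)
        arch_len = base_size - 1;   /* truncate rather than overflow */
    memcpy(base, triple, arch_len);
    base[arch_len] = '\0';
    return endian;
}
```

With this, "armeb-netbsd-elf" splits into base "arm" plus big-endian, while a plain "arm-linux-gnu" leaves the endianness unspecified, which is the case where a command line switch would have to decide.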
I'm not sure how much Clang should need to know about the C runtime
environment that will eventually get linked up with final executable
machine objects, but it would be nice if you didn't have to apply this
horrible corruption to the architecture part of the target triple.
I'd rather the command driver were responsible for sorting out which
runtime environments to link into what executables, and it should be
able to do the right thing with just the command line switches.
That still leaves the C preprocessor built-ins, which are clearly in
Clang's domain to manage. Here's what I propose: Clang should define
a small set of general preprocessor built-ins that identify the CPU
architecture family specified in the target triple, e.g. __ia32__,
__x86_64__, __arm__, __powerpc__, __mips__, etc.; it should also define
__LITTLE_ENDIAN__ and __BIG_ENDIAN__ as appropriate, and it should
offer the -mbig-endian and -mlittle-endian switches for explicitly
specifying the endianness on architectures that can execute in either
mode. The command driver can then do the right thing (or the wrong
thing) as necessary.
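
The proposal can be sketched as a small table lookup. The code below is an illustration only, not Clang code: the pd_* names are hypothetical, and the per-architecture defaults and the set of architectures marked as switchable are assumptions made for the example. Given an architecture family and an optional endianness request (standing in for -mbig-endian or -mlittle-endian), it produces the two built-in definitions and rejects a request that a fixed-endian architecture cannot honor.

```c
#include <stdio.h>
#include <string.h>

enum pd_endian { PD_DEFAULT, PD_LITTLE, PD_BIG };

struct pd_arch {
    const char *name;             /* architecture part of the triple */
    const char *family_macro;     /* family built-in to define */
    enum pd_endian default_endian;
    int bi_endian;                /* nonzero if the switches apply */
};

/* Defaults and bi_endian flags are assumptions for illustration. */
static const struct pd_arch pd_archs[] = {
    { "ia32",    "__ia32__",    PD_LITTLE, 0 },
    { "x86_64",  "__x86_64__",  PD_LITTLE, 0 },
    { "arm",     "__arm__",     PD_LITTLE, 1 },
    { "powerpc", "__powerpc__", PD_BIG,    1 },
    { "mips",    "__mips__",    PD_BIG,    1 },
};

/* Writes the built-in definitions for `arch` into `out`.
 * Returns 0 on success, -1 for an unknown architecture, an endianness
 * the architecture cannot run in, or an output buffer that is too small. */
int pd_predefines(const char *arch, enum pd_endian requested,
                  char *out, size_t out_size)
{
    size_t i;

    for (i = 0; i < sizeof pd_archs / sizeof pd_archs[0]; i++) {
        const struct pd_arch *a = &pd_archs[i];
        enum pd_endian e;
        int n;

        if (strcmp(a->name, arch) != 0)
            continue;

        e = (requested == PD_DEFAULT) ? a->default_endian : requested;
        if (!a->bi_endian && e != a->default_endian)
            return -1;   /* e.g. big-endian requested on x86_64 */

        n = snprintf(out, out_size, "#define %s 1\n#define %s 1\n",
                     a->family_macro,
                     e == PD_BIG ? "__BIG_ENDIAN__" : "__LITTLE_ENDIAN__");
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }
    return -1;
}
```

The point of the design is that the architecture name stays "arm" or "mips" throughout, and the endianness travels as a separate input instead of being folded into the triple.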
I'd like to know if the Clang developers are interested in resisting
the endianness suffixes on the architecture parts of the target triple
specification. I hope the answer is yes.
--
j h woodyatt <jhw at conjury.org>
http://jhw.vox.com/