[llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support

Frank Schaefer via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 2 20:02:28 PDT 2018


Hi all,

I've been poking around with llvm+clang+compiler-rt, trying to get it
working on Linux ARM soft-float (yes, ARM soft-float support is pretty
broken).  Along the way I tried writing a multilib toolchain driver
for ARM soft/hard float, with only partial success.  For reference see
https://reviews.llvm.org/D52705#inline-464117.

One thing I noticed while doing this (and a few other people seem to
agree) is that the entire Gnu toolchain driver set could be greatly
simplified.  So far, it seems like every time someone has encountered
a new multilib case (either a new arch or a new distro arrangement),
the response has been to pile on another custom multilib driver, or
add a bunch of corner-case codepaths to an existing driver.  That's
been done so many times that the existing driver set is honestly
starting to collapse under its own weight. :-(

I'm now contemplating what it would take to reduce the entire driver
set to something that simply figures out all the multilib/multiarch
distinctions by querying the existing gcc installation.  This could
theoretically cover all Gnu multilib cases in a single codepath.

Some background:

Current GNU toolchains (gcc+glibc+binutils) tend to encapsulate all
multilib knowledge in gcc, including:

* What flags trigger a specific multilib selection
* What directories are associated with a particular multilib selection
(what we know as osSuffix()/gccSuffix(); see the sketch after this list)
* What run-time linker (/lib/ld-<arch>.so.<ver>) to use for a
particular multilib selection
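
For a concrete picture of the first two items, here's a toy sketch
(deliberately not clang code) of pulling that mapping out of an
installed gcc.  It shells out to "gcc -print-multi-lib"; I'm assuming
the output format I see locally, where each line looks like
"<dir>;@<flag>@<flag>..." with '@' standing in for the leading '-' and
"." marking the default multilib.  Other gcc builds may well differ,
which is sort of the point:

  // Toy sketch: ask an installed gcc for its multilib table and parse
  // the "<dir>;@<flag>@<flag>..." lines into (directory, flags) pairs.
  #include <cstdio>    // for POSIX popen/pclose on Linux
  #include <iostream>
  #include <sstream>
  #include <string>
  #include <utility>
  #include <vector>

  struct MultilibEntry {
    std::string Dir;                 // roughly what clang calls gccSuffix()
    std::vector<std::string> Flags;  // flags that select this multilib
  };

  std::vector<MultilibEntry> queryMultilibs(const std::string &Gcc) {
    std::vector<MultilibEntry> Entries;
    FILE *P = popen((Gcc + " -print-multi-lib").c_str(), "r");
    if (!P)
      return Entries;
    char Buf[512];
    while (fgets(Buf, sizeof(Buf), P)) {
      std::string Line(Buf);
      while (!Line.empty() && (Line.back() == '\n' || Line.back() == '\r'))
        Line.pop_back();
      std::size_t Semi = Line.find(';');
      if (Semi == std::string::npos)
        continue;                           // not a line we understand
      MultilibEntry E;
      E.Dir = Line.substr(0, Semi);         // "." is the default multilib
      std::stringstream Rest(Line.substr(Semi + 1));
      std::string Tok;
      while (std::getline(Rest, Tok, '@'))  // '@' stands in for '-'
        if (!Tok.empty())
          E.Flags.push_back("-" + Tok);
      Entries.push_back(std::move(E));
    }
    pclose(P);
    return Entries;
  }

  int main() {
    for (const MultilibEntry &E : queryMultilibs("gcc")) {
      std::cout << (E.Dir == "." ? "<default>" : E.Dir) << " :";
      for (const std::string &F : E.Flags)
        std::cout << ' ' << F;
      std::cout << '\n';
    }
  }

Note this only covers the flag-to-directory mapping; as far as I can
tell the run-time linker choice only shows up in the "*link" spec,
which is why the proposal below leans on the full -dumpspecs output
rather than just -print-multi-lib.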

This is highly customizable at gcc build time via a bunch of
arch+OS+ABI configuration fragments in the "gcc/config" directory of
the gcc source tree, and a lot of Linux distros have taken their own
liberties with this configuration.  That's part of why clang's Gnu
toolchain driver is in the state it's in.

The rough outline of what I would propose:
1. clang's CMakeLists can scan the spec tokens for a selected gcc
installation (available via "gcc -dumpspecs") and pick out the
important tokens (so far, I know these include "*multilib",
"*multilib_matches", "*multilib_defaults", "*multilib_options", and
"*link").
2. clang's Gnu driver can be re-coded to parse the relevant spec tokens
(a rough sketch follows this list).
3. clang's Gnu driver can build up a complete unified MultilibSet
based on these tokens.
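
To make step 2 a bit more concrete, here is a rough, self-contained
sketch of what parsing the "*multilib" token might look like.  I'm
assuming the entry format I see in a gcc 8.2.0 dumpspecs, i.e.
';'-separated records of the form "<gcc-dir>[:<os-dir>] <flag>
!<negated-flag> ...", with "." naming the default multilib; verifying
that assumption across gcc versions and targets is exactly the step-1
homework:

  // Rough sketch: split a "*multilib" spec string into per-multilib
  // records (directories plus required/excluded flags).
  #include <iostream>
  #include <sstream>
  #include <string>
  #include <vector>

  struct MultilibSpecEntry {
    std::string GccDir;                 // candidate gccSuffix()
    std::string OsDir;                  // candidate osSuffix(), if present
    std::vector<std::string> Required;  // flags that must be present
    std::vector<std::string> Excluded;  // flags that must be absent ('!')
  };

  std::vector<MultilibSpecEntry> parseMultilibSpec(const std::string &Spec) {
    std::vector<MultilibSpecEntry> Result;
    std::stringstream Entries(Spec);
    std::string Entry;
    while (std::getline(Entries, Entry, ';')) {
      std::stringstream Fields(Entry);
      std::string Tok;
      if (!(Fields >> Tok))
        continue;                       // empty trailing entry
      MultilibSpecEntry E;
      std::size_t Colon = Tok.find(':');
      E.GccDir = Tok.substr(0, Colon);  // "." is the default multilib
      if (Colon != std::string::npos)
        E.OsDir = Tok.substr(Colon + 1);
      while (Fields >> Tok) {
        if (Tok[0] == '!')
          E.Excluded.push_back("-" + Tok.substr(1));
        else
          E.Required.push_back("-" + Tok);
      }
      Result.push_back(E);
    }
    return Result;
  }

  int main() {
    // Toy input in the assumed format; the real thing would come from
    // the "*multilib" section of "gcc -dumpspecs".
    std::string Spec =
        ". !m64 !m32;64:../lib64 m64 !m32;32:../lib32 !m64 m32;";
    for (const MultilibSpecEntry &E : parseMultilibSpec(Spec))
      std::cout << "gcc-dir=" << E.GccDir << " os-dir=" << E.OsDir
                << " require=" << E.Required.size()
                << " exclude=" << E.Excluded.size() << '\n';
  }

Step 3 would then be mostly mechanical: map each record onto a Multilib
(with <gcc-dir> and <os-dir> feeding gccSuffix()/osSuffix()) and collect
them into one MultilibSet.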

Some potential complications I anticipate:
1. I don't know how consistently gcc has used these spec tokens, or
how the formatting has evolved over time.  Mimicking the current (gcc
8.2.0) format seems sensible, but what we pull from older gcc
installations may not comport with what we expect.
2. I don't see anything in the spec tokens that describes system
header arrangement.  Vanilla multilib-enabled gcc seems to honor
/usr/include/<os-suffix> (where <os-suffix> seems to conform to the
output of "gcc <flags> -print-multiarch").  Note that this doesn't
necessarily match the osSuffix; I've produced functional GNU
toolchains that honor a standard-triple osSuffix, but don't honor
_anything_ like it under /usr/include.
3. g++, OTOH, expects all C++ headers to be under
/usr/include/c++/<version>.  Vanilla g++ keeps some headers in a
further sub-directory named after <os-suffix>, with some of those
nested again under <gcc-suffix> for non-default multilib cases.  Just
to complicate things, Debian/Ubuntu g++ has apparently been adapted to
use /usr/include/<os-suffix> for multilib-specific C++ headers.  If
other distros do their own thing with this, then I see no
straightforward way to autodetect anything but a few obvious cases (a
directory-probing sketch follows this list).
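
For the "few obvious cases", the best I can think of is brute-force
probing: given the libstdc++ version, the -print-multiarch triple, and
the multilib's gcc suffix, just check which of the known layouts
actually exists on disk.  The candidate directory shapes below (and the
example arguments in main) are assumptions based on the arrangements
described above, nothing more:

  // Sketch of probing known C++ header layouts; build with -std=c++17.
  #include <filesystem>
  #include <iostream>
  #include <string>
  #include <vector>

  namespace fs = std::filesystem;

  std::vector<std::string> findCxxIncludeDirs(const std::string &Version,
                                              const std::string &Multiarch,
                                              const std::string &GccSuffix) {
    const std::string Base = "/usr/include/c++/" + Version;
    // Candidate per-target directories, most specific first.
    std::vector<std::string> Candidates = {
        Base + "/" + Multiarch + GccSuffix,  // vanilla, non-default multilib
        Base + "/" + Multiarch,              // vanilla, default multilib
        "/usr/include/" + Multiarch + "/c++/" + Version,  // Debian/Ubuntu
    };
    std::vector<std::string> Found;
    for (const std::string &Dir : Candidates)
      if (fs::is_directory(Dir))
        Found.push_back(Dir);
    if (fs::is_directory(Base))
      Found.push_back(Base);  // the target-agnostic bulk of the headers
    return Found;
  }

  int main() {
    // Hypothetical query: gcc 8 headers, ARM hard-float multiarch triple,
    // and "/sf" standing in for a soft-float gcc suffix.
    for (const std::string &Dir :
         findCxxIncludeDirs("8", "arm-linux-gnueabihf", "/sf"))
      std::cout << Dir << '\n';
  }

Anything these probes don't catch would have to fall back on the manual
override I suggest next.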

To address the above complications, I would suggest adding CMake
options for users to supply their own multilib descriptor tokens, in
case whatever's in gcc specs doesn't work for them.  We might even
allow for an extra token or two to better describe C/C++ header
layout.
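
As a sketch of how that override could plug in, something like the
following, where CLANG_MULTILIB_SPEC_OVERRIDE is a name I'm inventing
purely for illustration (the real mechanism would be whatever CMake
option we settle on, defined as a string literal at clang build time):

  #include <string>

  // Pick the multilib descriptor string: a user-supplied override from
  // the build configuration wins, otherwise trust what the selected gcc
  // installation reported via -dumpspecs.
  std::string multilibSpec(const std::string &FromDumpspecs) {
  #ifdef CLANG_MULTILIB_SPEC_OVERRIDE
    return CLANG_MULTILIB_SPEC_OVERRIDE;
  #else
    return FromDumpspecs;
  #endif
  }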

This would all require a LOT of planning and testing, especially
across the multiple targets/distros the Gnu toolchain driver currently
supports.  I'm not sure how to access suitable testbeds for a lot of
it (I count myself lucky just to have a reasonably-powerful ARM
build-box).  At least initially, I think we would have to keep the old
hodgepodge driver code around alongside the new unified driver code.

--
Frank
"If a server dies in a server farm and no one pings it, does it still
cost four figures to fix?"

