[LLVMdev] The Trouble with Triples

Wed Jul 29 23:51:42 PDT 2015

Hi Daniel,

> (from the context, you might have meant 'tuple' where you've written
> 'triple'. I'm answering based on the assumption you meant 'triple')
>
>
I did mean what I wrote.

> The GNU triple is already used as a way of encoding a large amount of the
> target data in a string but unfortunately, while this data is passed
> throughout LLVM, it isn't reliable because GNU triples are ambiguous and
> inconsistent. For example, in GCC toolchains mips-linux-gnu probably means
> a MIPS target on Gnu/Linux but anything beyond that (ISA revision, default
> ABI, multilib layout, etc.) is up to the person who built the toolchain and
> may change over time. Another example is that Debian's definition for
> i386-linux-gnu has been i486 and i586 at various points in time.
>
>
Sorta...

> The proposed TargetTuple is a direct replacement for the GNU triple and is
> intended to resolve this ambiguity and move away from a string-based
> implementation (we need to keep a string serialization though, see below).
> Essentially, I'm trying to push the ambiguity out of the internals and give
> the distributor control of how the ambiguity is resolved for their
> environment. Once that is done, we'll be able to rely on the TargetTuple
> for information about the target such as ABIs, architecture revisions,
> endianness, etc.
>
>
This is pretty vague.

> I agree that we should open up the API to specify the appropriate data and
> that is something that TargetTuple will acquire during steps 4 and 7 of the
> plan (mostly step 7 where compiler/tool options begin mutating the target
> tuple). I don't agree with keeping the GNU triple around though for two
> main reasons. The first is that most people believe that GNU triples
> accurately describe the target and there will be a strong temptation to
> inappropriately base logic on them. The second is that the meaning of the
> triple varies between toolchain builds and over time and there is a
> significant potential for bugs where different parts of the toolchain use
> different meanings for the same GNU triple (due to rebuilding or switching
> toolchains, or moving objects from system to system). We ought to resolve
> the ambiguity once and then stick to that interpretation.
>
> The string serialization I mentioned above is useful for LLVM-IR as part
> of a direct replacement for the 'target triple' statement. We could split
> this statement up into smaller pieces but the migration to target tuples is
> already difficult so I think it would be best to do a direct replacement
> first and redesign the IR statements later if we want to. The serialization
> is also useful for command line options on internal tools such as llc to
> give us precise control over our tests that the GNU triple can't deliver.
> This will be particularly important when distributors can apply their own
> disambiguations to GNU triples. The serialization may also be useful as
> part of a C API but I haven't given the C API much thought beyond
> preserving the current API.
>
>
My first impression of using this serialization is that it's something
I'm against. Keep in mind that being able to parse the string can't invoke
a target backend to handle the rest of the parsing. It'd need to be as
generic as a DataLayout if you want to do this sort of thing and I'm
entirely uncertain this is possible for the goals you (and I) have in mind
here.

> Hopefully, that helps clear up your concerns. Let me know if there's
> anything that still seems strange.
>
>
Not really. I don't see much of a sketch on what you have in mind for your
"TargetTuple" here other than "it'll be a bunch of things together".

Let me be clear, I do agree with you that the Triple by itself is
insufficient for what we want long term in the backends, however, we won't
be able to get rid of it completely. It's too ingrained into how cross
compilation is done as a base. It is, however, possible to design an API
that includes the Triple and the relevant information to augment it
sufficiently. My vision for this is an API that has a base part that is
going to be generic across all targets (think the current arguments to the
TargetMachine constructor), and additional target specific information that
can be passed in via user customization (i.e. command line options etc).
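
To make the shape of that API a little more concrete, here is a minimal sketch, assuming a plain struct for the generic part and a string-keyed map for the target-specific part. Every name in it (BaseTargetInfo, TargetDescription, setOption, getOption) is hypothetical and chosen only for illustration; none of it is existing LLVM API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical names throughout; this is not existing LLVM API.

// Generic part: meaningful for every target, roughly mirroring what the
// TargetMachine constructor already receives.
struct BaseTargetInfo {
  std::string Triple;   // e.g. "mips-linux-gnu", kept as the starting point
  std::string CPU;      // empty means "use the target's default"
  std::string Features; // e.g. "+fp64"
};

// Target-specific part: settings supplied by the user (command line options
// and the like) that only the relevant backend knows how to interpret.
class TargetDescription {
public:
  explicit TargetDescription(BaseTargetInfo Base) : Base(std::move(Base)) {}

  void setOption(const std::string &Key, const std::string &Value) {
    assert(!Key.empty() && "option key must not be empty");
    Options[Key] = Value;
  }

  // Returns the user-supplied value, or Default when none was given.
  std::string getOption(const std::string &Key,
                        const std::string &Default) const {
    auto It = Options.find(Key);
    return It == Options.end() ? Default : It->second;
  }

  const BaseTargetInfo &base() const { return Base; }

private:
  BaseTargetInfo Base;
  std::map<std::string, std::string> Options;
};
```

In this sketch a driver would fill in the base part from the triple and call something like setOption("abi", "n64") when the user asks for it, and the backend would query getOption("abi", "o32") rather than inspecting the triple string.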

> > My suggestion on a route forward here is that we should look at the
> > particular API and areas of the backend that you're having an issue with
> > and figure out how to best communicate the data you'd like to the
> > appropriate area. I realize this probably seems a little vague and
> > handwavy, but I don't know what areas you've been having problems with
> > lately. I'll absolutely help with this effort if you need assistance or
> > guidance in any way.
>
> The MIPS specific problems are broad and varied. Some of the bigger ones
> are:
> * Building clang on a 32-bit Debian and a 64-bit MIPS processor produces a
> compiler that cannot target the native system. The release packages work
> around this by 'cross-compiling' from the host triple to the target triple
> which are different strings (mips-linux-gnu vs mips64-linux-gnu) but have
> the same meaning.
> * It's not possible to produce a clang that can generate code for both
> 32-bit and 64-bit MIPS without one of them needing a -target option to
> change the GNU triple. This is because we based the logic on the triple and
> lack anything else to use.
>

I blame the mips backend for this one. We can do -m32/-m64 just fine for
x86 as an example. Some backends have this problem, others don't.
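
As a rough illustration of what the -m32/-m64 handling amounts to, the sketch below maps an architecture to its 32-bit or 64-bit sibling when the flag is present. The Arch enum and both function names are hypothetical, and this is a simplified model rather than clang's driver code; it deliberately ignores the ABI, which is part of what makes the MIPS case harder.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an architecture enum.
enum class Arch { x86, x86_64, mips, mips64 };

// Map an architecture to its 32-bit or 64-bit sibling.
Arch withPointerWidth(Arch A, unsigned Bits) {
  assert((Bits == 32 || Bits == 64) && "only 32 and 64 are modelled");
  switch (A) {
  case Arch::x86:
  case Arch::x86_64:
    return Bits == 32 ? Arch::x86 : Arch::x86_64;
  case Arch::mips:
  case Arch::mips64:
    return Bits == 32 ? Arch::mips : Arch::mips64;
  }
  return A; // not reached for the enumerators above
}

// Apply -m32/-m64 to the architecture derived from the triple; any other
// flag leaves the architecture untouched.
Arch applyWidthFlag(Arch FromTriple, const std::string &Flag) {
  if (Flag == "-m32")
    return withPointerWidth(FromTriple, 32);
  if (Flag == "-m64")
    return withPointerWidth(FromTriple, 64);
  return FromTriple;
}
```

The point of the sketch is only that the flag changes the architecture the rest of the compiler sees, so nothing downstream has to re-derive it from the original triple string.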

> * Various details (ELF headers, label prefixes, exception personality, JIT
> target, etc.) depend on the ABI and OS Distribution rather than just 32-bit
> vs 64-bit
>

Sure?

> * It's not possible to implement clang in a way that can support all of
> mips-linux-gnu's possible meanings. mips-mti-linux-gnu, and
> mips-img-linux-gnu have the same problem to a lesser degree
>

I'm really not sure what any of these things are bringing up. You haven't
actually said what communication problem you're trying to solve between the
user and the compiler here. How about we start this from another
perspective? Can you give some examples of what you'd like to do to
communicate the information you think you need to various parts of the
backend and how you'd like to communicate it?

I promise I'm not trying to be (on purpose at least) particularly dense
here, but I just don't have enough information to work with. I agree
that we probably have an API problem - some of which I solved for the mips
backend at one point using MCOptions (which I don't really like as a
general solution), but a more general solution that'll work and be cleaner
is definitely a direction I'd like us to go.

-eric

> ________________________________________
> From: Eric Christopher [echristo at gmail.com]
> Sent: 29 July 2015 21:44
> To: Daniel Sanders; LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
> Cc: Renato Golin (renato.golin at linaro.org); Jim Grosbach (
> grosbach at apple.com)
> Subject: Re: The Trouble with Triples
>
> Hi Daniel,
>
> I'm not sure I agree with the basic idea of using the target triple as a
> way of encoding all of the pieces of target data as a string. I think in a
> number of cases what we need to do is either open up API to the back end to
> specify things, or encode the information into the IR when it's different
> from the generic triple. Ideally the triple will have enough information to
> do basic layout and anything else can be either gotten from the IR or
> passed via option.
>
> My suggestion on a route forward here is that we should look at the
> particular API and areas of the backend that you're having an issue with
> and figure out how to best communicate the data you'd like to the
> appropriate area. I realize this probably seems a little vague and
> handwavy, but I don't know what areas you've been having problems with
> lately. I'll absolutely help with this effort if you need assistance or
> guidance in any way.
>
> Thanks!
>
> -eric
>
> On Wed, Jul 8, 2015 at 7:31 AM Daniel Sanders <Daniel.Sanders at imgtec.com
> <mailto:Daniel.Sanders at imgtec.com>> wrote:
> Hi,
>
> In http://reviews.llvm.org/D10969, Eric asked me to explain the wider
> context of the TargetTuple object that was replacing Triple on llvmdev so
> here it is.
>
> Before I start, I'm sure I don't know the full extent of GNU triple
> ambiguity and lack of canonicity. Additional examples are welcome.
>
> The Problem
>
> As you know, LLVM uses a GNU Triple as a target description that can be
> relied upon to make decisions. It's used for various decisions such as the
> default cpu, the alignment of types, the object format, the names for
> libcalls, and a wide variety of others.
> In using it like this, LLVM assumes that triples are unambiguous and have
> a specific defined meaning. Unfortunately, this assumption fails for a
> number of reasons.
>
> The first reason is that compiler options can overrule the triple but
> leave it unchanged. For example, in GCC mips-linux-gnu-gcc normally
> produces 32-bit MIPS-I output using the O32 ABI, but 'mips-linux-gnu-gcc
> -mips64' normally produces 64-bit MIPS-III output using the N32 ABI. Like
> GCC, compiler options to mips-linux-gnu-clang should (and mostly do but
> MIPS has a few crashing cases caused by triple misuse) overrule the triple.
> However, we don't mutate the triple to reflect this so any decisions based
> on the overridable state cannot rely on the triple to accurately reflect
> the desired behaviour.
> It's worth mentioning here that some targets have hacks to partially
> mutate the triple in clang to work around issues they would otherwise have
> in the backend but this is done on an ad-hoc basis for specific details
> (e.g. mips <-> mipsel for -EL and -EB).
>
> The second reason is that there is no canonical meaning for a given GNU
> Triple; it varies between vendors and over time. There is also no
> requirement for vendors to have a unique GNU Triple for their toolchain.
> For GCC, it's fairly common for distributors to change the meanings of
> triples using options like --with-arch, --with-cpu, --with-abi, etc. There
> are also some target-specific options such as --with-mode to select
> ARM/Thumb by default and --with-nan for MIPS NAN encoding selection.
> Different vendors use different configure options and may change them at
> will. When they do change them, the vendors often desire to keep the same
> triple to be able to drop in the new version without causing wider impact
> on their environment. For example, assuming I'm reading debian/rules2 for
> Debian's gcc-4.9 package correctly then the i386-linux-gnu means i486 on
> Debian Etch and Lenny but means i586 on more recent versions. On a similar
> note, on Debian, mips-linux-gnu targets MIPS-II (optimised for typical
> MIPS32 implementations) rather than the usual MIPS-I. The last example of
> this ambiguity I'd like to reference is that mentioned by
> https://wiki.debian.org/Multiarch/Tuples#Why_not_use_GNU_triplets.3F. In
> that example, hard-float and soft-float on ARM both used arm-linux-gnueabi
> but were mutually incompatible. The Multiarch tuples described on that page
> are an attempt to resolve the ambiguity but I'm told that they aren't
> likely to be universally adopted.
>
> The third reason is that different triples can mean the same thing. Jim
> Grosbach has mentioned that the prefixes of the GNU Triple are different
> between Linux and Darwin for ARM despite sharing the same meaning
> (presumably subject to the issues above). As a result decisions based on
> the string have to take care of multiple possible values. Mips has a
> similar issue too since a host triple (and therefore default target triple)
> of mips64-linux-gnu needs to behave like mips-linux-gnu on a 32-bit Mips
> port of Debian.
>
> Although not included in the description of the assumption above, one
> additional flaw in the use of GNU Triples is that they are sometimes
> inadequate as a description of the target. One example affecting MIPS in
> particular is that the ABI is not represented in the GNU Triple, so we
> require significant API changes to get this information where we need it.
> It would be helpful to be able to pass such information through the
> existing plumbing.
>
> The Planned Solution
>
> The plan is to split the GNU Triple represented by the llvm::Triple object
> into two pieces. The first piece is the existing llvm::Triple and is
> responsible for parsing the GNU triple and canonicalizing it. The second
> piece is a mutable target description named llvm::TargetTuple. TargetTuple
> is responsible for interpreting the triple according to the vendor's rules,
> providing an interface to allow mutation by tools, and authoritatively
> defining the target being targeted without the ambiguity of GNU Triples. As
> an example, 'mips-linux-gnu-clang -EL …' would:
> // Parse the GNU Triple
> llvm::Triple GnuTriple("mips-linux-gnu");
> // Convert it to a TargetTuple according to the (possibly customized)
> // meanings in use by the vendor.
> llvm::TargetTuple TT(GnuTriple);
> // Then mutate the TargetTuple according to the compiler options (or
> // equivalent depending on the tool, for example disassemblers would mutate
> // it according to the object headers).
> if (hasOption("-EL"))
>   TT.setLittleEndian();
> ...
> At this point, TT would be
> "+mipsel-unknown-linux-gnu-elf32-some-other-stuff" (exact serialization is
> t.b.d and may end up target dependent) which we can then rely on in the
> rest of LLVM. This split resolves the issue of llvm::Triple objects not
> being reliable when used as a target description since TargetTuple will
> reflect the result of interpreting the triple as well as applying
> appropriate options. It also provides a suitable place for vendors to
> define the meanings of their GNU Triples.
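
For readers who want something they can compile, the flow quoted above can be modelled without LLVM as below. This is a sketch under stated assumptions: the TargetTuple here is a hypothetical stand-in rather than the class under review in D10969, it understands only the "mips" and "mipsel" prefixes, the O32 default is assumed for illustration, and the str() format is a placeholder since the proposal leaves the serialization t.b.d.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the proposal; not the llvm::TargetTuple under review.
enum class Endian { Big, Little };

struct TargetTuple {
  std::string Arch; // normalised architecture name, e.g. "mips"
  std::string OS;   // remainder of the triple, e.g. "linux-gnu"
  Endian Endianness;
  std::string ABI;

  // Interpretation step: the single place a vendor would patch to redefine
  // what a GNU triple means in their environment.
  explicit TargetTuple(const std::string &GnuTriple) {
    std::string::size_type Dash = GnuTriple.find('-');
    assert(Dash != std::string::npos && "expected <arch>-<rest>");
    Arch = GnuTriple.substr(0, Dash);
    OS = GnuTriple.substr(Dash + 1);
    assert((Arch == "mips" || Arch == "mipsel") &&
           "this sketch only models 32-bit MIPS triples");
    // 'Usual' meanings assumed for this sketch only.
    Endianness = (Arch == "mipsel") ? Endian::Little : Endian::Big;
    Arch = "mips";
    ABI = "o32";
  }

  // Mutation step: driven by tool options such as -EL.
  void setLittleEndian() { Endianness = Endian::Little; }

  // Placeholder serialization; the real format is undecided.
  std::string str() const {
    return Arch + "-" + OS + "-" + ABI +
           (Endianness == Endian::Little ? "-el" : "-eb");
  }
};
```

With this model, TargetTuple("mips-linux-gnu") followed by setLittleEndian() serializes to the same string as TargetTuple("mipsel-linux-gnu"), which is the property the proposal is after: two spellings with one meaning end up as one description.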
>
> One significant detail is the way vendors customize the meaning of their
> Triples. Currently, the plan is to nominate a constructor
> (TargetTuple::TargetTuple(const Triple &)) a vendor can patch to redefine
> their triples with the default implementation being the 'usual' meaning
> (the meaning that should be used in the absence of customization). One nice
> benefit of this configure-by-source-patch approach is that vendors can
> customize multiple triples as easily as their native triple or intended
> target triple. To use Debian as an example again, they would be able to
> customize all their supported triples such that 'clang -target
> arm-linux-gnueabihf' on the amd64 port targets their armhf port using the
> same customization that makes 'clang' on the armhf port do the right thing
> natively. Android and toolchains for heterogeneous platforms would likely
> benefit from this too. This configure-by-source-patch approach seems to
> make some people uncomfortable so we may have to find another way to
> configure the triples (tablegen?).
>
> To reach this result the plan is to do the following:
>
> 1. Replace any remaining std::string's and StringRef's containing GNU
>    triples with Triple objects.
>
> 2. Split the llvm::Triple class into llvm::Triple and llvm::TargetTuple
>    classes. Both are identical in implementation and almost identical in
>    interface at this stage.
>
> 3. Gradually replace Triples with TargetTuples until the C APIs and the
>    LLVM-IR are the only place inside LLVM where Triples are still used.
>
> 4. Change the implementation of TargetTuple to whatever is convenient for
>    LLVM's internals and decide on a serialization.
>
> 5. Replace serialized Triples with serialized TargetTuples in LLVM-IR.
>
>    a. Maintain backwards compatibility with IR using triples, at least
>       for a while.
>
> 6. Add TargetTuple support to the C API. Exact API is t.b.d.
>
> 7. Have the API users mutate the TargetTuple appropriately.
>
> Renato: This has been revised slightly from the last one we discussed due
> to public C++ API's being used internally as well as externally.
>
> Where we are now
>
> I've just started posting patches for steps 2 and 3 of the plan. My working
> copy is nearly at step 4.
>
> What's next
>
> Upstream steps 2 and 3 and then begin replacing the TargetTuple
> implementation as per step 4.
>
> Previous Discussions
>
> http://thread.gmane.org/gmane.comp.compilers.llvm.devel/86020/focus=86073.
> I should mention that I've since been made aware that the original topic of
> private label prefixes could be solved in a much simpler way than
> previously thought. The triple related discussion is still relevant though.
> I understand from Renato that there are more threads over the last few
> years but I haven't looked for them.
>
>
> Daniel Sanders
> Leading Software Design Engineer, MIPS Processor IP
> Imagination Technologies Limited
> www.imgtec.com<http://www.imgtec.com/>