[llvm-dev] The Trouble with Triples

Tue Sep 22 06:06:00 PDT 2015

The thread has gone quiet for a few days and I need to be making progress towards a gcc-compatible toolchain (e.g. a mips-mti-linux-gnu toolchain that can target MIPS32/MIPS64 and later for all appropriate ABIs and both endians) so I need to chase this a bit earlier than I normally would.

> Here's the line of thought that I'd like people to start with:
> * Triples don't describe the target. They look like they should, but they don't. They're really just arbitrary strings.
> * LLVM relies on Triple as a description of the target. It defines the backend to use, the binary format to use, OS and Vendor specific quirks to enable/disable, the default CPU, the default ABI, the endian, and countless other details about the target.
> * If LLVM is built on top of an incorrect concept we should fix that but we can't abandon Triples at the user level since every toolchain uses them.
> * But we can't keep using Triples inappropriately either. If the information feeding into LLVM is faulty then the resulting behaviour will be faulty too.
> * So let's start with a Triple, and convert it to a not-broken equivalent as early as possible. We'll call it TargetTuple.
> Are there any disagreements on this part of the thinking?
> If we have agreement on this, then I think that this by itself is ample reason for phases 1-4, and 6 of the plan.
> The justification for the IR serialization in phase 5 is simply that we need to deliver the Triple/TargetTuple to
> LTO for it to operate correctly and we currently do this by serializing Triple in the IR. If Triple has been replaced
> by TargetTuple then TargetTuple must be serializable in the IR somehow.

Are we agreed on this much? If so, I think we should go ahead with this part of the work and judge each follow-on task independently on its own merits.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Daniel Sanders via llvm-dev
Sent: 17 September 2015 14:21
To: Eric Christopher; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] The Trouble with Triples

I think we need to take a step further back and re-enter from the right starting point. The thing that's bothering me about the pushback so far is that it's trying to discuss and understand the consequences of resolving the core problem while seemingly ignoring the core problem itself. The reason I've been steering everything back to GNU Triples being ambiguous and inconsistent is that it's the root of all the problems, and the fixes to the various issues fall out naturally once this core point has been addressed.

Here's the line of thought that I'd like people to start with:

* Triples don't describe the target. They look like they should, but they don't. They're really just arbitrary strings.
* LLVM relies on Triple as a description of the target. It defines the backend to use, the binary format to use, OS and Vendor specific quirks to enable/disable, the default CPU, the default ABI, the endian, and countless other details about the target.
* If LLVM is built on top of an incorrect concept we should fix that but we can't abandon Triples at the user level since every toolchain uses them.
* But we can't keep using Triples inappropriately either. If the information feeding into LLVM is faulty then the resulting behaviour will be faulty too.
* So let's start with a Triple, and convert it to a not-broken equivalent as early as possible. We'll call it TargetTuple.
Are there any disagreements on this part of the thinking? If there are, then we should resolve these before proceeding to the rest since everything else depends on accepting this core problem exists and can be fixed in this way.
If we have agreement on this, then I think that this by itself is ample reason for phases 1-4, and 6 of the plan. The justification for the IR serialization in phase 5 is simply that we need to deliver the Triple/TargetTuple to LTO for it to operate correctly and we currently do this by serializing Triple in the IR. If Triple has been replaced by TargetTuple then TargetTuple must be serializable in the IR somehow.

Hopefully, we are agreed so far. Let's assume for the rest of this explanation that Phases 1-6 are complete and we now have const TargetTuple throughout the API. I'd like to draw particular attention to TargetMachine which, like everything else, has had its Triple member (called TargetTriple) replaced with a TargetTuple member (named TheTargetTuple). This member is used in all the same ways it used to be used when it was a Triple (named TargetTriple).

At this point, in the MC layer we have a number of classes that need to know the ABI but lack this information. Our TargetMachine has an accurate TargetTuple object that describes the invariants of the desired target. The desired ABI is an invariant too so why not have it in the TargetTuple which is already plumbed in everywhere we need it? After all, it's a property of the target OS/Environment. If we have the ABI in the TargetTuple, then we don't need any other means to set the ABI, tools can set it up front in the TargetTuple and we don't need any command-line option handling for it in the backend.
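To make the idea concrete, here is a minimal sketch of an MC-layer class reading the ABI from the tuple it already receives. Everything in it is an assumption made for illustration: the layout of `TargetTuple`, the `ABIKind` enum, and `ObjectWriterSketch` are hypothetical stand-ins, not existing LLVM API.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for illustration; not real LLVM classes.
enum class Endian { Big, Little };
enum class ABIKind { O32, N32, N64 };

// An immutable description of the target, including the ABI.
class TargetTuple {
public:
  TargetTuple(Endian E, ABIKind A) : TheEndian(E), TheABI(A) {}
  Endian getEndian() const { return TheEndian; }
  ABIKind getABI() const { return TheABI; }

private:
  Endian TheEndian;
  ABIKind TheABI;
};

// Stand-in for an MC-layer class that needs the ABI. It reads the ABI from
// the tuple it is constructed with instead of from a separate option.
class ObjectWriterSketch {
public:
  explicit ObjectWriterSketch(const TargetTuple &TT) : Tuple(TT) {}

  // For MIPS, N64 objects are 64-bit ELF; O32 and N32 objects are 32-bit ELF.
  bool is64BitELF() const { return Tuple.getABI() == ABIKind::N64; }

private:
  const TargetTuple Tuple; // held by value so it cannot dangle
};

// Example use: the tool sets the ABI once, up front, in the tuple.
inline void usageExample() {
  TargetTuple TT(Endian::Big, ABIKind::N32);
  ObjectWriterSketch Writer(TT);
  assert(!Writer.is64BitELF());
}
```

The point of the sketch is only that no extra plumbing or backend command-line option is involved: whoever builds the tuple decides the ABI, and every consumer sees the same answer.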

Meanwhile, in clang we have a number of command line options that change the desired target. Let's say we've constructed a Triple and resolved it to TargetTuple (more on that below). We're now processing the -EL option. At the moment, we substitute our mips-linux-gnu triple for a mipsel-linux-gnu triple, construct a Triple object from it and resolve the new Triple to a TargetTuple. But why do we need to bother with that kind of weird hackery when we can simply do Obj.setEndian(Little)? This is what Phase 7 of the plan is about. We end up with a cleaner way to process target changes that, until now, have required weird triple hacking to handle.
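The contrast between the two approaches can be sketched as follows. This is a simplified illustration under stated assumptions: the `TargetTuple` struct, `applyELByTripleRewrite`, and `applyEL` are hypothetical names, and the string rewrite is a reduced model of the driver's behaviour, not a copy of clang's code.

```cpp
#include <cassert>
#include <string>

enum class Endian { Big, Little };

// Hypothetical mutable tuple, as a driver might hold it while it is still
// processing options (before handing a const tuple to the backend).
struct TargetTuple {
  std::string Arch;          // e.g. "mips"
  Endian E = Endian::Big;
  void setEndian(Endian NewE) { E = NewE; }
};

// Simplified model of the current approach: rewrite the architecture
// component of the triple string, then re-parse the result.
std::string applyELByTripleRewrite(const std::string &Triple) {
  const std::string Prefix = "mips-";
  if (Triple.compare(0, Prefix.size(), Prefix) == 0)
    return "mipsel-" + Triple.substr(Prefix.size());
  // Every other spelling (for example "mips64-") would need its own case.
  return Triple;
}

// The proposed approach: change the one property the option is about.
void applyEL(TargetTuple &TT) { TT.setEndian(Endian::Little); }

inline void usageExample() {
  assert(applyELByTripleRewrite("mips-linux-gnu") == "mipsel-linux-gnu");

  TargetTuple TT;
  TT.Arch = "mips";
  applyEL(TT);
  assert(TT.E == Endian::Little);
}
```

Note that the string rewrite silently leaves unrecognised spellings unchanged, which is the kind of fragility that setting the property directly avoids.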

I skipped the Triple -> TargetTuple resolution a moment ago and I should address that now. We already know that mapping Triple to TargetTuple is a many-to-many mapping. One Triple has many possible TargetTuples depending on the environment. One TargetTuple can be formed from multiple possible Triples. In an ideal world, we'd like to bake in all of these mappings so that one clang binary supports everything. Unfortunately, being a many-to-many mapping, some of these mappings are mutually exclusive. Note that this isn't a new problem resulting from this project. The problem has always been there but has been ignored until now. To resolve this, we need to provide configure-time and possibly run-time controls for how this conversion is disambiguated. This resolution is performed as early as possible so that the middle/back-ends don't need to know anything about the ambiguity problem.
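A minimal sketch of such a disambiguated resolution is shown here. All names (`ToolchainPolicy`, `resolveTriple`, the fields of `TargetTuple`) are hypothetical, and the CPU/ABI values are illustrative placeholders rather than a statement of what any particular toolchain actually defaults to.

```cpp
#include <cassert>
#include <string>

// Hypothetical resolved description; fields chosen for illustration only.
struct TargetTuple {
  std::string Arch;
  std::string DefaultCPU;
  std::string DefaultABI;
};

inline bool operator==(const TargetTuple &L, const TargetTuple &R) {
  return L.Arch == R.Arch && L.DefaultCPU == R.DefaultCPU &&
         L.DefaultABI == R.DefaultABI;
}

// Hypothetical configure-time (or run-time) control that selects which
// toolchain's interpretation of an ambiguous triple to follow.
enum class ToolchainPolicy { ToolchainA, ToolchainB };

// Resolve a triple string to a tuple. Returns false for unknown triples.
bool resolveTriple(const std::string &Triple, ToolchainPolicy Policy,
                   TargetTuple &Out) {
  // Many triples -> one tuple: different spellings mean the same target.
  if (Triple == "mips-linux-gnu" || Triple == "mips-unknown-linux-gnu") {
    // One triple -> many tuples: the default CPU depends on the policy.
    // (Illustrative values only.)
    if (Policy == ToolchainPolicy::ToolchainA)
      Out = TargetTuple{"mips", "mips32r2", "o32"};
    else
      Out = TargetTuple{"mips", "mips32", "o32"};
    return true;
  }
  return false;
}

inline void usageExample() {
  TargetTuple A, B;
  assert(resolveTriple("mips-linux-gnu", ToolchainPolicy::ToolchainA, A));
  assert(resolveTriple("mips-linux-gnu", ToolchainPolicy::ToolchainB, B));
  assert(!(A == B)); // same triple, different meaning per policy
}
```

Because the policy is consulted only inside the resolver, everything downstream works with an unambiguous tuple and never sees the original string.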

---

To reply more directly to your email:
> What can't be done to TargetMachine to avoid this serialization?

TargetMachine already has the serialization (see TargetMachine::TargetTriple). We're not doing anything new here. We're simply replacing one object holding faulty information with a new object holding reliable information.

> And a followup question: What can't be serialized at the function level in the IR to make certain things clear that aren't global? We already do this for a lot of command line options.

The data I want to fix is global. I think the bit you may be getting hung up on here is that small portions of this global data can also be overridden at the function level. Those overrides aren't a problem and continue to operate in the same way as they do today.

> And one more: What global options do we need to consider here?

I'm not certain I understand this question. If you're talking command line options, it's things like -EL, -EB, -mips32, -mips32r[2356], -mips64, -mips64r[2356], -mabi=…. If you're talking about Triple -> TargetTuple mappings, there's quite a wide variety but the main ones for Mips are endian, architecture, default CPU, and default ABI.

> The goal of the configuration level of the TargetMachine is that it controls things that don't change at the object level.
> This is a fairly recently stated goal, but I think it makes sense for LLVM in general. TargetSubtargetInfo takes care of
> everything that resides under this (as much as possible, some bits are still in transition, e.g. TargetOptions). This is part
> of my suggestion to Daniel about the problems with MCSubtargetInfo and the assembler. Targets like Mips and ARM
> were unfortunately designed to change things on the fly during assembly and need to collate or at least change defaults
> as we're processing code. I definitely had to deal with a lot of the pain you're talking about when I was rewriting some
> of the handling there during the TargetSubtargetInfo work.

I generally agree with this. The key bit I need to draw attention to is that the 'defaults' don't change, but are instead overridden. These constant defaults are stored in TargetMachine and particularly TargetMachine::TargetTriple. These defaults are wrong for some toolchains since the information stored in TargetMachine::TargetTriple is wrong. It's the defaults I'm trying to fix rather than the overrides.

I think I understand your proposed plan now and it's a few steps ahead of where we are and where we need to be. I agree that overridable state should be in TargetSubtargetInfo, however I can't initialize that state without the default values which come from the faulty information in TargetMachine::TargetTriple. This triple work is a prerequisite to your plan and at first I don't need to override ABIs.
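The relationship between constant defaults and per-function overrides can be sketched like this. The `"target-cpu"` and `"target-features"` keys are the function attribute names LLVM IR uses today; everything else (`TargetTuple`, `SubtargetSettings`, `computeSubtarget`, and modelling attributes as a string map) is a hypothetical simplification.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical global description; holds the defaults and never changes.
struct TargetTuple {
  std::string DefaultCPU;
  std::string DefaultFeatures;
};

// Function attributes modelled as plain key/value pairs for this sketch.
using FnAttrs = std::map<std::string, std::string>;

// The per-function (overridable) state.
struct SubtargetSettings {
  std::string CPU;
  std::string Features;
};

// Start from the global defaults, then apply any function-level overrides.
// If the defaults are wrong, every function without an override is wrong.
SubtargetSettings computeSubtarget(const TargetTuple &TT,
                                   const FnAttrs &Attrs) {
  SubtargetSettings S{TT.DefaultCPU, TT.DefaultFeatures};
  auto CPU = Attrs.find("target-cpu");
  if (CPU != Attrs.end())
    S.CPU = CPU->second;
  auto Features = Attrs.find("target-features");
  if (Features != Attrs.end())
    S.Features = Features->second;
  return S;
}

inline void usageExample() {
  const TargetTuple TT{"mips32r2", ""};
  FnAttrs Plain;                                 // no overrides
  FnAttrs Tuned{{"target-cpu", "mips32r6"}};     // overrides the CPU only
  assert(computeSubtarget(TT, Plain).CPU == "mips32r2");
  assert(computeSubtarget(TT, Tuned).CPU == "mips32r6");
}
```

The overrides keep working exactly as before; the sketch only highlights that they are layered on top of defaults that must be correct in the first place.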

> Right now I see TargetTuple as trying to take over all of the various arguments to TargetMachine and encapsulate them into a single thing.
> I also don't see this as bad, but I also don't see it taking all of them right now and I'm not sure how it solves some of the existing problems
> with data sharing that we've got which is where the push back you're both getting is coming from here. Ultimately library-wise I can agree
> with some of the directions you're headed - I just don't see the unification and interactions right now.

I think we'll end up with TargetTuple taking over many arguments to TargetMachine but that's not my goal at this stage. My goal is simply to fix the faulty information currently held in Triple and use the now-accurate information in TargetTuple to fix various blocking issues that prevent a proper Mips toolchain product based on Clang/LLVM. At the end of Phase 7, it becomes possible to fix a number of issues that are impossible to fix right now because the available data we can consult at the moment is incorrect.

From: Eric Christopher [mailto:echristo at gmail.com]
Sent: 16 September 2015 23:52
To: Renato Golin; Jim Grosbach
Cc: Daniel Sanders; llvm-dev at lists.llvm.org
Subject: Re: The Trouble with Triples

Let's take a step back here.

It appears that you and Daniel are trying to solve some problems. I think solving problems is good, I just want to make sure that we're solving them in a way that gets us a decent API at the end. I also want to make sure we're solving the right problems.

TargetTuple appears to be related to the TargetParser as you bring up in this mail. They're two separate parts of similar problems - people trying to both serialize command line options and communication from the front end to the backend with respect to target information.

This leads me to a question: What can't be done to TargetMachine to avoid this serialization?
And a followup question: What can't be serialized at the function level in the IR to make certain things clear that aren't global? We already do this for a lot of command line options.
And one more: What global options do we need to consider here?

The goal of the configuration level of the TargetMachine is that it controls things that don't change at the object level. This is a fairly recently stated goal, but I think it makes sense for LLVM in general. TargetSubtargetInfo takes care of everything that resides under this (as much as possible, some bits are still in transition, e.g. TargetOptions). This is part of my suggestion to Daniel about the problems with MCSubtargetInfo and the assembler. Targets like Mips and ARM were unfortunately designed to change things on the fly during assembly and need to collate or at least change defaults as we're processing code. I definitely had to deal with a lot of the pain you're talking about when I was rewriting some of the handling there during the TargetSubtargetInfo work.

Now a bit more on TargetParser + TargetTuple:

TargetParser appears to be trying to solve the parsing in Triple in a nice way for ARM and also some of the "what kind of subtarget feature canonicalization can we do in llvm that makes sense to communicate to the front end". I like this particular idea and have often wanted a library of feature handling, but it seems to have stabilized at an ARM specific set of code with no defined interface. I can't even figure out how I'd use it in lib/Basic right now for any target other than ARM. This isn't a condemnation of TargetParser, but I think it's something that needs to be thought through a bit more. It's been hooked up well before I'd expected it to and right now if we moved it to the ARM backend from Support it'd make just as much sense as it does where it is now other than making clang depend on the ARM backend as well as the X86 backend :)

Right now I see TargetTuple as trying to take over all of the various arguments to TargetMachine and encapsulate them into a single thing. I also don't see this as bad, but I also don't see it taking all of them right now and I'm not sure how it solves some of the existing problems with data sharing that we've got which is where the push back you're both getting is coming from here. Ultimately library-wise I can agree with some of the directions you're headed - I just don't see the unification and interactions right now.

As a suggested way forward here, let's see if we can get my questions above answered and also show some of how the interactions between llvm's libraries are going to get fixed, moved to a better place, etc.

Thanks!

-eric

On Wed, Sep 16, 2015 at 3:02 PM Renato Golin <renato.golin at linaro.org> wrote:
On 16 September 2015 at 21:56, Jim Grosbach <grosbach at apple.com> wrote:
> Why do we care about GAS? We have an assembler.

It's not that simple.

There is a lot of old code out there, including the Linux kernel,
which we do care a lot about, that only compiles with GAS. We're slowly
moving the legacy code up to modern standards, and specifically some
kernel folks are happy to move up not only the asm syntax, but the C
standard and move away from GNU-specific behaviour. But we're not
quite there yet, and might not be for a few more years. So, yes, we
still care about GAS.

But this is not just about GAS.

As I said in my previous email, this is about clearing the bloat in
target descriptions by both removing the need for adding numerous CPU
names, target features, architecture names (xscale, strongarm, etc),
AND making sure all parties (front/middle/back-ends) speak the same
language, produced from the same source.

The TargetTuple is that common language, and the TargetParser created
from the TableGen files is the common source. The Triple becomes a
legacy constructor value for the Tuple. All other target information
classes are already (or should be) generated from the TableGen files,
so the ultimate source becomes the TableGen description, which I think
is what you were aiming at in your comment.

For simple architectures, like x86, you don't even need a
TargetParser. You can easily construct the Tuple from a triple and use
the Tuple as you've always used the triple. No harm done. But for the
complex ones like ARM and MIPS, having a common interface generated
from the same place the other interfaces are is important to avoid
more bridges between front and middle and back end interpretations of
the same target. Whatever legacy ARM or MIPS carry can be isolated in
their own implementation, leaving the rest of the targets with a clean
and simple interface.

cheers,
--renato