[cfe-dev] First shot at Bug 4127 - The clang driver should support cross compilation

Tue Jan 10 02:00:47 PST 2012

> It would be really nice to have smaller patches rather than larger patches, and earlier discussion of them.
>
> Again, I remain very concerned about doing lots of work around configuration files to configure a *broken* driver design. I think we'll just end up with broken config file designs as well, and we'll simultaneously make it that much harder to refactor and change the driver in the future.
>
> I am still pushing to see refactoring and design work on the *existing* use cases the driver supports before extending the use cases. I don't know how to support cross compilation for more and more diverse platforms prior to getting cross compilation for very basic platforms, or even non-cross compilation into a better state.

> Consider that system header search logic for Darwin, MinGW, and Cygwin is still largely implemented in the Frontend rather than the Driver. This is something I'm actively working on for reference...

Indeed, and this is why it has taken so long since the conference to get something solid (apart from other work). Not only do we need to work out an end goal, we also need to make it achievable in small, reviewable steps.

I've studied the internals of the driver and thought of several different ways of factoring it, including a composable "pass" framework where arguments get successively modified by composable passes. That had some promise, as the baked in behaviour could be completely controlled at runtime easily, but was a no go because, and this is the major part:

I can't find a way of validating that a gigantic refactor makes no functional change in the driver. The regression tests aren't sufficient, and I'm likely to break Darwin or some other target with a huge refactor - Tools.cpp for example contains 5200 lines, some of which are common and others not.

Add to that the fact that after many iterations I still come back to the current driver design as "not broken". There's nothing wrong with the concept of Tools and ToolChains - in fact as an abstraction they suit reality well.

The main problem I see is the use of subclassing to parameterise the Tool classes. Because they weren't designed for parameterisation to start with, people have also copypasta'd huge chunks of code around. There are at least 5 different functions that can drive "ld" or "as", for example, each subtly different because one or two have had bugfixes, some have trashed behaviour they don't support, etc etc.

So here's my general "vision":

 * A subclass of Tool will relate solely to the command it is driving/producing, not OS/Arch specific configuration thereof. For example, "binutils::As", "binutils::Ld", "gcc::Compile", "gcc::Link", "gcc::Assemble", "visualstudio::Link".
   * These tools will have a parameter "std::vector<std::string> ExtraArgs", which is a list of extra arguments to give to the tool. This will be created elsewhere.
   * I have yet to work out where Darwin will fit here - ideally I'd like to have Darwin do all its funky logic and stick it all in ExtraArgs then go independent from there, but I don't know the best solution.

 * A target should be able to select any tool for any JobAction. This makes hard-baked ToolChains superfluous. You shouldn't have to subclass ToolChain for your target, because it will be dynamically generated by...

 * The "target database". I think this should be able to parameterise the Tools in any way required - all OS-specific stuff (with the exception of Darwin - that probably requires too much imperative code) should be in the DB.
  * This can take two forms - hard-baked and JSON. The hard-baked version I see being a tablegen file, as similar as possible to the JSON representation, which is compiled into Clang for speed.
  * This way, we keep the speed and extensibility and channel them both through the same interface, so that anything you can do hard-coded you can also change at runtime.
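
To make the three points above concrete, here is a minimal sketch of how they might fit together. Every name in it (JobKind, ToolConfig, TargetEntry, TargetDatabase, buildCommand) is a hypothetical illustration rather than anything in Clang's driver, and the in-memory map stands in for whatever the tablegen or JSON backing would actually produce. The only detail taken from the proposal itself is the ExtraArgs vector of strings.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical illustrations, not Clang's driver API.
enum class JobKind { Assemble, Link };

// A tool is identified by the command it drives; everything
// target-specific arrives through ExtraArgs.
struct ToolConfig {
  std::string ToolName;               // e.g. "binutils::Ld"
  std::string Executable;             // e.g. "ld"
  std::vector<std::string> ExtraArgs; // produced by the target database
};

// Per-target mapping from job kind to the tool that performs it.
using TargetEntry = std::map<JobKind, ToolConfig>;

class TargetDatabase {
  std::map<std::string, TargetEntry> Entries; // keyed by target name

public:
  void add(const std::string &Target, TargetEntry Entry) {
    Entries[Target] = std::move(Entry);
  }

  // Returns nullptr when the target or the job kind is unknown.
  const ToolConfig *lookup(const std::string &Target, JobKind Kind) const {
    auto T = Entries.find(Target);
    if (T == Entries.end())
      return nullptr;
    auto J = T->second.find(Kind);
    if (J == T->second.end())
      return nullptr;
    return &J->second;
  }
};

// Builds an argv-style command: executable, ExtraArgs, inputs, "-o", output.
std::vector<std::string> buildCommand(const ToolConfig &Tool,
                                      const std::vector<std::string> &Inputs,
                                      const std::string &Output) {
  std::vector<std::string> Cmd{Tool.Executable};
  Cmd.insert(Cmd.end(), Tool.ExtraArgs.begin(), Tool.ExtraArgs.end());
  Cmd.insert(Cmd.end(), Inputs.begin(), Inputs.end());
  Cmd.push_back("-o");
  Cmd.push_back(Output);
  return Cmd;
}
```

In this sketch a target picks a tool per job simply by having an entry in its TargetEntry, so no ToolChain subclass is involved. Where Darwin's imperative logic would hook in is left open here, as it is in the proposal.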

So here's my migration plan:

  1. The target database is where all the current imperative configuration should be factored out to. Create an initial draft schema and a ToolChain/HostInfo that uses it. At this point I suggest only using JSON, as it will be easier than a tablegen backend to change should the schema change. The tablegen backend can be added later and the JSON data ported over for speed.
  2. Create the first of the "properly independent" Tools - binutils::Ld and binutils::As, and use the target database to parameterise them.
    * Probably first patch checkin point? Use a new driver debug flag, -ccc-dynamic-driver, to enable the new behaviour.
  3. Port more ToolChains to the target database. For Linux, we'd need to keep the distro detection logic outside the targetdb, but then we shouldn't need clever header detection methods as we can bake the expected header locations for a given distro into the target database.
  4. Sort out what we're doing with Darwin. Is it having its own set of Tools and living in its own domain, or is it linked to the independent tools?
  5. The Big Switchover, at which point we can remove ideally around 4000 lines in Tools.cpp and 90% of ToolChains.cpp (probably also HostInfo.cpp) and end up with a driver which is centrally configurable both at compile and runtime.
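
Two of the steps above lend themselves to a small sketch: gating the new code path behind the proposed -ccc-dynamic-driver flag (step 2), and replacing header detection with a per-distro table lookup (step 3). The flag name comes from the plan itself and is a proposal, not an existing option. The function names, the HeaderTable type, and any distro names or paths fed to it are assumptions made up for illustration.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Flag name as proposed in the plan; hypothetical, not an existing option.
static const char *const DynamicDriverFlag = "-ccc-dynamic-driver";

// True when the proposed opt-in flag appears anywhere in the argument list.
bool useDynamicDriver(const std::vector<std::string> &Args) {
  return std::find(Args.begin(), Args.end(), DynamicDriverFlag) != Args.end();
}

// Hypothetical slice of the target database:
// detected distro name -> expected system header directories.
using HeaderTable = std::map<std::string, std::vector<std::string>>;

// Distro detection stays outside the database; this is only the lookup
// that would replace per-distro header detection code.
std::vector<std::string> headerDirsFor(const HeaderTable &Table,
                                       const std::string &Distro) {
  auto It = Table.find(Distro);
  if (It == Table.end())
    return std::vector<std::string>();
  return It->second;
}
```

With something like this, the existing code paths stay the default until the switchover in step 5, and each ported ToolChain can be compared against its old behaviour by running with and without the flag.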

OK, so there's a full braindump. I was going to throw this up for discussion in true LLVM style - "with a patch" - in a week or so but my hand has been pushed ;)

Note that this doesn't address the parsing logic disparity between Driver and Frontend - that's not my aim. I'm hoping to "fix the driver for cross-compilation", not fix the entire driver. I'm hoping someone else might chip in there!

Let the heckling commence! ;)

Cheers,

James

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: 10 January 2012 09:24
To: James Molloy
Cc: Sebastian Pop; cfe-dev at cs.uiuc.edu Developers; clang-commits at cs.uiuc.edu
Subject: Re: [cfe-dev] First shot at Bug 4127 - The clang driver should support cross compilation

On Tue, Jan 10, 2012 at 12:25 AM, James Molloy <James.Molloy at arm.com> wrote:
As I say, I'm working on a patch that I think is a superset of yours and would conflict massively. I've been planning it for some time and think I have a viable end goal and route to get there.
