[llvm-dev] RFC: Building GlobalISel by default

Sun Jan 15 05:14:52 PST 2017

On 15 January 2017 at 04:24, Michael Kuperstein
<michael.kuperstein at gmail.com> wrote:
> Regarding "we shouldn't enable it because it will make the bots slower" -
> well, yes, but that's just postponing the inevitable. We will enable
> GlobalISel eventually, and there will probably be a very long time-frame
> during which both are enabled concurrently.

No magic solutions expected. I just don't want to crank all bots up to
11 on day one. We start with the fast bots, when they're happy, we
move on until we're confident that the whole thing is stable, then we
make it build by default.

There are two problems here:

1. Environmental issues.

We do find more of those than we'd hope. Different compilers, linkers,
libraries that produce not only crashes, but unstable builds and
unpredictable or irreproducible test failures. The environment on our
bots are wildly different (on purpose) and dealing with different
unpredictable issues all at once is something that I'm not willing to
take on lightly. Due to the nature of buildbots and some of our
hardware and the fact that virtually all ARM/AArch64 bots are ours,
this will be something that Linaro will pay the price in full.

2. Experimental nature.

If the past serves as a guide, experimental features at the core of
LLVM will be largely independent until they start to merge, when they
can become a huge issue and be in merging state for months, if not
years, or worse, never be merged. Right now, we're not at that state,
we're still largely independent, so your worries regarding the
development tests (which also hit buildbots at a different pace
because of the environment issues above), will be lower for now, but
will increase massively soon enough.

There is the argument that having the bots building means we'll have
*less* teething issues and the people building GlobalISel (which also
includes Linaro) will have a lot less work. But there's a cost to be
paid, and that cost is a lot higher on the ARM/AArch64 side than x86,
just because most developers use an x86 as their main machine.

> Still, as a strawman proposal - would it make sense to add
> LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL, or something of that sort? The more
> mature ports will build under LLVM_BUILD_GLOBAL_ISEL (which will be enabled
> by default), the less mature ones only under
> LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL (which won't be), and moving a port from
> GLOBAL_ISEL_EXPERIMENTAL to GLOBAL_ISEL would be equivalent to marking a
> regular backend non-experimental. We can start with just AArch64 in the
> non-experimental category, and move the others as they become more stable.
> Or is this too much complexity for what we gain?

It is.

We already separate that by not building any other back-end than ARM
and AArch64. So I think we can do that without the added complexity.

I'm not trying to block the move, I'm just trying to make it smother
and raise awareness that the cost we pay for any large move is not
well distributed.

For the past few months I have reverted patches that only broke the
ARM bots and it was clear on most of those instances which patch was
to blame, but people didn't bother and after 8, 12 sometimes 24 hours,
I reverted. This is not a nice place to be in, and given that
resurgence lately, I am honestly afraid adding more things will make
it worse.

A red bot can't catch new errors. A slow red bot can go on red for
weeks (and we have seen it numerous times) because once you fix bug
#1, #2 has made #3 appear and not be warned.

This is not everyone, and I do appreciate all the people that helped
us and worried about the breakages, but it's a trend that is
increasing, not decreasing. Our team is too small to cope with further
increases, so if the community is willing to do its part, I think we
can make it work.

But it has to be slow and steady.

cheers,
--renato