[llvm-dev] RFC: Building GlobalISel by default

Sun Jan 15 09:08:25 PST 2017

> On Jan 15, 2017, at 5:14 AM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 15 January 2017 at 04:24, Michael Kuperstein
> <michael.kuperstein at gmail.com> wrote:
>> Regarding "we shouldn't enable it because it will make the bots slower" -
>> well, yes, but that's just postponing the inevitable. We will enable
>> GlobalISel eventually, and there will probably be a very long time-frame
>> during which both are enabled concurrently.
> 
> No magic solutions expected. I just don't want to crank all bots up to
> 11 on day one. We start with the fast bots, when they're happy, we
> move on until we're confident that the whole thing is stable, then we
> make it build by default.
> 
> There are two problems here:
> 
> 1. Environmental issues.
> 
> We do find more of those than we'd hope. Different compilers, linkers,
> libraries that produce not only crashes, but unstable builds and
> unpredictable or irreproducible test failures. The environment on our
> bots are wildly different (on purpose) and dealing with different
> unpredictable issues all at once is something that I'm not willing to
> take on lightly. Due to the nature of buildbots and some of our
> hardware and the fact that virtually all ARM/AArch64 bots are ours,
> this will be something that Linaro will pay the price in full.
> 
> 2. Experimental nature.
> 
> If the past serves as a guide, experimental features at the core of
> LLVM will be largely independent until they start to merge, when they
> can become a huge issue and be in merging state for months, if not
> years, or worse, never be merged. Right now, we're not at that state,
> we're still largely independent, so your worries regarding the
> development tests (which also hit buildbots at a different pace
> because of the environment issues above), will be lower for now, but
> will increase massively soon enough.
> 
> There is the argument that having the bots building means we'll have
> *less* teething issues and the people building GlobalISel (which also
> includes Linaro) will have a lot less work. But there's a cost to be
> paid, and that cost is a lot higher on the ARM/AArch64 side than x86,
> just because most developers use an x86 as their main machine.
> 
> 
>> Still, as a strawman proposal - would it make sense to add
>> LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL, or something of that sort? The more
>> mature ports will build under LLVM_BUILD_GLOBAL_ISEL (which will be enabled
>> by default), the less mature ones only under
>> LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL (which won't be), and moving a port from
>> GLOBAL_ISEL_EXPERIMENTAL to GLOBAL_ISEL would be equivalent to marking a
>> regular backend non-experimental. We can start with just AArch64 in the
>> non-experimental category, and move the others as they become more stable.
>> Or is this too much complexity for what we gain?
> 
> It is.
> 
> We already separate that by not building any other back-end than ARM
> and AArch64. So I think we can do that without the added complexity.
> 
> I'm not trying to block the move, I'm just trying to make it smother
> and raise awareness that the cost we pay for any large move is not
> well distributed.
> 
> For the past few months I have reverted patches that only broke the
> ARM bots and it was clear on most of those instances which patch was
> to blame, but people didn't bother and after 8, 12 sometimes 24 hours,
> I reverted. This is not a nice place to be in, and given that
> resurgence lately, I am honestly afraid adding more things will make
> it worse.
> 
> A red bot can't catch new errors. A slow red bot can go on red for
> weeks (and we have seen it numerous times) because once you fix bug
> #1, #2 has made #3 appear and not be warned.
> 
> This is not everyone, and I do appreciate all the people that helped
> us and worried about the breakages, but it's a trend that is
> increasing, not decreasing. Our team is too small to cope with further
> increases, so if the community is willing to do its part, I think we
> can make it work.
> 
> But it has to be slow and steady.

It seems to me that your issue is your bots are “slow”, and clang/LLVM is a large project, and running the tests takes time.
I personally don’t understand how anything you’re saying is specific to GlobalISel though, it may deserve a separate thread, and we should focus on the real issue at stand, the *only* relevant question I see in this thread:  "if my (non-GlobalISel) change breaks GlobalISel, how likely is it to be a bug in my code vs. a bug in GlobalISel?”.

— 
Mehdi