[llvm-dev] Buildbot Noise
Renato Golin via llvm-dev
llvm-dev at lists.llvm.org
Wed Oct 7 03:10:03 PDT 2015
I think we're repeating ourselves here, so I'll reduce to the bare
minimum before replying.
On 6 October 2015 at 21:40, David Blaikie <dblaikie at gmail.com> wrote:
> When I suggest someone disable notifications from a bot it's because those
> notifications aren't actionable to those receiving them.
This is a very limited view of the utility of buildbots.
I think part of the problem is that you're expecting instant value out
of something that cannot provide it, and concluding that, because you
can't extract value from it, it's worthless.
Also, it seems, you're associating community buildbots with company
testing infrastructure. When I worked at big companies, there were
validation teams that would test my stuff and deal with *any* noise on
their own, and only the real signal would come to me: 100% actionable.
However, most bot owners in open source communities do this as a
secondary task. That has always been the case, and until someone (the
LLVM Foundation?) starts investing in better infrastructure overall
(multiple masters, new slaves, admins), there isn't much we can do to
improve things quickly enough.
The alternative is that the less common architectures will always have
noisier bots, because fewer people use them day-to-day during their
development time. Taking a hard line on those means that, in the long
run, we'll disable most testing on all secondary architectures, and
LLVM becomes an Intel compiler. But many companies use LLVM as their
production compiler on their own targets, so the inevitable outcome is
that they will *fork* LLVM. I don't think anyone wants that.
> I'm not suggesting removing the testing. Merely placing the onus on
> responding to/investigating notifications on the parties with the context to
> do so.
You still don't get the point. This would make sense in a world where
all parties are equal.
Most people develop and test on x86, even ARM and MIPS engineers. That
means x86 is almost always stable, no matter who's working.
But some bugs that we had to fix this year showed up randomly *only*
on ARM. One was a serious misuse of the Itanium C++ ABI that took a
long time to fix, and we still don't know if we got them all.
Bugs like that normally only show up in self-hosting builds, sometimes
in the test-suite compiled by a self-hosted Clang. These bugs have no
hard good/bad line for bisecting, they take hours per cycle, and they
may or may not fail, so automated bisecting won't work. Furthermore,
there is nothing to XFAIL in this case, unless you want to disable
building Clang, which I don't think you do.
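To make the bisection problem concrete, here is a sketch (mine, not
from the thread; the `run_selfhost_cycle` name and the retry count are
illustrative) of the retry wrapper one would have to hand to
`git bisect run` to cope with an intermittent failure:

```shell
#!/bin/sh
# Hypothetical sketch: a retry wrapper for `git bisect run` when a
# failure is intermittent. A commit is called "good" only after N
# consecutive passes; any single failure marks it "bad".

N=5

run_selfhost_cycle() {
  # Placeholder for a real configure + build + self-host + test-suite
  # cycle, which takes hours per invocation. Simulated as passing here.
  true
}

result=good
i=1
while [ "$i" -le "$N" ]; do
  if ! run_selfhost_cycle; then
    result=bad
    break
  fi
  i=$((i + 1))
done

echo "$result"
# git bisect run reads the exit status: 0 means good, 1 means bad.
# [ "$result" = good ] || exit 1
```

Even then, a failure that strikes with low probability can pass N times
in a row and misclassify a commit, and at hours per self-host cycle, N
runs per bisection step is simply not affordable, which is why
automated bisecting breaks down for these bugs.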
While it's taking days, if not weeks, to investigate this bot, the
status may be going from red to green to red. It would be very
simplistic to assume that *any* green->red transition while I'm
bisecting the problem is due to the current known instability. It
could be anything, and developers still need to be warned when the
alarm goes off.
The result may be that it's still flaky, the developer can't do much,
and life goes on. Or it could be their test, they fix it immediately,
and I'm eternally grateful, because I still need to investigate *only
one* bug at a time. By silencing the bot, I'd become responsible for
debugging the original hard problem plus any other that came up while
the bot was flaky.
Now, there's the issue of where the responsibility lies...
I'm responsible for the quality of the ARM code, including the
buildbots. What you're suggesting is that *no matter what* gets
committed, it is *my* responsibility to fix any bug that the original
developers can't *action upon*.
That might seem sensible at first, but the biggest problem here is the
term that you're using over and over again: *acting upon*. It can be a
technical limitation that stops you from acting upon a bug on an ARM
bot, but it can also be a personal one. I'm not saying *you* would do
that, but we have plenty of people in the community with plenty of
their own problems. You said it yourself: people tend to ignore
problems that they can't understand, but not understanding is *not*
the same as not being able to *act upon*.
For me, that attitude is at the core of the problem here. By raising
the bar faster than we can improve the infrastructure, you're
essentially giving people licence not to care. The bar will then be
raised even further by peer pressure, and that's the kind of behaviour
that leads to a fork. I'm trying to avoid that at all costs.
> All I'd expect is that you/others watch the negative
> bot results, and forward any on that look like actionable true positives. If
> that's too expensive, then I don't know how you can expect community members
> to incur that cost instead of bot owners?
Another example of the assumption that bot owners are validation
engineers and that's their only job. It was never like this in LLVM,
and it won't become so today just because we want it to.
My expectation of the LLVM Foundation is that they would take our
validation infrastructure to the next level, but so far I haven't seen
much happening. If you want to make things better, instead of forcing
your way onto the existing setup, why not work with the Foundation to
move this to the next level?
> Once people lose
> confidence in the bots, they're not likely to /gain/ confidence again -
That's not true. Galina's Panda bots were unstable in 2010, people
lost confidence, she added more boards, people re-gained confidence in
2011. Then it became unstable in 2013, people lost confidence, we
fixed the issues, people re-gained confidence only a few months later.
This year it got unstable again, but because we already have enough
ARM bots elsewhere, she disabled them for good.
You're exaggerating the effects of unstable bots, as if people
expected them to be always perfect. I'd love it if they could be, but
I don't expect them to be.
> I'm looking at the existing behavior of the community - if people are
> generally ignoring the result of a bot anyway (& if it's red for weeks at a
> time, I think they are) then the notifications are providing no value.
I'm not seeing that, myself. So far, you're the only one who is
shouting out loud that this or that bot is noisy.
Sometimes people ignore bots, but I don't take this as a sign that
everything is doomed, just that people focus on different things at
different times.
>> No user is building trunk every commit (ish). Buildbots are not meant
>> to be as stable as a user (including distros) would require.
> I disagree with this - I think it's a worthy goal to have continuous
> validation that is more robust and comprehensive.
A worthy goal, yes. Doable right now, with the resources that we have,
no. And no amount of shouting will get this done.
If we want quality, we need top-level management, preferably from the
LLVM Foundation, and a bunch of dedicated people working on it, either
funded by the Foundation or agreed between the interested parties. If
anyone ever gets this conversation going (I tried), please let me
know, as I'm very interested in making that happen.
> red->exception->red I don't mind too much - the "timeout->timeout" example
> you gave is one I disagree with.
Ah, yes. I mixed them up.
>> I agree in principle. I just worry that it's a lot easier to add an
>> XFAIL than to remove it later.
> How so? If you're actively investigating the issue, and everyone else is
> happily ignoring the bot result (& so won't care when it goes green, or red
> again) - you're owning the issue to get your bot back to green, and it just
> means you have to un-XFAIL it as soon as that happens.
From my experience, companies put people to work on open source
projects when they need something done and don't want to bear the
costs of maintaining it later.
So, initially, developers are under high pressure to push their
patches through, and you see them very excited about addressing review
comments, adding tests, and fixing bugs.
But once the patch is in, the priority of that task, for that company,
is greatly reduced. Most developers consider investigating an XFAIL
from their commit as important as the commit itself, but their
companies don't necessarily share that passion.
Moreover, once developers implement whatever they needed here, it's
not uncommon for their parent companies to move them away from the
project, in which case they can't even contribute any more due to
license issues, etc.
But we also have the not-so-responsible developers, who could create a
bug, assign it to themselves, and never look back unless someone pings
them.
That's why, at Linaro, my policy is to mark a test XFAIL only when I
can guarantee either that it's not supposed to work, or that the
developer will fix it *before* marking the task closed.
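For context on the mechanism itself: in LLVM's lit test runner, an
expected failure is a one-line annotation inside the test file (the
target name below is illustrative):

```
// XFAIL: arm
// lit reports this test as XFAIL instead of FAIL while the bug is
// open, and as XPASS once it unexpectedly passes -- the cue to
// delete this line again.
```

That single line is all it takes, which is exactly why an XFAIL is so
much easier to add than to remember to remove.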