[cfe-dev] [llvm-dev] Buildbot Noise

Wed Oct 7 14:14:50 PDT 2015

As a foreword: I haven't read much of this thread, and this is just a
single developer's opinion :)

On Wed, Oct 7, 2015 at 7:45 AM Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 7 October 2015 at 15:39, James Y Knight <jyknight at google.com> wrote:
> > But since nobody actually seemed interested in fixing it, I didn't keep
> > making noise about it. I basically just ignore the failure notices from
> > buildbot, because every commit seems to trigger multiple bogus failure
> > notices, no matter what.
>
> That's not true, either.
>
> We (buildbot owners and admins) are constantly improving the noise by
> adding more boards, investigating stability issues and disabling bots
> temporarily when they're too noisy. We may not do it at the speed some
> people expect, or to the extent that a fully supported validation team
> in a big company would, but we do the best we can.
>
>
Any bot that people just ignore because it's usually flaky isn't worth
having around. It's basically just making people not pay attention to the
ones that are reliable.

>
> > I don't know what the solution is, but it's got to somehow move towards
> > trying to avoid blaming committers for already-known problems, or for
> > infrastructure issues (e.g.: svn update failed? Why do I care?). It simply
> > does not help to improve the quality of LLVM to have the buildbots send
> > emails to committers of arbitrary patches when a bot that "everyone"
> > already knows is flaky has failed yet again. *I* don't know which bots are
> > "supposed" to be flaky, so if I actually bothered to fully investigate
> > every such notice, that'd just be a massive waste of effort.
>
> The alternative is worse: not testing.
>

Absolutely agreed. That said, a flaky bot's failure mail should only go to
the people who care about it until the bot is considered stable for general
use. I.e. if I can rely on the fact that I (or someone in the recent commit
history) broke a target when the bot says so, then the bot is useful.
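
A minimal sketch of that routing rule, to make it concrete. Everything here
is hypothetical: the Bot record, the "stable" flag and the recipients()
helper are made up for illustration and are not part of buildbot or of the
current LLVM bot configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bot:
    name: str
    owners: List[str]
    stable: bool  # assumption: someone has judged the bot reliable enough for general use


def recipients(bot: Bot, blamelist: List[str]) -> List[str]:
    """Return who should get mail when `bot` reports a failure."""
    if bot.stable:
        # A reliable bot: a failure most likely means someone on the
        # blamelist broke the target, so the committers hear about it too.
        return sorted(set(bot.owners) | set(blamelist))
    # A flaky bot: only the people who care about it get the mail.
    return sorted(set(bot.owners))
```

With this rule a committer only ever sees mail from bots marked stable, so a
failure notice is worth reading by default.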

Things that are still OK for people to have to pay attention to on occasion:

Timeouts
svn update failed

These are fairly rare (or should be), so the noise they cause isn't too bad
and falls under "a few false positives are better than a false negative
here".

A solution to a great deal of the failures on the slow bots:

It would be nice if we could get phased builders, so that the fast builders
run the native tests on, say, Linux, Darwin and Windows quickly, and only
after that do the more board-based testers run. That way the slow bots
wouldn't fail so often due to the "transient" failures that happen on a
regular basis and get fixed in a quick follow-up commit.
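
Roughly, the phasing idea looks like the sketch below. This is only an
illustration under my own assumptions: run_phased() and the builder
callables are hypothetical names, not how the phased builders are actually
implemented.

```python
from typing import Callable, Dict, List

# Hypothetical builder type: takes a revision, returns True if it built and tested cleanly.
Builder = Callable[[str], bool]


def run_phased(revision: str, phases: List[Dict[str, Builder]]) -> Dict[str, object]:
    """Run each phase in order; later phases only run if the earlier one was all green."""
    results: Dict[str, bool] = {}
    for index, phase in enumerate(phases):
        for name, build in phase.items():
            results[name] = build(revision)
        if not all(results[name] for name in phase):
            # A fast builder failed, so the slow board-based bots never see
            # this revision and cannot report a failure for it.
            return {"revision": revision, "stopped_at_phase": index, "results": results}
    return {"revision": revision, "stopped_at_phase": None, "results": results}
```

The first phase would hold the fast native builders and the second the board
bots, so a revision that breaks the fast builders never reaches the slow
ones.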

I've cc'd Chris Matthews, who was originally working on getting the phased
builders out in a usable fashion for the general community.

If the above doesn't solve the stability problems of the builders that Dave
(and others) are complaining about, those builders should probably be
private bots until they get fixed; otherwise they're not providing enough
signal for the noise.

-eric

>
> The assumption is wrong: people *do* care, but the problem is harder
> than it looks, and needs more than just the bot owner to improve.
>
> I wish I had a magic wand... but I'm not expecting to ever have one.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>