[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

Wed Aug 26 08:21:06 PDT 2015

On 26 August 2015 at 15:44, Tobias Grosser <tobias at grosser.es> wrote:
> What timeline do you have in mind for this fix? If you are in charge
> and can make this happen within a day, giving cmake + ninja a chance seems
> OK.

It's not my bot. All my bots are CMake+Ninja based and are stable enough.

> However, if the owner of the buildbot is not known or the fix cannot come
> soon, I am in favor of disabling the noise and (re)enabling it when someone
> has found time to address the problem and verify the solution.

That's up to Galina. We haven't taken any action against unstable bots
so far, and this is not the only one. There are lots of Windows and
sanitizer bots that break randomly and provide little information. Are
we going to disable them all? How about the perf bots that still fail
occasionally, whose root cause we haven't managed to fix? Are we
going to disable them, too?

You're asking to considerably reduce the quality of testing in some
areas so that you can reduce the time spent looking at spurious
failures. I don't agree with that in principle. There were other
threads focusing on how to make them less spurious, more stable, less
noisy, and some work is being done on the GreenDragon bot structure.
But killing everything that looks suspicious now will reduce our
ability to validate LLVM on the range of configurations that we do
today, and that, for me, is a lot worse than a few minutes of some
engineers' time.

> The cost of
> buildbot noise is very high, both in terms of developer time spent, but
> more importantly due to people starting to ignore them when monitoring them
> becomes costly.

I think you're overestimating the cost.

When I get bot emails, I click on the link and if it was a timeout, I
always ignore it. If I can't make heads or tails of it (like the
sanitizer ones), I ignore it temporarily, then look again the next day.

My assumption is that the bot owner will make me aware if the reason
is not obvious, as I do with my bots. I always wait for people to
notice and fix the breakage. But if they can't, either because the bot was
already broken, or because the breakage isn't clear, I let people know
where to search for the information in the bot itself. This is my
responsibility as a bot owner.

I appreciate the benefit of having green / red bots, but you also have
to appreciate that hardware is not perfect, and they will invariably
fail once in a while. I had some Polly bots failing randomly and it
took me only a couple of seconds to see that. I'm not asking to remove
them, even those that fail more often than they pass throughout the
year. I assume that, if they're still there, they provide *some* value
to someone.

cheers,
--renato