[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

Wed Aug 26 09:30:07 PDT 2015

On 08/26/2015 08:21 AM, Renato Golin via llvm-dev wrote:
> On 26 August 2015 at 15:44, Tobias Grosser <tobias at grosser.es> wrote:
>> What time-line do you have in mind for this fix? If you are in charge
>> and can make this happen within a day, giving cmake + ninja a chance seems
>> OK.
> It's not my bot. All my bots are CMake+Ninja based and are stable enough.
>
>
>> However, if the owner of the buildbot is not known or the fix can not come
>> soon, I am in favor of disabling the noise and (re)enabling it when someone
>> has found time to address the problem and verify the solution.
> That's up to Galina. We haven't had any action against unstable bots
> so far, and this is not the only one. There are lots of Windows and
> sanitizer bots that break randomly and provide little information, are
> we going to disable them all? How about the perf bots that still fail
> occasionally and we haven't managed to fix the root cause, are we
> going to disable them, too?
If a bot fails regularly (say, a false positive rate of 1 in 10 runs), then
yes, it should be disabled until the owner fixes it.  It's perfectly
okay for it to be put on a "known unstable" list and for the bot owner
to report failures after they've been confirmed.

To say this differently, we will revert a *change* which is 
problematic.  Why shouldn't we "revert" a bot?
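
As a rough sketch of the policy suggested above, the split between
notifying bots and "known unstable" bots could be computed from each
bot's history. Everything here is hypothetical: the BotHistory record,
the bot names, and the helper functions are illustrative only and are
not part of any existing buildbot tooling. The 10% threshold is taken
from the "1 in 10 runs" figure above.

```python
from dataclasses import dataclass

# Hypothetical record of a bot's recent history; not a real buildbot type.
@dataclass
class BotHistory:
    name: str
    total_runs: int
    false_positives: int  # failures later confirmed as unrelated to the commit

# "1 in 10 runs" from the discussion above.
UNSTABLE_THRESHOLD = 0.10

def false_positive_rate(bot: BotHistory) -> float:
    """Fraction of runs that failed for reasons unrelated to the commit."""
    if bot.total_runs == 0:
        return 0.0
    return bot.false_positives / bot.total_runs

def partition_bots(bots, threshold=UNSTABLE_THRESHOLD):
    """Split bot names into (stable, known_unstable) by false positive rate."""
    stable, unstable = [], []
    for bot in bots:
        if false_positive_rate(bot) >= threshold:
            unstable.append(bot.name)  # owner confirms failures before reporting
        else:
            stable.append(bot.name)    # failures notify committers directly
    return stable, unstable

if __name__ == "__main__":
    history = [
        BotHistory("bot-a", total_runs=100, false_positives=3),
        BotHistory("bot-b", total_runs=100, false_positives=10),
    ]
    stable, unstable = partition_bots(history)
    print("notify committers:", stable)
    print("known unstable:", unstable)
```

Bots in the second list would stay running; only their automatic
notifications would be withheld until the owner has confirmed a failure.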
>
> You're asking to reduce considerably the quality of testing on some
> areas so that you can reduce the time spent looking at spurious
> failures. I don't agree with that in principle. There were other
> threads focusing on how to make them less spurious, more stable, less
> noisy, and some work is being done on the GreenDragon bot structure.
> But killing everything that looks suspicious now will reduce our
> ability to validate LLVM on the range of configurations that we do
> today, and that, for me, is a lot worse than a few minutes' worth of
> some engineers.
>
>
>> The cost of
>> buildbot noise is very high, both in terms of developer time spent, but
>> more importantly due to people starting to ignore them when monitoring them
>> becomes costly.
> I think you're overestimating the cost.
>
> When I get bot emails, I click on the link and if it was a timeout, I
> always ignore it. If I can't make heads or tails (like the sanitizer
> ones), I ignore it temporarily, then look again next day.
I disagree strongly here.  The cost of having flaky bots is quite high.  
When I make a commit, I'm committing to be responsive to problems it 
introduces over the next few hours.  Every one of those false positives 
is a 5-10 minute, high-priority interruption to what I'm actually working
on.  In practice, that greatly diminishes my effectiveness.

As an illustrative example, I submitted some documentation changes 
earlier this week and got 5 unique build failure notices.  In this case, 
I ignored them, but if that had been a small code change, that would 
have cost me at least an hour of productivity.
>
> My assumption is that the bot owner will make me aware if the reason
> is not obvious, as I do with my bots. I always wait for people to
> realise, and fix. But if they can't, either because the bot was
> already broken, or because the breakage isn't clear, I let people know
> where to search for the information in the bot itself. This is my
> responsibility as a bot owner.
First, thanks for being a responsible bot owner.  :)

If all bot owners were doing this, having an unstable list which doesn't
actively notify would be completely workable.  If not all bot owners are 
doing this, I can't say I really care about the status of those bots.
>
> I appreciate the benefit of having green / red bots, but you also have
> to appreciate that hardware is not perfect, and they will invariably
> fail once in a while. I had some Polly bots failing randomly and it
> took me only a couple of seconds to infer so. I'm not asking to remove
> them, even those that fail more than pass throughout the year. I
> assume that, if they're still there, it provides *some* value to
> someone.
>
> cheers,
> --renato