[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 26 09:30:07 PDT 2015
On 08/26/2015 08:21 AM, Renato Golin via llvm-dev wrote:
> On 26 August 2015 at 15:44, Tobias Grosser <tobias at grosser.es> wrote:
>> What time-line do you have in mind for this fix? If you are in charge
>> and can make this happen within a day, giving cmake + ninja a chance seems
> It's not my bot. All my bots are CMake+Ninja based and are stable enough.
>> However, if the owner of the buildbot is not known or the fix can not come
>> soon, I am in favor of disabling the noise and (re)enabling it once someone
>> has found time to address the problem and verify the solution.
> That's up to Galina. We haven't had any action against unstable bots
> so far, and this is not the only one. There are lots of Windows and
> sanitizer bots that break randomly and provide little information, are
> we going to disable them all? How about the perf bots that still fail
> occasionally and we haven't managed to fix the root cause, are we
> going to disable them, too?
If the bot fails regularly (say, a false positive rate of 1 in 10 runs), then
yes, it should be disabled until the owner fixes it. It's perfectly
okay for it to be put on a "known unstable" list and for the bot owner
to report failures after they've been confirmed.
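
As a rough sketch of that policy, the triage could look something like the
Python below. Everything in it is an illustrative assumption: the
BotHistory record, the partition_bots helper, and the 10% cutoff (taken
from the "1 in 10 runs" figure above) are hypothetical and not part of
any existing buildbot configuration or API.

```python
from dataclasses import dataclass

# Assumed cutoff, borrowed from the "1 in 10 runs" figure in the text.
UNSTABLE_THRESHOLD = 0.10


@dataclass
class BotHistory:
    """Hypothetical per-bot record; not a real buildbot data structure."""
    name: str
    total_runs: int
    false_positives: int  # failures later confirmed unrelated to the commit

    @property
    def false_positive_rate(self) -> float:
        # A bot with no recorded runs is treated as having a rate of 0.
        if self.total_runs == 0:
            return 0.0
        return self.false_positives / self.total_runs


def partition_bots(bots, threshold=UNSTABLE_THRESHOLD):
    """Split bot names into (stable, known_unstable) by false positive rate."""
    stable, known_unstable = [], []
    for bot in bots:
        if bot.false_positive_rate >= threshold:
            known_unstable.append(bot.name)
        else:
            stable.append(bot.name)
    return stable, known_unstable
```

Bots in the second list would stop sending mail to committers; their
owners would confirm failures and report them by hand.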
To say this differently, we will revert a *change* which is
problematic. Why shouldn't we "revert" a bot?
> You're asking to reduce considerably the quality of testing on some
> areas so that you can reduce the time spent looking at spurious
> failures. I don't agree with that in principle. There were other
> threads focusing on how to make them less spurious, more stable, less
> noisy, and some work is being done on the GreenDragon bot structure.
> But killing everything that looks suspicious now will reduce our
> ability to validate LLVM on the range of configurations that we do
> today, and that, for me, is a lot worse than a few minutes' worth of
> some engineers.
>> The cost of
>> buildbot noise is very high, both in terms of developer time spent, but
>> more importantly due to people starting to ignore them when monitoring them
>> becomes costly.
> I think you're overestimating the cost.
> When I get bot emails, I click on the link and if it was a timeout, I
> always ignore it. If I can't make heads or tails of it (like the sanitizer
> ones), I ignore it temporarily, then look again the next day.
I disagree strongly here. The cost of having flaky bots is quite high.
When I make a commit, I'm committing to being responsive to any problems it
introduces over the next few hours. Every one of those false positives
is a 5-10 minute, high-priority interruption to what I'm actually working
on. In practice, that greatly diminishes my effectiveness.
As an illustrative example, I submitted some documentation changes
earlier this week and got 5 unique build failure notices. In this case,
I ignored them, but if that had been a small code change, that would
have cost me at least an hour of productivity.
> My assumption is that the bot owner will make me aware if the reason
> is not obvious, as I do with my bots. I always wait for people to
> realise, and fix. But if they can't, either because the bot was
> already broken, or because the breakage isn't clear, I let people know
> where to search for the information in the bot itself. This is my
> responsibility as a bot owner.
First, thanks for being a responsible bot owner. :)
If all bot owners were doing this, having an unstable list which doesn't
actively notify would be completely workable. If not all bot owners are
doing this, I can't say I really care about the status of those bots.
> I appreciate the benefit of having green / red bots, but you also have
> to appreciate that hardware is not perfect, and they will invariably
> fail once in a while. I had some Polly bots failing randomly and it
> took me only a couple of seconds to infer so. I'm not asking to remove
> them, even those that fail more than pass throughout the year. I
> assume that, if they're still there, it provides *some* value to