[llvm-dev] Buildbot Noise

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 19 12:26:57 PDT 2015


Huge inline record again... I'll pick the contentious issues...

On 19 October 2015 at 19:38, David Blaikie <dblaikie at gmail.com> wrote:
> at all, but /specifically/ about long-red bots that appear neglected.

"appear" is the key here. It'd be better if you ask first, then
propose to disable later. If I had been on holiday, someone (maybe you)
could have assumed lack of care and disabled them without the ARM
sub-community's knowledge. Probably no one got your email but me.

I don't know how you could have made sure everyone was copied, TBH. We
have to think about that one, too. Maybe add sub-owners?


> It would be little-to-no change to me to do this to my GDB 7.5 bot, for
> example - I glance at every failure that comes through anyway. All I'd do
> differently is forward anything that I thought looked like a real, unique
> failure, to the mailing list/blame list, rather than having it done
> automatically. This does not seem terribly onerous. Is it?

You mind one bot. I mind 11, and the list is growing.

Our bots are very different from each other, and the failures that
happen to one rarely happen to others. I am solving the contingency
issue, but that takes time. I agree that's largely my responsibility,
but we can't go from "it's ok to have some red bots" to "we're doomed,
kill them all" overnight.

I am working towards the goals we both agree on, but it *will* take
some time. I'd appreciate some patience.


> I... don't, really. As with my own GDB 7.5 buildbot, I pretty much assume
> interesting failures will probably involve me helping to triage (especially
> with the Apple engineers explicitly not having access to the source/test
> cases run there) the issues. The bot sends me email on every red, and I
> treat that as pretty much a thing I need to care about until it's green, as
> much as possible by acting as a facilitator to the original contributor who
> committed the breakage.

ARM is one of the main architectures in LLVM. Compatibility with GDB
7.5 is important, but substantially less so. It may look selfish on
my part, but I don't think you can compare them as equals.

A lot more people, projects and companies will be upset if ARM support
regresses, than if the GDB 7.5 bot stays red for a few weeks, or even
a few months.

Given the importance, I don't think it's feasible (or healthy) for me
to own most of the bots, but for now, it is what it is. I'd appreciate
it if other companies that do care about ARM could *also* contribute
and maintain ARM bots on their own. But even that will take some time.


> Do you believe there's no quality point in a buildbot notification where it
> is not worth sending mail/notification?

No, I agree with you on almost all technical points. But those changes
need to take some time to happen.


>> How do you XFAIL a Clang miscompilation of Clang?
>
> It's a good question - seems like it'd be something we might want to have
> some way of doing. Perhaps we could have some stub test cases that are used
> to describe some of these sort of tests.

To answer my own question, I think staged bots are the solution here.


> If they're from previous commits/it's a flakey product issue - that's
> tricky, for sure.

One critical thing that doesn't get caught: Zorg changes. Maybe we
should add a monitor to Zorg on every SVN poller. If we can, we should
make sure that every Zorg change is built in isolation from any other.


> None of these things require infinite anything. There's a "reasonable" level
> of turnaround that can help quite a bit.

"reasonable" depends on how many resources (money, hardware,
engineers) you have. You're seeing everyone else through your own
glasses, assuming you could fix the problem in X days because you have
N engineers, M money and Y hardware availability, whereas all those
variables are different for other people / companies.

By saying that "everyone willing to help" should invest as much as
Google or Apple does, you're essentially shutting off everyone else
*but* Google and Apple from the project. That's where the risk of
forking comes from.

cheers,
--renato

