[llvm-dev] Buildbot Noise

Mon Oct 19 22:26:15 PDT 2015

On Mon, Oct 19, 2015 at 12:26 PM, Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Huge inline record again... I'll pick the contentious issues...
>
> On 19 October 2015 at 19:38, David Blaikie <dblaikie at gmail.com> wrote:
> > at all, but /specifically/ about long-red bots that appear neglected.
>
> "appear" is the key here. It'd be better if you ask first, then
> propose to disable later. If I had been on holiday, someone (maybe you)
> could have assumed lack of care and disabled them without the ARM
> sub-community's knowledge. Probably no one got your email but me.
>
> I don't know how you could have made sure everyone was copied, TBH. We
> have to think about that one, too. Maybe add sub-owners?
>
>
> > It would be little-to-no change to me to do this to my GDB 7.5 bot, for
> > example - I glance at every failure that comes through anyway. All I'd do
> > differently is forward anything that I thought looked like a real, unique
> > failure, to the mailing list/blame list, rather than having it done
> > automatically. This does not seem terribly onerous. Is it?
>
> You mind one bot. I mind 11, and the list is growing.
>
> Our bots are very different from each other, and the failures that
> happen to one rarely happen to others. I am solving the contingency
> issue, but that takes time. I agree that's largely my responsibility,
> but we can't go from "it's ok to have some red bots" to "we're doomed,
> kill them all" overnight.
>
> I am working towards the goals we both agree on, but it *will* take some
> time. I'd appreciate some patience.
>
>
> > I... don't, really. As with my own GDB 7.5 buildbot, I pretty much assume
> > interesting failures will probably involve me helping to triage (especially
> > with the Apple engineers explicitly not having access to the source/test
> > cases run there) the issues. The bot sends me email on every red, and I
> > treat that as pretty much a thing I need to care about until it's green, as
> > much as possible by acting as a facilitator to the original contributor who
> > committed the breakage.
>
> ARM is one of the main architectures in LLVM. Compatibility with GDB
> 7.5 is important, but substantially less so. It may look
> selfish on my part, but I don't think you can compare them as
> equals.
>
> A lot more people, projects and companies will be upset if ARM support
> regresses, than if the GDB 7.5 bot stays red for a few weeks, or even
> a few months.
>
> Given the importance, I don't think it's feasible (or healthy) for me
> to own most of the bots, but for now, it is what it is. I'd appreciate
> if other companies that do care about ARM could *also* contribute and
> maintain ARM bots on their own. But even that will take some time.
>

I've just been skimming this thread, but for your bots at least (and the
significant effort you are having to invest in triaging failures) it seems
like the core of the issue is that there is not enough redundancy bot-wise
(and bot-owner-wise) on ARM. Just by looking at the list of major
contributors to LLVM, it seems like ARM bots at a couple of companies, at
least, should be going red from e.g. a miscompilation on ARM. That would give
at least a couple of engineers of redundancy in triaging the issue and help
distribute the work, so that e.g. you can go on vacation and not be scared
about the state of your bots when you get back.

Is the state of things just historical? Surely at least ARM the company and
Linaro both have ARM self-host bots that will simultaneously go red in case
of a self-host miscompilation? (I haven't looked and I'm honestly curious;
based on the information you've written above I don't know the answer.)

-- Sean Silva

>
>
> > Do you believe there's no quality point in a buildbot notification where it
> > is not worth sending mail/notification?
>
> No, I agree with you on almost all technical points. But those changes
> need to take some time to happen.
>
>
> >> How do you XFAIL a Clang miscompilation of Clang?
> >
> > It's a good question - seems like it'd be something we might want to have
> > some way of doing. Perhaps we could have some stub test cases that are used
> > to describe some of these sort of tests.
>
> To answer my own question, I think staged bots are the solution here.
>
>
> > If they're from previous commits/it's a flakey product issue - that's
> > tricky, for sure.
>
> One critical thing that doesn't get caught: Zorg changes. Maybe we
> should add a monitor to Zorg on every SVN poller. If we can, make sure
> that we build every Zorg change isolated from any other.
>
>
> > None of these things require infinite anything. There's a "reasonable" level
> > of turnaround that can help quite a bit.
>
> "reasonable" depends on how many resources (money, hardware,
> engineers) you have. You're seeing everyone else through your own
> glasses, assuming you could fix the problem in X days because you have
> N engineers, M money and Y hardware availability, whereas all those
> variables are different for other people / companies.
>
> By saying that "everyone willing to help" should invest as much as
> Google or Apple does, you're essentially shutting off everyone else
> *but* Google and Apple from the project. That's where the risk of
> forking comes from.
>
> cheers,
> --renato