[llvm-dev] Buildbot Noise

Mon Oct 19 14:19:59 PDT 2015

On Mon, Oct 19, 2015 at 12:26 PM, Renato Golin <renato.golin at linaro.org>
wrote:

> Huge inline record again... I'll pick the contentious issues...
>
> On 19 October 2015 at 19:38, David Blaikie <dblaikie at gmail.com> wrote:
> > at all, but /specifically/ about long-red bots that appear neglected.
>
> "appear" is the key here. It'd be better if you ask first, then
> propose to disable later. If I was on holidays, someone (maybe you)
> could have assumed lack of care and disabled them without the ARM
> sub-community's knowledge. Probably no one got your email but me.
>

If it was sitting red unattended, what would've been the harm of disabling
email from the bot, then?

> I don't know how you could have made sure everyone was copied, TBH. We
> have to think about that one, too. Maybe add sub-owners?
>

The mail was sent to llvm-dev - I hope it's reasonable to expect community
members to follow that with sub-week latency, perhaps.

Though, to be honest - I sent a bunch of these emails, and I haven't
actually done anything about those that haven't been responded to. I should
go back and reply to them. Mostly my intent is to make sure these
sub-optimal emails are highlighted in the community (llvm-dev being the
most central place we have for such discussion) so we can all see it in the
open, talk about it, and hopefully address them in some way.

>
>
> > It would be little-to-no change to me to do this to my GDB 7.5 bot, for
> > example - I glance at every failure that comes through anyway. All I'd do
> > differently is forward anything that I thought looked like a real, unique
> > failure, to the mailing list/blame list, rather than having it done
> > automatically. This does not seem terribly onerous. Is it?
>
> You mind one bot. I mind 11, and the list is growing.
>

Fair - but that seems even worse to distribute in this way, no? Again, for
every bot that has this sort of behavior it is strictly worse to distribute
the load to the community, it seems to me.

> Our bots are very different from each other, and the failures that
> happen to one rarely happen to others.

Great - that means there's not a lot of duplication and most of the issues
should be unique/interesting, not redundant for you to investigate.

> I am solving the contingency
> issue, but that takes time. I agree that's largely my responsibility,
> but we can't go from "it's ok to have some red bots" to "we're doomed,
> kill them all" overnight.
>

It's been a pretty crappy situation for a while - I am only one voice, my
saying "this is crappy, let's not keep putting up with it" is just the
voice of one community member. I wish there were more (& a few have come
out of the woodwork here & there, but it's still clearly something most
people don't care about very much).

> I am working towards the goals we both agree on, but it *will* take some
> time. I'd appreciate some patience.
>

> > I... don't, really. As with my own GDB 7.5 buildbot, I pretty much assume
> > interesting failures will probably involve me helping to triage (especially
> > with the Apple engineers explicitly not having access to the source/test
> > cases run there) the issues. The bot sends me email on every red, and I
> > treat that as pretty much a thing I need to care about until it's green, as
> > much as possible by acting as a facilitator to the original contributor who
> > committed the breakage.
>
> ARM is one of the main architectures in LLVM. Compatibility with GDB
> 7.5 is important, but substantially less so. It may look
> selfish on my part, but I don't think you can compare them as
> equals.
>

Practically speaking they seem to have substantial similarities in the way
the bots are run and the way their results are dealt with. From what you've
said, it sounds like you generally end up being the first line of triage
for ARM failures as I am for GDB failures.

> A lot more people, projects and companies will be upset if ARM support
> regresses, than if the GDB 7.5 bot stays red for a few weeks, or even
> a few months.
>

Depends on the breakage (as it would for ARM, I imagine) - there's
certainly a more sliding scale of quality for debugging than for direct
compilation (make a mistake in direct compilation & the program may
misbehave (even in some small subset of programs, that's pretty drastic) -
make a mistake in debug info generation and it doesn't impact all programs
immediately, just when someone goes to debug it, and workarounds are
usually easier).

> Given the importance, I don't think it's feasible (or healthy) for me
> to own most of the bots, but for now, it is what it is. I'd appreciate
> if other companies that do care about ARM could *also* contribute and
> maintain ARM bots on their own. But even that will take some time.
>

Yeah, it's not really your responsibility to make things work for other
companies that care in the abstract but aren't dedicating resources
(hardware, people, whatever) to the effort. If other companies really care,
perhaps if things broke they'd actually come out of the woodwork. If your
bot was red for a week and no one came around asking why ARM was broken,
I'm not sure how much we can assume that there are people out there who
care a great deal about this?

> > Do you believe there's no quality point in a buildbot notification where it
> > is not worth sending mail/notification?
>
> No, I agree with you on almost all technical points. But those changes
> need to take some time to happen.
>

I'm suggesting that they may or may not happen, but until they do, having
bot owners do first level triage doesn't seem like a new/interesting cost.
I assume you are doing this already, no? How else do you know to
investigate these failures and all? (how did you know there was a breakage
that you were investigating for a week when I emailed/asked about the bot
status?)

> >> How do you XFAIL a Clang miscompilation of Clang?
> >
> > It's a good question - seems like it'd be something we might want to have
> > some way of doing. Perhaps we could have some stub test cases that are used
> > to describe some of these sort of tests.
>
> To answer my own question, I think staged bots is the solution here.
>

I'm not sure how that would address this issue - if a selfhost fails, and
continues to fail for a week while it's investigated, that stage in the
staged bot workflow would remain red. That wouldn't be ideal. (especially
if it's flakey - if it's not flakey, we probably should've just reverted
the patch that made it go red)

> > If they're from previous commits/it's a flakey product issue - that's
> > tricky, for sure.
>
> One critical thing that doesn't get caught: Zorg changes. Maybe we
> should add a monitor to Zorg on every SVN poller. If we can, make sure
> that we build every Zorg change isolated from any other.
>

Wouldn't hurt - are there particular changes to zorg that have caused
misattributed blame that you have in mind? I figured major zorg changes
were rare enough that they were usually manually handled without too much
disruption.

>
>
> > None of these things require infinite anything. There's a "reasonable" level
> > of turnaround that can help quite a bit.
>
> "reasonable" depends on how many resources (money, hardware,
> engineers) you have. You're seeing everyone else with your own
> glasses, assuming you could fix the problem in X days because you have
> N engineers, M money and Y hardware availability, whereas all those
> variables are different to other people / companies.
>
> By saying that "everyone willing to help" should invest as much as
> Google or Apple does, you're essentially shutting off everyone else
> *but* Google and Apple from the project.

No - I'm just suggesting that it doesn't seem reasonable to me to expect
the rest of the project to incur the cost of one interested party/group of
persons - when that burden can be shifted to those with the vested interest
(not you alone, as you say - lots of people have an interest in ARM, if
they aren't dedicating resources to it, that should ultimately come back to
hurt them when quality drops in the areas they care about).

We all share some costs, certainly - where it helps lift the whole, and I'm
not suggesting LLVM developers should do nothing for things other than
their known platform - we all want to make things work across the supported
platforms. But when we get emails we can't effectively act on (no
hardware, long blame lists, vague results, red results that've been red for
some time) we as a whole project are paying a cost that doesn't need to be
paid and/or isn't the best investment for the project as a whole, it seems
to me.

- Dave

> That's where the risk of
> forking comes from.
>

I still don't understand this threat. Anyone who would fork, say, for ARM
support, would have to invest even more resources in testing infrastructure
and analysis to keep themselves going than you/anyone is already, it would
seem. It wouldn't save them anything. (except by choosing to move /much/
more slowly, which is a valid tradeoff and one I don't think the LLVM
project would be interested in making)

- David

>
> cheers,
> --renato
>