[LLVMdev] [cfe-dev] LLVM IRC channel flooded?

Thu May 21 02:05:41 PDT 2015

On 21 May 2015 at 01:52, Philip Reames <listmail at philipreames.com> wrote:
> As a randomly chosen example, one thing we could do would be to have the
> notion of a "last good commit".  Fast builders would cycle off ToT; if any
> one (or some subset) passed, that would advance the last good commit.
> Slower builders should cycle off the last good commit, not ToT.  We have all
> the mechanisms to implement this today.  It could be as simple as parsing
> the JSON output of buildbot in the script that runs the slower build bots
> and syncing to a particular revision rather than ToT.

Not all slow builders have the same sources as the fast builders. For
example, our "full" builders consider compiler-rt, while the fast ones
don't.
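
For builders that do share sources with the fast ones, the quoted "last
good commit" idea could look roughly like the sketch below. The record
shape (builder, revision, passed) and the function name are assumptions
made for illustration; they are not buildbot's actual JSON schema.

```python
def last_good_revision(builds, required_builders):
    """Return the newest revision that all required fast builders passed.

    builds: iterable of dicts such as
        {"builder": "fast-a", "revision": 100, "passed": True}
    (a hypothetical, simplified record; real buildbot output differs).
    required_builders: set of builder names that must all have passed.
    Returns None if no revision qualifies.
    """
    passed_by_revision = {}
    for build in builds:
        if build["passed"]:
            passed_by_revision.setdefault(build["revision"], set()).add(
                build["builder"]
            )
    good = [
        revision
        for revision, builders in passed_by_revision.items()
        if required_builders <= builders  # every required builder passed
    ]
    return max(good, default=None)
```

A slow builder's script would then sync to that revision instead of ToT.
This does nothing for components the fast builders never build, such as
compiler-rt in the example above.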

> At this point, you're long past the point I was grousing about.  I'm not
> arguing that long running bots shouldn't notify; I'm arguing they shouldn't
> report *obvious* false positives.

Well, that's yet another fix we need for all builders. I think we're missing:

1. Detection of infrastructure vs. real code problems. There isn't a
simple way of doing this, so just adding patterns for "infrastructure"
problems to be ignored, and treating everything else as an error,
would be ok.

2. Detection of different failures. If new tests fail, or the build
fails instead of the tests, the bot should email *again*. Not doing so
is very problematic, and is why we're so angry towards broken bots.

3. Detection of long running failures, that might have been forgotten.
No emails to the blame list, but an email to the bot owner would help.
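
A minimal sketch of how the three checks could fit together, assuming
each build gives us its log text, the failing step and the list of
failing tests. The patterns, the function names and the reminder
interval are all made up for illustration; they are not existing
buildbot features.

```python
import re

# Hypothetical patterns; each bot owner would maintain their own list.
INFRA_PATTERNS = [
    re.compile(r"No space left on device"),
    re.compile(r"Connection (timed out|refused)"),
    re.compile(r"Could not resolve host"),
]


def is_infrastructure_failure(log_text):
    """Item 1: anything matching a known pattern is infrastructure."""
    return any(pattern.search(log_text) for pattern in INFRA_PATTERNS)


def decide_notification(log_text, step, failing_tests, previous=None,
                        builds_red=0, remind_every=20):
    """Return "blame_list", "owner" or "none" for a failed build.

    previous: (step, failing_tests) of the previous failed build, or
              None if the previous build was green.
    builds_red: number of consecutive failed builds before this one.
    """
    if is_infrastructure_failure(log_text):
        return "none"  # item 1: ignore infrastructure noise
    if previous is None:
        return "blame_list"  # first failure after a green build
    previous_step, previous_tests = previous
    new_tests = set(failing_tests) - set(previous_tests)
    if step != previous_step or new_tests:
        return "blame_list"  # item 2: a different failure, email again
    if builds_red > 0 and builds_red % remind_every == 0:
        return "owner"  # item 3: long-running failure, remind the owner
    return "none"
```

Everything that doesn't match a pattern is treated as a real error, as
in item 1, so the pattern list has to stay conservative.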

> Also, the bisect step really should be automated... :)

It's not always simple, especially when self-hosting. If each step
takes 7 hours, guessing what the outcome will be and waiting 7 days to
realise the guess was wrong is not a good use of resources. For those
cases I always bisect manually.
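
To put rough numbers on it, here is a back-of-the-envelope sketch. It
assumes a plain binary search over the candidate commits in which every
step gives a clean good/bad answer; the function name is only for
illustration.

```python
def worst_case_bisect_hours(candidate_commits, hours_per_build):
    """Worst-case machine time to bisect down to one culprit commit."""
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integer arithmetic only
    steps = (candidate_commits - 1).bit_length()
    return steps * hours_per_build
```

With 100 candidate commits and 7-hour builds that is 7 steps, or 49
hours, and one flaky or misjudged step sends the whole search the wrong
way.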

> You've now wasted 10 minutes or more of my time per slow noisy bot. When I
> routinely get 10+ builder failure emails for changes that are clean, that's
> not a worthwhile investment.

I know. That's why I do that on my own bots. It's my time to spend.

Maybe we should divide the bots into three categories. Fast, Slow and
Experimental.

Fast bots are everyone's responsibility. Slow bots are the bot
owners'. Experimental can safely be ignored. That's pretty much what I
do now with my NOC page.
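
As a sketch, the routing for the three categories could be as small as
this. The names are invented for illustration and don't correspond to
any existing buildbot configuration.

```python
from enum import Enum


class BotCategory(Enum):
    FAST = "fast"
    SLOW = "slow"
    EXPERIMENTAL = "experimental"


def failure_recipients(category, blame_list, owner):
    """Who gets the failure email for a bot in the given category."""
    if category is BotCategory.FAST:
        return list(blame_list)  # everyone's responsibility
    if category is BotCategory.SLOW:
        return [owner]  # the bot owner triages first
    return []  # experimental bots can safely be ignored
```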

As a bot owner, if I want to reduce the time I spend on slow bots, I'll
have to work hard to make them fast, and not transfer the burden to the
rest of the community.

cheers,
--renato