[LLVMdev] [cfe-dev] LLVM IRC channel flooded?

Wed May 20 17:52:53 PDT 2015


On 05/20/2015 11:04 AM, Renato Golin wrote:
> On 20 May 2015 at 18:47, Philip Reames <listmail at philipreames.com> wrote:
>> One particular irritant is getting emails 12-24 hours later about someone else's
>> breakage that has *already been fixed*.  The long cycling bots are really
>> irritating in that respect.
> That's not that easy to fix, and I think we'll have to cope with that
> forever. Not all machines are fast, and some buildbots do a full
> self-host, with compiler-rt and running all tests. Others do a full
> benchmark run of LNT, running it 5-8 times, which can take several
> hours on an ARM box.
I agree it's not easy, but it's not something we should just live with 
either. There are ways to address the problem and we should consider them.

As a randomly chosen example, one thing we could do would be to have the 
notion of a "last good commit".  Fast builders would cycle off ToT; if 
any one (or some subset) passed, that would advance the notion of last 
good commit.  Slower builders should cycle off the last good commit, not 
ToT.  We have all the mechanisms to implement this today.  It could be 
as simple as parsing the JSON output of buildbot in the script that runs 
the slower build bots and syncing to a particular revision rather than ToT.
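
To make the idea concrete, here is a rough sketch of the selection step only. 
It assumes the fast builders' results have already been fetched and parsed 
into a plain mapping of builder name to {revision: passed}; that shape, the 
builder names, and the function name are all made up for illustration and 
are not buildbot's actual JSON layout.

```python
from typing import Iterable, Mapping, Optional


def last_good_revision(
    results: Mapping[str, Mapping[int, bool]],
    required: Iterable[str],
) -> Optional[int]:
    """Return the newest revision that every required fast builder passed.

    `results` maps builder name -> {revision: passed}. This shape is a
    hypothetical, already-parsed form of the build status, not a real
    buildbot schema. Returns None if no revision qualifies.
    """
    required = list(required)
    if not required:
        return None
    # For each required builder, collect the revisions it built and passed.
    passing_sets = [
        {rev for rev, ok in results.get(name, {}).items() if ok}
        for name in required
    ]
    # A revision is "good" only if all required builders passed it.
    common = set.intersection(*passing_sets)
    return max(common) if common else None


# Hypothetical builder names and revision numbers.
fast_results = {
    "fast-x86": {100: True, 101: False, 102: True},
    "fast-arm": {100: True, 101: True, 102: False},
}
print(last_good_revision(fast_results, ["fast-x86", "fast-arm"]))  # 100
print(last_good_revision(fast_results, ["fast-x86"]))              # 102
```

The slow bot's script would then sync to the returned revision instead of 
ToT, and skip the cycle entirely if the result is None or unchanged.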
>
> The benchmark bots should be marked not to spam, since they're not
> there to pick up errors, but the full self-hosting ones do need to
> warn on errors. For example, right now I have a bug only on a thumbv7a
> self-hosting bot, and not on others. I'm now bisecting it to find the
> culprit, but this is not always clear, as the longer it takes for me
> to realise, the harder it will be to fix it.
At this point, you're long past the point I was grousing about.  I'm not 
arguing that long running bots shouldn't notify; I'm arguing they 
shouldn't report *obvious* false positives.

Also, the bisect step really should be automated... :)
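
As a sketch of what automating it could look like: a plain binary search 
over an ordered revision range, assuming the first revision is known good, 
the last is known bad, and there is a single good-to-bad transition.  The 
`is_good` callback (build and test at a revision) is a placeholder the bot 
owner would have to supply; nothing here is an existing buildbot feature.

```python
from typing import Callable, Sequence


def first_bad_revision(
    revisions: Sequence[int],
    is_good: Callable[[int], bool],
) -> int:
    """Binary-search for the first failing revision.

    `revisions` is ordered oldest to newest; revisions[0] must be known
    good and revisions[-1] known bad. `is_good` is a caller-supplied
    (hypothetical) check that builds and tests the given revision.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid  # breakage was introduced after mid
        else:
            hi = mid  # breakage is at mid or earlier
    return revisions[hi]


# Stand-in check: pretend revision 106 introduced the failure.
print(first_bad_revision(list(range(100, 110)), lambda r: r < 106))  # 106
```

With N revisions in the range this needs about log2(N) builds, which matters 
most on exactly the slow bots where each build takes hours.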
>
> The only way out of it is for people to look at the fast bots, and if
> they're fixed, check the commit that did it and see if the slow bot
> has been fixed by the same commit later.
You've now wasted 10 minutes or more of my time per slow noisy bot. When I 
routinely get 10+ builder failure emails for changes that are clean, 
that's not a worthwhile investment.
>
> Buildbot owners will eventually pick those problems up, but as I said,
> the longer it takes, the harder it is to get to the bottom of it, and
> the higher is the probability of getting more regressions introduced
> because the bot is red and won't warn.
I agree.  All I'm suggesting is reducing noise so that real failures are 
likely to be noticed quickly.
>
> cheers,
> --renato