[llvm-dev] False positive notifications around commit notifications

Thu Sep 9 15:18:10 PDT 2021

I've been noticing a trend where more and more false positive 
email notifications are sent out on valid commits.  This is getting really 
problematic as real signal is being lost in the noise.  I've had several 
cases in the last few weeks where I did not see a "real" failure notice 
because it was buried in a bunch of false positives.

Let me run through a few sources of what I consider false positives, and 
suggest a couple things we could do to clean these up.  Note that the 
recommendations here are entirely independent and we can adopt any subset.

*Slow Try Bots*

ex: "This revision was landed with ongoing or failed builds." on 
https://reviews.llvm.org/D109091

Someone - I'm not really sure who - enabled builds for all reviews, and 
this notice on landed commits.  Given it's utterly routine to make a 
last few style fixes before landing an LGTMed change, I consider this 
notice complete noise.  In practice, almost every review gets tagged this 
way.  To be clear, there is value in being told about changes which 
don't build.  The false positive part is only around the "ongoing" builds.

Recommendation: Disable this message for the "ongoing" build case, and 
if we can't, disable the notice entirely.

*Flaky Builders*

ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250

We have many build bots which are not entirely stable.  It's gotten to 
the point where I *expect* failure notifications on literally every 
change I land.  I've been trying to reach out to individual build bot 
owners to get issues resolved, and to their credit, most owners have 
been very responsive.  However, we have enough builders that the 
situation isn't getting meaningfully better.

Recommendation: Introduce specific "test commits" whose only purpose is 
to run the CI infrastructure.  Any builder which notifies of failure on 
such a commit (and only said commit) is disabled without discussion 
until human action is taken by the bot owner to re-enable.  The idea 
here is to a) automate the process, and b) shift the responsibility of 
action to the bot owner for any flaky bot.

Note: By "disabled", I specifically mean that *notification* is 
disabled.  Leaving it in the waterfall view is fine, as long as we're 
not sending out email about it.
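
A minimal sketch of the policy above, to make the intended behavior 
concrete.  The class and method names here are hypothetical and do not 
correspond to any existing buildbot API; the sketch assumes each build 
result arrives with its blame list and that test commits can be 
identified by their id.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationPolicy:
    """Tracks which builders may send failure email (hypothetical helper)."""

    muted: set = field(default_factory=set)

    def record_build(self, builder, blamelist, passed, test_commit):
        """Mute a builder that fails on the test commit and only that commit.

        A failure whose blame list is exactly the no-op test commit cannot be
        caused by a real change, so the builder is treated as flaky or broken.
        """
        if not passed and list(blamelist) == [test_commit]:
            self.muted.add(builder)

    def reenable(self, builder):
        """Explicit action by the bot owner; nothing else unmutes a builder."""
        self.muted.discard(builder)

    def should_notify(self, builder, passed):
        """Send email only for failures on builders that are not muted."""
        return (not passed) and builder not in self.muted
```

Note that a later green build does not unmute the builder in this 
sketch; only `reenable` does, which is what shifts the responsibility 
to the bot owner.  Muting affects email only, so the builder can stay 
in the waterfall view.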

Aside: It's really tempting to attempt to separate builders which are 
"still failing" (e.g. a rare configuration which has been broken for a 
few days) from "flaky" ones.  I'd argue any bot notifying on a "still 
failing" case is buggy, and thus it's fine to treat them the same as a 
"flaky" bot.

*Slow Builders and Redundant Notices*

ex: https://lab.llvm.org/buildbot#builders/67/builds/4128

Occasionally, we have a bad commit land which breaks every (or nearly 
every) builder.  That happens.  If you happen to land a change just 
before or after it, you then get on the blame list for every slow 
running builder we have (since they tend to have large commit windows) 
if they happen to cycle before the fix is committed.  This is 
particularly annoying since the root issue is likely fixed quickly, but 
due to cycle times on the builders, you may be getting emails for 24 
hours to come.

Recommendation: Introduce a new requirement that "slow" builders (say 
cycle time of > 30 minutes) either a) have a maximum commit window of 
~15 commits, or b) use a staged builder model. Personally, I'd prefer 
the staged model, but the max commit window at least helps to limit the 
damage.
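
As a rough illustration of option (a), a capped commit window just 
means the builder never batches more than N pending commits into one 
build.  The function name and the default of 15 are assumptions for 
illustration, not an existing buildbot setting.

```python
def commit_windows(pending, max_window=15):
    """Split pending commits (oldest first) into batches of at most max_window.

    Each batch becomes one build, so no blame list exceeds max_window commits.
    """
    if max_window < 1:
        raise ValueError("max_window must be at least 1")
    return [pending[i:i + max_window]
            for i in range(0, len(pending), max_window)]
```

With 40 pending commits this yields three builds with blame lists of 
15, 15 and 10 commits instead of one build blaming all 40.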

By "staged builder model", I mean that slow builders only build points 
in the history which have already been successfully built by one of the 
fast builders.  This eliminates redundant build failures, at the cost of 
delaying the slow builder slightly.  As long as the slow builder uses 
the "last good commit" as opposed to waiting until the current fast 
builder finishes, the delay should be very minimal for most commits.
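
The "last good commit" selection can be sketched as follows.  This is 
only an illustration under stated assumptions: the function name is 
hypothetical, `history` is assumed to be the linear commit list (oldest 
first), and `fast_green` is assumed to be the set of commits a fast 
builder has already built successfully.

```python
def next_commit_for_slow_builder(history, fast_green, last_slow_built=None):
    """Pick the newest commit already built successfully by a fast builder.

    Only commits after last_slow_built are considered.  Returns None when
    there is nothing new and known-good to build.  Raises ValueError if
    last_slow_built is not in history.
    """
    start = 0
    if last_slow_built is not None:
        start = history.index(last_slow_built) + 1
    # Walk from newest to oldest so the slow builder skips commits that are
    # broken or still being built by the fast builder.
    for commit in reversed(history[start:]):
        if commit in fast_green:
            return commit
    return None
```

For example, with history a, b, c, d, e where c broke the fast builder 
and e is still building, the slow builder picks d right away rather 
than waiting for e, and it never builds the broken c on its own.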

Philip
