[llvm-dev] False positive notifications around commit notifications
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 9 15:18:10 PDT 2021
I've been noticing a trend where there are more and more false positive
email notifications sent out on valid commits. This is getting really
problematic as real signal is being lost in the noise. I've had several
cases in the last few weeks where I did not see a "real" failure notice
because it was buried in a bunch of false positives.
Let me run through a few sources of what I consider false positives, and
suggest a couple things we could do to clean these up. Note that the
recommendations here are entirely independent and we can adopt any subset.
*Slow Try Bots*
ex: "This revision was landed with ongoing or failed builds." on
Someone - I'm not really sure who - enabled builds for all reviews, and
this notice on landed commits. Given it's utterly routine to make a
last few style fixes before landing an LGTMed change, I consider this
notice complete noise. In practice, almost every review gets tagged this
way. To be clear, there is value in being told about changes which
don't build. The false positive part is only around the "ongoing" builds.
Recommendation: Disable this message for the "ongoing" build case, and
if we can't, disable them entirely.
*Flaky Builders*
We have many build bots which are not entirely stable. It's gotten to
the point where I *expect* failure notifications on literally every
change I land. I've been trying to reach out to individual build bot
owners to get issues resolved, and to their credit, most owners have
been very responsive. However, we have enough builders that the
situation isn't getting meaningfully better.
Recommendation: Introduce specific "test commits" whose only purpose is
to run the CI infrastructure. Any builder which notifies of failure on
such a commit (and only said commit) is disabled without discussion
until human action is taken by the bot owner to re-enable. The idea
here is to a) automate the process, and b) shift the responsibility of
action to the bot owner for any flaky bot.
Note: By "disabled", I specifically mean that *notification* is
disabled. Leaving it in the waterfall view is fine, as long as we're
not sending out email about it.
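A minimal sketch of the "test commit" policy described above, in Python. All names here (Builder, should_notify, reenable, the test_commits set) are hypothetical and do not correspond to any existing buildbot API. It assumes each build reports a blame list and a pass/fail result, and that a failure blamed solely on a test commit is not itself emailed.

```python
from dataclasses import dataclass


@dataclass
class Builder:
    # Hypothetical record for one build bot. `notify` only controls email;
    # the bot stays visible in the waterfall view either way.
    name: str
    notify: bool = True


def should_notify(builder, blamelist, failed, test_commits):
    """Decide whether a failure email goes out for one finished build.

    blamelist: list of commit ids covered by this build.
    test_commits: set of commit ids known to be no-op CI test commits.
    """
    if not failed:
        return False
    # A failure whose blame list is only a test commit cannot be caused by
    # the commit itself, so the builder is treated as flaky and muted.
    if len(blamelist) == 1 and blamelist[0] in test_commits:
        builder.notify = False
        return False
    return builder.notify


def reenable(builder):
    # Explicit human action by the bot owner turns notification back on.
    builder.notify = True
```

The point of the design is that muting is automatic while re-enabling is manual, so the burden of action falls on the owner of the flaky bot.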
Aside: It's really tempting to attempt to separate builders which are
"still failing" (e.g. a rare configuration which has been broken for a
few days) from "flaky" ones. I'd argue any bot notifying on a "still
failing" case is buggy, and thus it's fine to treat them the same as a
"flaky" bot.
*Slow Builders and Redundant Notices*
Occasionally, we have a bad commit land which breaks every (or nearly
every) builder. That happens. If you happen to land a change just
before or after it, you then get on the blame list for every slow
running builder we have (since they tend to have large commit windows)
if they happen to cycle before the fix is committed. This is
particularly annoying since the root issue is likely fixed quickly, but
due to cycle times on the builders, you may be getting emails for 24
hours to come.
Recommendation: Introduce a new requirement that "slow" builders (say
cycle time of > 30 minutes) either a) have a maximum commit window of
~15 commits, or b) use a staged builder model. Personally, I'd prefer
the staged model, but the max commit window at least helps to limit the
damage.
By "staged builder model", I mean that slow builders only build points
in the history which have already been successfully built by one of the
fast builders. This eliminates redundant build failures, at the cost of
delaying the slow builder slightly. As long as the slow builder uses
the "last good commit" as opposed to waiting until the current fast
builder finishes, the delay should be very minimal for most commits.
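The "last good commit" selection could be sketched roughly as follows, in Python. The function name and the shape of the inputs are assumptions made for illustration, not part of any existing buildbot configuration. It assumes the history is a list of commit ids, oldest first, and that fast builder results are available as a mapping from commit id to pass/fail, with commits still being built simply absent.

```python
def pick_revision_for_slow_builder(history, fast_results, last_built=None):
    """Return the newest commit already built successfully by a fast builder.

    history: commit ids, oldest first.
    fast_results: dict of commit id -> True (passed) or False (failed);
        commits the fast builder has not finished are missing.
    last_built: commit the slow builder built last, or None.
    Returns None when nothing newer than last_built is known good.
    """
    start = history.index(last_built) + 1 if last_built is not None else 0
    # Walk from newest to oldest and take the first known-good commit,
    # rather than waiting for the in-flight fast build to finish.
    for commit in reversed(history[start:]):
        if fast_results.get(commit) is True:
            return commit
    return None
```

With this selection the slow builder never starts on a commit that a fast builder has already shown to be broken, so a widely breaking commit produces one failure notice instead of one per slow builder.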