[llvm-dev] False positive notifications around commit notifications
Mehdi AMINI via llvm-dev
llvm-dev at lists.llvm.org
Fri Sep 10 11:36:56 PDT 2021
On Thu, Sep 9, 2021 at 3:18 PM Philip Reames via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I've been noticing a trend where there are more and more false positive
> email notifications sent out on valid commits. This is getting really
> problematic as real signal is being lost in the noise. I've had several
> cases in the last few weeks where I did not see a "real" failure notice
> because it was buried in a bunch of false positives.
>
> Let me run through a few sources of what I consider false positives, and
> suggest a couple things we could do to clean these up. Note that the
> recommendations here are entirely independent and we can adopt any subset.
>
> *Slow Try Bots*
>
> ex: "This revision was landed with ongoing or failed builds." on
> https://reviews.llvm.org/D109091
>
> Someone - I'm not really sure who - enabled builds for all reviews, and
> this notice on landed commits. Given it's utterly routine to make a last
> few style fixes before landing an LGTMed change
>
I do such "few style fixes", but I don't re-upload a revision before
landing, so I don't see this "false positive" in general.
What I frequently see is that the pre-merge config is broken for some other
reason, and that's quite annoying. One aspect of the issue is that there is
no buildbot tracking the pre-merge configuration so it can be broken
without notification (there is a buildkite job tracking it, but buildkite
does not support blamelist notifications).
> , I consider this notice complete noise. In practice, almost every review gets
> tagged this way. To be clear, there is value in being told about changes
> which don't build. The false positive part is only around the "ongoing"
> builds.
>
> Recommendation: Disable this message for the "ongoing" build case, and if
> we can't, disable them entirely.
>
> *Flaky Builders*
>
> ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250
>
> We have many build bots which are not entirely stable. It's gotten to the
> point where I *expect* failure notifications on literally every change I
> land. I've been trying to reach out to individual build bot owners to get
> issues resolved, and to their credit, most owners have been very
> responsive. However, we have enough builders that the situation isn't
> getting meaningfully better.
>
> Recommendation: Introduce specific "test commits" whose only purpose is to
> run the CI infrastructure. Any builder which notifies of failure on such a
> commit (and only said commit) is disabled without discussion until human
> action is taken by the bot owner to re-enable. The idea here is to a)
> automate the process, and b) shift the responsibility of action to the bot
> owner for any flaky bot.
>
> Note: By "disabled", I specifically mean that *notification* is disabled.
> Leaving it in the waterfall view is fine, as long as we're not sending out
> email about it.
>
> Aside: It's really tempting to attempt to separate builders which are
> "still failing" (e.g. a rare configuration which has been broken for a few
> days) from "flaky" ones. I'd argue any bot notifying on a "still failing"
> case is buggy, and thus it's fine to treat them the same as a "flaky" bot.
>
>
> *Slow Builders and Redundant Notices*
>
> ex: https://lab.llvm.org/buildbot#builders/67/builds/4128
>
> Occasionally, we have a bad commit land which breaks every (or nearly
> every) builder. That happens. If you happen to land a change just before
> or after it, you then get on the blame list for every slow running builder
> we have (since they tend to have large commit windows) if they happen to
> cycle before the fix is committed. This is particularly annoying since the
> root issue is likely fixed quickly, but due to cycle times on the builders,
> you may be getting emails for 24 hours to come.
>
> Recommendation: Introduce a new requirement for "slow" builders (say cycle
> time of > 30 minutes) either a) have a maximum commit window of ~15
> commits, or b) use a staged builder model. Personally, I'd prefer the
> staged model, but the max commit window at least helps to limit the
> damage.
>
> By "staged builder model", I mean that slow builders only build points in
> the history which have already been successfully built by one of the fast
> builders. This eliminates redundant build failures, at the cost of
> delaying the slow builder slightly. As long as the slow builder uses the
> "last good commit" as opposed to waiting until the current fast builder
> finishes, the delay should be very minimal for most commits.
>
Does buildbot support staged builders? That would really be ideal indeed!
If we could also disable notification to the blamelist when it is larger
than 5, that'd be great!
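
To make the two ideas concrete, here is a rough sketch in plain Python. It
does not use any buildbot API; the function names, the MAX_BLAMELIST constant,
and the list/set data shapes are all assumptions made up for illustration. It
shows a slow builder picking the "last good commit" from the fast builders,
and a notification check that skips blamelists larger than 5.

```python
from typing import Optional, Sequence, Set

# Threshold suggested in this message; the constant name is hypothetical.
MAX_BLAMELIST = 5


def pick_staged_revision(history: Sequence[str], fast_green: Set[str]) -> Optional[str]:
    """Return the newest revision in `history` (ordered oldest first) that a
    fast builder has already built successfully, or None if there is none."""
    for rev in reversed(history):
        if rev in fast_green:
            return rev
    return None


def should_email_blamelist(blamelist: Sequence[str], max_size: int = MAX_BLAMELIST) -> bool:
    """Email the blamelist only when it is non-empty and no larger than max_size."""
    return 0 < len(blamelist) <= max_size
```

With history ["a", "b", "c", "d"] and only "a" and "c" green on the fast
builders, the slow builder would build "c" rather than waiting for "d". A
failure with 6 authors on the blamelist would still show up in the waterfall
but would not send email.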
Cheers,
--
Mehdi
> Philip