[cfe-dev] [llvm-dev] Uncovering non-determinism in LLVM - An Update

Fri Sep 1 11:45:02 PDT 2017

On Fri, Sep 1, 2017 at 11:34 AM Grang, Mandeep Singh <mgrang at codeaurora.org>
wrote:

> Thanks David and Diana for your suggestions. Yes, I am looking at
> setting up the builder to run after every 10 commits (is 10 a reasonable
> number?) and notify the blame list in case of failures.
>

In practice you mostly end up having the bot run as fast as possible - I
think the default configuration has a time-based delay so it doesn't
fire off on the first commit after a quiet period, but if nothing else
comes in for a few minutes, it goes off and runs on a single commit rather
than sitting idle.

I guess you'd find more in the zorg repository.
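
To make the batching behaviour concrete, here is a rough sketch of how a
quiet-period delay groups commits into builds. It assumes the delay works
like Buildbot's treeStableTimer setting (a build fires once no new commit
has arrived for that many seconds); I haven't checked what this builder
actually uses. The function name batch_commits and the timestamps are made
up for illustration, and build duration is ignored.

```python
from typing import List


def batch_commits(commit_times: List[float], stable_timer: float) -> List[List[float]]:
    """Group commit timestamps (in seconds) into builds.

    Assumption: a build fires once `stable_timer` seconds pass with no new
    commit, and every commit seen since the previous build shares one
    blame list. Time spent running the build itself is not modelled.
    """
    builds: List[List[float]] = []
    current: List[float] = []
    for t in sorted(commit_times):
        # A gap of at least `stable_timer` means the pending batch already fired.
        if current and t - current[-1] >= stable_timer:
            builds.append(current)
            current = []
        current.append(t)
    if current:
        builds.append(current)
    return builds


if __name__ == "__main__":
    # Three commits close together, then one after a long quiet period.
    print(batch_commits([0, 60, 90, 1000], stable_timer=300))
```

With a 300 second timer the first three commits share one blame list and
the late commit gets a build of its own, which is the "runs on a single
commit rather than sitting idle" case.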

But please keep an eye on how reliable/actionable the emails are - if
developers aren't responding to/fixing issues, or if the bot is sending
fail mail for other uninteresting things (like build failures that other
buildbots already diagnosed) - please tweak/tune or disable the buildbot. I
know there's already a lot of buildbot email spam, but everything we can do
to reduce the noise is really important.

>
> My plan is to enable this once all the current failures in the reverse
> builder are fixed (currently there are 4 failures).
>
> As far as the bit-by-bit comparison of forward vs reverse builders is
> concerned, I am trying to convince my team to dedicate some resources to
> this. Not sure how soon I can get this done :)
>
> Also thanks Diana for your ideas on how to debug/fix reverse iteration
> failures. Actually I have been following a similar strategy to fix these
> issues so far. Maybe I will update the community on this in a future
> email thread.
>
> --Mandeep
>
>
> On 8/31/2017 3:20 AM, Diana Picus wrote:
> > On 30 August 2017 at 18:51, David Blaikie via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> On Tue, Aug 29, 2017 at 11:45 AM Grang, Mandeep Singh via cfe-dev
> >> <cfe-dev at lists.llvm.org> wrote:
> >>> Hi All,
> >>>
> >>> I wanted to share a couple of updates on the effort to uncover
> >>> non-determinism in LLVM through reverse iteration.
> >>>
> >>> 1. Reverse iteration has now been enabled for DenseMap
> >>> (https://reviews.llvm.org/D35043)
> >>>
> >>> 2. We have set up a nightly reverse iteration buildbot
> >>> (http://lab.llvm.org:8011/builders/reverse-iteration).
> >>> This builds all LLVM targets with reverse iteration ON and runs ninja
> >>> check-all. Currently there are 14 unit test failures. Please feel free
> >>> to fix these.
> >>>
> >>> Also currently, only I receive the nightly email notification for this
> >>> buildbot run. My plan is to enable sending the nightly notifications to
> >>> llvm-commits once all 14 failures have been resolved.
> >>> Please let me know if the community wants the nightly notifications even
> >>> with the failures.
> >>> As a potential next step, I was thinking about bootstrapping this
> >>> reverse iteration LLVM to compile itself. Not sure if it can uncover
> >>> more bugs but maybe worth a shot.
> >>
> >> To uncover bugs in this configuration, I believe you'd want/need a
> >> stage2/stage3 comparison which might be a bit tricky/expensive*, something
> >> like:
> >>
> >> build clang twice (reverse and forward enabled) then build (in one mode,
> >> doesn't matter which I think) clang or other release binaries (or even the
> >> whole release) from each of those and compare them bit-for-bit, they should
> >> be identical.
> >>
> >> * If you want other developers to act on bugs found, the buildbot needs to
> >> have a short blame list (this can be done on a slow buildbot by having
> >> multiple slaves/builders running in parallel) but preferably also a short
> >> cycle time (so failures are reported soon after they're created) - otherwise
> >> expect to do a lot of triage yourself (& possibly leave the emails only
> >> going to you - because they'll have too large blame lists/revision ranges
> >> and people won't find them actionable) & then probably following up on the
> >> specific commit you believe introduced the problem and either fixing it
> >> yourself or replying on the commits list to report it to the original
> >> contributor.
> >>
> > I agree with what David said here, but I just wanted to say that you
> > shouldn't feel too discouraged because of it.
> >
> > As someone that occasionally has to bisect 5h+ worth of revisions, I
> > can tell you that in time you'll often be able to just look at the
> > revisions and spot the culprit, or maybe 2-3 candidates that have
> > likely caused the issue. Given that this bot does something very
> > specific, you can then probably just inspect the code and see what
> > caused the problem (if the revision doesn't touch any containers, then
> > it probably didn't cause the issue, right?). It's a lot easier when
> > you have a revision range, so it obviously won't take as long to
> > identify and fix as the initial failures that you are seeing now.
> >
> > Ultimately, it's up to you to decide how much effort you are willing /
> > able to put into this. These kinds of failures probably won't even occur
> > that often in practice, but when they do I think it's important to
> > find them and fix them. The best way to know for sure is to give it a
> > try for a while and see how it goes. If you find that it's
> > impractical, you can always revert to the current configuration.
> >
> >>>
> >>> All comments/suggestions welcome.
> >>>
> >>> Thanks,
> >>> Mandeep
> >>> _______________________________________________
> >>> cfe-dev mailing list
> >>> cfe-dev at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
>
>
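
For the bit-for-bit comparison quoted above (binaries built by a
forward-iteration clang vs. a reverse-iteration clang), a minimal sketch of
the comparison step could look like the following. The function names and
directory layout are hypothetical, not something the existing bots provide;
it just hashes every file under two output trees and lists the paths that
are missing on one side or differ in content.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_tree(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(forward_root: Path, reverse_root: Path) -> List[str]:
    """Return relative paths that are missing on one side or differ in content."""
    fwd = hash_tree(forward_root)
    rev = hash_tree(reverse_root)
    # fwd.get(p) != rev.get(p) covers both "different bytes" and "only on one side".
    return sorted(p for p in fwd.keys() | rev.keys() if fwd.get(p) != rev.get(p))
```

Calling differing_files(Path("stage2-forward"), Path("stage2-reverse"))
(both directory names are placeholders) should return an empty list if the
two compilers produce identical output; any path it does return is a
candidate for an iteration-order dependence and would still need triage.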