[cfe-dev] [llvm-dev] Uncovering non-determinism in LLVM - An Update

Thu Aug 31 03:20:10 PDT 2017

On 30 August 2017 at 18:51, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
>
> On Tue, Aug 29, 2017 at 11:45 AM Grang, Mandeep Singh via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> Hi All,
>>
>> I wanted to share a couple of updates on the effort to uncover
>> non-determinism in LLVM through reverse iteration.
>>
>> 1. Reverse iteration has now been enabled for DenseMap
>> (https://reviews.llvm.org/D35043)
>>
>> 2. We have set up a nightly reverse iteration buildbot
>> (http://lab.llvm.org:8011/builders/reverse-iteration).
>> This builds all LLVM targets with reverse iteration ON and runs ninja
>> check-all. Currently there are 14 unit test failures. Please feel free
>> to fix these.
>>
>> Also currently, only I receive the nightly email notification for this
>> buildbot run. My plan is to enable sending the nightly notifications to
>> llvm-commits once all 14 failures have been resolved.
>> Please let me know if the community wants the nightly notifications even
>> with the failures.
>> As a potential next step, I was thinking about bootstrapping this
>> reverse iteration LLVM to compile itself. Not sure if it can uncover
>> more bugs but maybe worth a shot.
>
>
> To uncover bugs in this configuration, I believe you'd want/need a
> stage2/stage3 comparison which might be a bit tricky/expensive*, something
> like:
>
> build clang twice (reverse and forward enabled) then build (in one mode,
> doesn't matter which I think) clang or other release binaries (or even the
> whole release) from each of those and compare them bit-for-bit, they should
> be identical.
>
> * If you want other developers to act on bugs found, the buildbot needs to
> have a short blame list (this can be done on a slow buildbot by having
> multiple slaves/builders running in parallel) but preferably also a short
> cycle time (so failures are reported soon after they're created) - otherwise
> expect to do a lot of triage yourself (& possibly leave the emails only
> going to you - because they'll have too large blame lists/revision ranges
> and people won't find them actionable) & then probably following up on the
> specific commit you believe introduced the problem and either fixing it
> yourself or replying on the commits list to report it to the original
> contributor.
>

I agree with what David said here, but I just wanted to say that you
shouldn't feel too discouraged because of it.
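
For what it's worth, the comparison step itself is the simple part of
that setup; the cost is in the extra builds. A minimal sketch of a
bit-for-bit check, assuming you already have the two binaries on disk
(the function name filesIdentical is made up for illustration, it is
not an existing LLVM utility):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns true only if both files can be opened
// and contain exactly the same bytes in the same order.
bool filesIdentical(const std::string &PathA, const std::string &PathB) {
  std::ifstream A(PathA, std::ios::binary);
  std::ifstream B(PathB, std::ios::binary);
  if (!A || !B)
    return false; // Treat an unreadable file as a mismatch.

  std::istreambuf_iterator<char> ItA(A), ItB(B), End;
  while (ItA != End && ItB != End) {
    if (*ItA != *ItB)
      return false; // First differing byte.
    ++ItA;
    ++ItB;
  }
  // Identical only if both streams ended at the same point.
  return ItA == End && ItB == End;
}
```

You would call it on the two stage outputs, one produced by the
forward-iteration compiler and one by the reverse-iteration compiler,
and flag the run if it returns false.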

As someone who occasionally has to bisect 5h+ worth of revisions, I
can tell you that in time you'll often be able to just look at the
revisions and spot the culprit, or maybe 2-3 candidates that have
likely caused the issue. Given that this bot does something very
specific, you can then probably just inspect the code and see what
caused the problem (if the revision doesn't touch any containers, then
it probably didn't cause the issue, right?). It's a lot easier when
you have a revision range, so it obviously won't take as long to
identify and fix as the initial failures that you are seeing now.
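
To illustrate the kind of pattern I'd look for when inspecting a
revision, here is a toy sketch. ToyMap is a made-up stand-in for a
container with unspecified iteration order (it is not DenseMap, and
its Reverse flag only mimics what a reverse iteration build does);
all names here are hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a hash container whose iteration order is
// unspecified. Reverse mimics a reverse iteration build.
struct ToyMap {
  std::vector<std::pair<std::string, int>> Buckets;
  bool Reverse;

  // Visit every entry front-to-back, or back-to-front when Reverse is set.
  template <typename Fn> void forEach(Fn F) const {
    if (Reverse) {
      for (auto It = Buckets.rbegin(); It != Buckets.rend(); ++It)
        F(*It);
    } else {
      for (const auto &KV : Buckets)
        F(KV);
    }
  }
};

// Builds the same three entries in either iteration mode.
ToyMap makeExample(bool Reverse) {
  ToyMap M;
  M.Buckets.emplace_back("b", 1);
  M.Buckets.emplace_back("a", 2);
  M.Buckets.emplace_back("c", 3);
  M.Reverse = Reverse;
  return M;
}

// Suspicious: the output depends directly on iteration order.
std::string emitUnordered(const ToyMap &M) {
  std::string Out;
  M.forEach([&](const std::pair<std::string, int> &KV) {
    Out += KV.first + ";";
  });
  return Out;
}

// Order-independent: collect the keys, sort them, then emit.
std::string emitSorted(const ToyMap &M) {
  std::vector<std::string> Keys;
  M.forEach([&](const std::pair<std::string, int> &KV) {
    Keys.push_back(KV.first);
  });
  std::sort(Keys.begin(), Keys.end());
  std::string Out;
  for (const auto &K : Keys)
    Out += K + ";";
  return Out;
}
```

With the forward and reverse versions of makeExample, emitUnordered
gives different strings while emitSorted gives the same one, which is
roughly the difference between a change the bot would flag and one it
wouldn't.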

Ultimately, it's up to you to decide how much effort you are willing /
able to put into this. This kind of failure probably won't even occur
that often in practice, but when it does I think it's important to
find them and fix them. The best way to know for sure is to give it a
try for a while and see how it goes. If you find that it's
impractical, you can always revert to the current configuration.

>>
>>
>> All comments/suggestions welcome.
>>
>> Thanks,
>> Mandeep