[cfe-dev] [llvm-dev] Uncovering non-determinism in LLVM - An Update

Fri Sep 1 11:34:43 PDT 2017

Thanks, David and Diana, for your suggestions. Yes, I am looking at 
setting up the builder to run after every 10 commits (is 10 a reasonable 
number?) and to notify the blame list in case of failures.
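To make the "every N commits" trade-off concrete, here is a minimal sketch. It assumes a hypothetical, simplified commit record (revision number plus author) and is not the buildbot scheduler's actual configuration. It groups a linear history into batches of N and takes each batch's blame list to be the distinct authors in that batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified commit record (not a buildbot data structure).
struct Commit {
  int revision;
  std::string author;
};

// One builder run covering a contiguous batch of commits.
struct BuildBatch {
  int firstRevision;
  int lastRevision;
  std::set<std::string> blameList;  // distinct authors to notify on failure
};

// Splits a linear history (oldest first) into batches of `batchSize`
// commits; the last batch may be shorter. Each batch's blame list is the
// set of distinct authors whose commits it contains.
std::vector<BuildBatch> batchCommits(const std::vector<Commit>& history,
                                     std::size_t batchSize) {
  std::vector<BuildBatch> batches;
  if (batchSize == 0) return batches;
  for (std::size_t i = 0; i < history.size(); i += batchSize) {
    std::size_t end = std::min(i + batchSize, history.size());
    BuildBatch batch{history[i].revision, history[end - 1].revision, {}};
    for (std::size_t j = i; j < end; ++j) batch.blameList.insert(history[j].author);
    batches.push_back(batch);
  }
  return batches;
}
```

With `batchSize = 10`, a failure is attributed to at most 10 authors. A smaller N shortens the blame list but costs more builder runs.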

My plan is to switch the builder to this schedule once all the current 
failures in the reverse iteration builder are fixed (there are currently 4).
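For anyone unfamiliar with how these failures arise, here is a minimal, hypothetical sketch using only the standard library. A vector stands in for a hash map whose iteration order is unspecified (as with DenseMap), and the `reverse` flag mimics a reverse-iteration build walking the same container backwards. Code whose output depends on that order changes between builds. A typical fix is to sort on a stable key before emitting; another common option is an insertion-ordered container such as LLVM's MapVector.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// (symbol name, value). The vector's order stands in for a hash map's
// unspecified iteration order.
using Entry = std::pair<std::string, int>;

// Buggy pattern: the output depends on iteration order. `reverse` mimics a
// reverse-iteration build walking the same container backwards.
std::string emitUnordered(const std::vector<Entry>& entries, bool reverse) {
  std::string out;
  auto emit = [&out](const Entry& e) {
    out += e.first + "=" + std::to_string(e.second) + ";";
  };
  if (reverse)
    std::for_each(entries.rbegin(), entries.rend(), emit);
  else
    std::for_each(entries.begin(), entries.end(), emit);
  return out;
}

// Typical fix: sort on a stable key before emitting, so either iteration
// order yields the same output.
std::string emitSorted(std::vector<Entry> entries, bool reverse) {
  if (reverse) std::reverse(entries.begin(), entries.end());
  std::sort(entries.begin(), entries.end());  // orders by name, then value
  return emitUnordered(entries, /*reverse=*/false);
}
```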

As far as the bit-by-bit comparison of forward vs. reverse builds is 
concerned, I am trying to convince my team to dedicate some resources to 
it. Not sure how soon I can get this done :)
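The comparison itself is simple once both builds exist; `cmp` already reports the first differing byte. As a self-contained sketch, here is the same check in C++. The `compareFiles` helper and its paths are hypothetical: the idea is to point it at the same artifact produced by a forward-iteration compiler and by a reverse-iteration compiler.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Returns the offset of the first byte at which the two streams differ, or
// std::nullopt if they are identical byte for byte (including length).
std::optional<std::size_t> firstDifference(std::istream& a, std::istream& b) {
  using Traits = std::char_traits<char>;
  for (std::size_t offset = 0;; ++offset) {
    Traits::int_type ca = a.get();
    Traits::int_type cb = b.get();
    if (ca != cb) return offset;                   // different byte, or one is shorter
    if (ca == Traits::eof()) return std::nullopt;  // both ended at the same offset
  }
}

// Convenience overload for in-memory buffers.
std::optional<std::size_t> firstDifference(const std::string& a, const std::string& b) {
  std::istringstream sa(a), sb(b);
  return firstDifference(sa, sb);
}

// Hypothetical helper: compares two files, e.g. the same binary built by a
// forward-iteration compiler and by a reverse-iteration compiler.
std::optional<std::size_t> compareFiles(const std::string& pathA, const std::string& pathB) {
  std::ifstream a(pathA, std::ios::binary);
  std::ifstream b(pathB, std::ios::binary);
  if (!a || !b) throw std::runtime_error("cannot open " + (a ? pathB : pathA));
  return firstDifference(a, b);
}
```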

Also, thanks, Diana, for your ideas on how to debug and fix reverse 
iteration failures. I have actually been following a similar strategy to 
fix these issues so far. I may describe it to the community in a future 
email thread.
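The core of that strategy is bisecting a revision range. Here is a minimal sketch, assuming failures are monotonic within the range (once a revision fails, every later one does too). The `fails` callback is a hypothetical stand-in for "build this revision with reverse iteration and rerun the failing test". Diana's extra heuristic, skipping revisions that do not touch containers, is not modeled here.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <vector>

// Returns the earliest revision in `revs` (ordered oldest to newest) for which
// `fails` is true, or std::nullopt if none fail. Assumes failures are
// monotonic in the range: once a revision fails, every later one fails too.
std::optional<int> firstBadRevision(const std::vector<int>& revs,
                                    const std::function<bool(int)>& fails) {
  // partition_point binary-searches for the first element where the
  // predicate turns false, i.e. the first failing revision.
  auto it = std::partition_point(revs.begin(), revs.end(),
                                 [&fails](int rev) { return !fails(rev); });
  if (it == revs.end()) return std::nullopt;
  return *it;
}
```

Because the search is binary, a range of 50 revisions needs only about six test builds. A short revision range keeps that cost low.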

--Mandeep

On 8/31/2017 3:20 AM, Diana Picus wrote:
> On 30 August 2017 at 18:51, David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Tue, Aug 29, 2017 at 11:45 AM Grang, Mandeep Singh via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>> Hi All,
>>>
>>> I wanted to share a couple of updates on the effort to uncover
>>> non-determinism in LLVM through reverse iteration.
>>>
>>> 1. Reverse iteration has now been enabled for DenseMap
>>> (https://reviews.llvm.org/D35043)
>>>
>>> 2. We have set up a nightly reverse iteration buildbot
>>> (http://lab.llvm.org:8011/builders/reverse-iteration).
>>> This builds all LLVM targets with reverse iteration ON and runs ninja
>>> check-all. There are currently 14 unit test failures. Please feel free
>>> to fix these.
>>>
>>> Also, currently only I receive the nightly email notifications for this
>>> buildbot. My plan is to start sending the nightly notifications to
>>> llvm-commits once all 14 failures have been resolved.
>>> Please let me know if the community wants the nightly notifications even
>>> while the failures remain.
>>> As a potential next step, I was thinking about bootstrapping this
>>> reverse iteration LLVM to compile itself. I'm not sure it will uncover
>>> more bugs, but it may be worth a shot.
>>
>> To uncover bugs in this configuration, I believe you'd want/need a
>> stage2/stage3 comparison, which might be a bit tricky/expensive*, something
>> like:
>>
>> build clang twice (reverse and forward enabled), then build (in one mode,
>> doesn't matter which I think) clang or other release binaries (or even the
>> whole release) with each of those and compare them bit-for-bit; they should
>> be identical.
>>
>> * If you want other developers to act on bugs found, the buildbot needs to
>> have a short blame list (on a slow buildbot this can be done by running
>> multiple slaves/builders in parallel) and preferably also a short cycle
>> time, so failures are reported soon after they're introduced. Otherwise,
>> expect to do a lot of triage yourself, and possibly keep the emails going
>> only to you, because their blame lists/revision ranges will be too large
>> for people to find them actionable. You would then probably follow up on
>> the specific commit you believe introduced the problem, either fixing it
>> yourself or replying on the commits list to report it to the original
>> contributor.
>>
> I agree with what David said here, but I just wanted to say that you
> shouldn't feel too discouraged because of it.
>
> As someone who occasionally has to bisect 5h+ worth of revisions, I
> can tell you that in time you'll often be able to just look at the
> revisions and spot the culprit, or maybe 2-3 candidates that have
> likely caused the issue. Given that this bot does something very
> specific, you can then probably just inspect the code and see what
> caused the problem (if the revision doesn't touch any containers, then
> it probably didn't cause the issue, right?). It's a lot easier when
> you have a revision range, so it obviously won't take as long to
> identify and fix as the initial failures that you are seeing now.
>
> Ultimately, it's up to you to decide how much effort you are willing /
> able to put into this. This kind of failure probably won't occur
> that often in practice, but when it does I think it's important to
> find and fix it. The best way to know for sure is to give it a
> try for a while and see how it goes. If you find that it's
> impractical, you can always revert to the current configuration.
>
>>>
>>> All comments/suggestions welcome.
>>>
>>> Thanks,
>>> Mandeep