<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Jul 6, 2017 at 1:08 PM Daniel Berlin via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jul 6, 2017 at 12:34 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_5233375800023946560h5">On Thu, Jul 6, 2017 at 10:20 AM, Daniel Berlin via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_5233375800023946560m_5771199437449982016h5">On Thu, Jul 6, 2017 at 8:02 AM, Robinson, Paul via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>
<br>
> -----Original Message-----<br>
> From: llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>] On Behalf Of<br>
> Grang, Mandeep Singh via llvm-dev<br>
> Sent: Thursday, July 06, 2017 2:56 AM<br>
> To: <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> Subject: [llvm-dev] Uncovering non-determinism in LLVM - The Next Steps<br>
><br>
> Hi all,<br>
><br>
> Last year I shared with the community my findings about instances of<br>
> non-determinism in LLVM codegen. The major source was iteration over<br>
> unordered containers, whose iteration order is non-deterministic.<br>
> To uncover such instances we introduced "reverse iteration" of<br>
> unordered containers (currently only enabled for SmallPtrSet).<br>
> I would now like to take this effort forward and propose to do the<br>
> following:<br>
><br>
> 1. We are in the process of setting up an internal nightly buildbot<br>
> which would build LLVM with the CMake flag<br>
> -DLLVM_REVERSE_ITERATION:BOOL=ON.<br>
> This will make all supported containers iterate in reverse order by<br>
<br>
</span>I hope you mean all supported *unordered* containers here. :-)<br>
<span><br>
> default. We would then run "ninja check-all". Any failing unit test is a<br>
> sign of potential non-determinism.<br>
<br>
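<div>A minimal sketch of the reverse-iteration idea described above, written in Python rather than LLVM's C++. It is not LLVM's implementation: <code>ToyUnorderedSet</code>, <code>order_dependent_pass</code>, <code>order_independent_pass</code> and <code>is_order_sensitive</code> are hypothetical names invented for illustration. The assumption is only that a consumer whose output changes when the same elements are visited in the opposite order is relying on iteration order.</div>
<pre>
```python
class ToyUnorderedSet:
    """Hypothetical stand-in for an unordered container (not SmallPtrSet)."""

    def __init__(self, items, reverse_iteration=False):
        self._items = list(items)
        self._reverse = reverse_iteration

    def __iter__(self):
        # Reverse mode visits the same elements in the opposite order.
        if self._reverse:
            return iter(reversed(self._items))
        return iter(self._items)


def order_dependent_pass(values):
    # Buggy on purpose: the output follows whatever order the set yields.
    return [f"visit {v}" for v in values]


def order_independent_pass(values):
    # Sorting by a stable key removes the dependence on iteration order.
    return [f"visit {v}" for v in sorted(values)]


def is_order_sensitive(pass_fn, items):
    # Run the same pass over forward and reversed iteration and compare.
    forward = pass_fn(ToyUnorderedSet(items))
    backward = pass_fn(ToyUnorderedSet(items, reverse_iteration=True))
    return forward != backward
```
</pre>
<div>With the items "b", "a", "c", <code>is_order_sensitive</code> flags the first pass and not the second. A one-element input would not expose the problem, which is one reason a passing run is not proof that the dependence is absent.</div>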
</span>When you did this with SmallPtrSet, were there tests that failed but<br>
did not actually indicate non-determinism?<br></blockquote><div><br></div></div></div><div>An example of this is the order of predecessors in phi nodes in the IR. There are passes that will create them in different orders depending on SmallPtrSet iteration.</div><div>This is "non-deterministic" in the sense that the textual form is different, but it has the same semantic meaning either way.</div><div>(Let's put aside the fact that allowing them to have a different order than the actual block predecessors is a pointless waste of time :P)</div><div><br></div><div>Whether you consider this non-deterministic depends on your goal.</div><div><br></div><div>I would argue that any pass that behaves differently given </div><div>phi [[1, block 1], [2, block 2]]</div><div>and </div><div>phi [[2, block 2], [1, block 1]] </div><div><br></div><div>is just flat-out broken (and we have some that break due to poor design, etc.)</div><div><br></div><div>So I wouldn't consider the above to be non-deterministic in any meaningful sense, despite it producing a different textual form.</div></div></div></div></blockquote><div><br></div></div></div><div>One of our definitions of determinism is simply "output from command line tools should always be bit identical given identical inputs", which is suitable for content-based caching build systems like Bazel.</div></div></div></div></blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Just to point out: these systems often already have to ignore whitespace differences, etc. 
</div><div><br></div><div>I'd also argue that any of these content-based caching build systems is going to be rarely used if it requires every single tool that uses it to produce bit-identical output (i.e., boiling the ocean), as opposed to letting the tools define what identical means or something.</div></div></div></div></blockquote><div><br>This seems inconsistent with my understanding of LLVM's (& Google's, re: Bazel) goals.<br><br>So far as I know, LLVM has fixed pretty much any "the same input causes different output bits, even if they're semantically equivalent" bug that's reported (well, they're considered bugs at least; it sometimes takes a while for someone to prioritize them).<br><br>While it doesn't outright break Bazel when that property doesn't hold, I believe it can cause more cache invalidation than is ideal (e.g., the build system evicts an object file from the cache but still has the resulting executable in the cache; to do the link step it goes to rebuild the objects*, finds a new/different object, and then reruns the link because of it).<br><br>Also, for LLVM testing purposes, I believe any case where the output text/bits differ is usually fixed so the tests are reliable.<br><br>* Maybe this scenario doesn't really happen: if Bazel assumes reproducibility is transitive, it could observe that the input hashes are the same, so it doesn't need to run the task at all if all the other object files are available; it could just assume the resulting binary would be the same as the one it already has.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>I don't know how our bitcode encoding compares to the textual IR in the case of your phi 
example, but assuming that that difference makes it into the bitcode too, it would cause e.g. ThinLTO bitcode artifacts to violate the content-based caching assumptions, even if semantically to the compiler the difference is immaterial.</div></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>You are certainly welcome to try to make all these things that are semantically identical completely syntactically identical as well, but I'm pretty uninterested in slowing down passes, or banning them from doing certain things, to achieve that.</div><div><br></div><div>I.e., if you want this to happen, IMHO, it should be happening in output writing, not somewhere else.</div><div><br></div><div>To give another example: Currently, IIRC, the order of basic blocks the function iterator goes through is "as they appear in input".</div><div>For correctness, there is no real defined or required ordering (i.e., it's not, for example, sorted in a preorder walk from the entry block or something).</div><div><br></div><div>I could make a pass that randomizes the list that for (auto &BB : F) walks, and nothing else in the compiler should change :)</div></div></div></div></blockquote><div><br></div><div>No optimizations should change, but I think it'd be pretty difficult to support testing that pass (granted, with LLVM's testing approach, that would at least be fairly isolated to the tests for the pass itself; it wouldn't pollute other passes' tests with nondeterministic output, so it might not be too painful).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I don't think you can say such a pass is broken.<br></div></div></div></div></blockquote><div><br>I kind of would say it's broken. 
Arbitrary reordering I'd be OK with; nondeterministic reordering would not seem OK to me (again, based on the sentiments I've heard expressed by Chandler and others on the project (though I may have misunderstood or be misrepresenting their perspective) and my best understanding of the goals, etc.; I tend to agree with them, but could well be wrong).<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>Hence I think if you want such constraints, they need to be placed on output orderings, not on what passes do.<br></div></div></div></div></blockquote><div><br>Ah, hmm, maybe there's a point of confusion. Changing the order of BBs in a Function would change the output ordering, right? Presumably that would be reflected in printed IR, bitcode, and possibly in block layout (at least at -O0, but I could imagine some aspects of that ordering leak out even when LLVM does advanced block layout optimizations).<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> I know that Richard has mentioned in the past that, at least for Clang, the intention is bit-identical output for bit-identical input.</div><span class="m_5233375800023946560HOEnZb"><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div 
class="gmail_quote"><div><br></div><div><br></div></div></div></div>
<br>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
<br></blockquote></span></div><br></div></div>
</blockquote></div></div></div>
</blockquote></div></div>
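<div>A rough sketch of the "do it in output writing" position from the thread, applied to the phi example: the in-memory operand order is left alone and only the printed form is canonicalized, so a content-based cache sees identical bytes. This is Python for illustration only. The helpers <code>raw_phi</code>, <code>canonical_phi</code> and <code>content_key</code> are hypothetical, the printed syntax is a toy and not real LLVM IR, and SHA-256 is just an assumed stand-in for whatever hash a caching build system uses.</div>
<pre>
```python
import hashlib


def raw_phi(incoming):
    # Prints operands in whatever order the producing pass left them.
    body = ", ".join(f"[{value}, {block}]" for value, block in incoming)
    return f"phi {body}"


def canonical_phi(incoming):
    # Sort by (block, value) at writing time only; sorted() returns a new
    # list, so the caller's in-memory operand order is untouched.
    ordered = sorted(incoming, key=lambda pair: (pair[1], str(pair[0])))
    return raw_phi(ordered)


def content_key(text):
    # Stand-in for the key a content-addressed cache would compute.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```
</pre>
<div>For the operand lists [(1, "block1"), (2, "block2")] and its reverse, <code>raw_phi</code> yields two different strings and therefore two different cache keys, while <code>canonical_phi</code> yields the same string for both. The cost is a sort on every write, which is the trade-off the thread is arguing about.</div>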