<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 2 July 2014 11:52, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div><br><div class="gmail_quote">On Tue, Jul 1, 2014 at 9:50 PM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com" target="_blank">clattner@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Any thoughts guys? If we are unclear what the right design here is, I'd prefer you to revert the patch until it is figured out.</blockquote>
</div><br></div>I'm moderately confident this isn't the right design...</div><div class="gmail_extra"><br></div><div class="gmail_extra">Specifically, I totally get why the seed and salt data needed by an RNG might be in the module (preferably opaquely in module-level metadata or something), but I don't really understand why the RN*G* would be in the module rather than in the pass or thing which is generating the random data for some purpose.</div>
</div></blockquote><div><br></div><div>Chandler and I just talked this over and reached a rough consensus that I want to relay to the list.</div><div><br></div><div>The module should carry a seed that we can use to create an RNG as needed. For predictability we agreed on MT19937, which has a lot of good properties (well-balanced, doesn't have a warmup period, etc.). Module-level metadata works well for storing the seed.</div>
<div><br>We need to ensure that we can reproduce runs of passes -- imagine that opt passes used random numbers. We need to ensure that SimplifyCFG makes the same decisions in "opt -instcombine -simplifycfg" and "opt -simplifycfg". Consequently, each pass needs to create its own RNG from the seed provided in the module.</div>
<div><br></div><div>But we also want to avoid correlated random numbers. Each pass that creates an RNG should use seed = strong-hash-function(in-module seed + per-pass salt). The in-module salt should probably also be generated in a manner that includes the initial file name. That will give us different random numbers when a per-function pass processes the same function in different modules, for instance when the function comes from a shared header.</div>
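<div><br></div><div>To make the seeding scheme concrete, here is a rough sketch of what I have in mind. It is not a proposed patch: createPassRNG and hashBytes are made-up names, the module name stands in for the "initial file name", and FNV-1a is only a placeholder for whatever strong hash we end up picking.</div>

```cpp
#include <cstdint>
#include <random>
#include <string>

// Placeholder hash (64-bit FNV-1a). An assumption for this sketch only;
// a real implementation would want a stronger hash function.
std::uint64_t hashBytes(const std::string &Data) {
  std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : Data) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Hypothetical helper: builds a fresh MT19937 for one pass from
// hash(in-module seed + module name + per-pass salt). The result depends
// only on these three inputs, never on which passes ran earlier.
std::mt19937 createPassRNG(std::uint64_t ModuleSeed,
                           const std::string &ModuleName,
                           const std::string &PassSalt) {
  std::string Input =
      std::to_string(ModuleSeed) + '\n' + ModuleName + '\n' + PassSalt;
  std::uint64_t H = hashBytes(Input);
  // Feed both halves of the 64-bit hash to the engine through seed_seq.
  std::seed_seq Seq{static_cast<std::uint32_t>(H),
                    static_cast<std::uint32_t>(H >> 32)};
  std::mt19937 Gen(Seq);
  return Gen;
}
```

<div>With this shape, createPassRNG(Seed, "a.c", "simplifycfg") yields the same stream whether or not instcombine ran first, while a different pass salt or a different module name yields an unrelated stream.</div>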
<div><br></div><div>Any consumer of these who needs properties like uniform distribution will need to wash the output of the RNG themselves.</div><div><br></div><div>Stephen and Chris, does this work for both of you? Are the motivations behind this design clear?<br>
</div>
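<div><br></div><div>On washing the output: as one example of what a consumer might do, here is a sketch of rejection sampling to get an unbiased value in [0, Bound) from raw MT19937 output. uniformBelow is a made-up name, not an existing API. I wrote it by hand rather than using std::uniform_int_distribution because the standard does not pin down the exact values that distribution produces, and we care about reproducibility.</div>

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: returns an unbiased value in [0, Bound).
// Bound must be greater than zero.
std::uint32_t uniformBelow(std::mt19937 &Gen, std::uint32_t Bound) {
  // 2^32 mod Bound: the number of low raw values to reject so that the
  // remaining range is an exact multiple of Bound.
  std::uint32_t Threshold = (0u - Bound) % Bound;
  for (;;) {
    // MT19937 produces 32-bit values, so the cast does not lose bits.
    std::uint32_t R = static_cast<std::uint32_t>(Gen());
    if (R >= Threshold)
      return R % Bound;
  }
}
```

<div>A plain Gen() % Bound would slightly favour small results whenever Bound does not divide 2^32; the loop above throws away the few raw values that cause that bias.</div>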
<div><br></div><div>Nick<br></div><div><br></div></div></div></div>