<div dir="ltr">It seems much better to me to use a CSPRNG than to rely on something like MT, which has significant weaknesses despite its long period. As long as I can hand it /dev/urandom or an equivalent seed file *and actually use that* I don't 1000% care, but using a non-cryptographic RNG on top of that is very smelly.<div>
<br></div><div>It also simplifies the code (since you don't need to add in a new RNG, just read off of a stream) and makes it more testable (since RNGs are notoriously easy to get wrong and hard to prove right).</div>
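<div><br></div><div>To make the "just read off of a stream" idea concrete, here is a minimal Python sketch. It is not LLVM code: StreamRandom, read_bytes, next_u64 and from_bytes are hypothetical names made up for illustration, and the only assumption is that the caller hands in a binary stream such as /dev/urandom or a seed file.</div>

```python
import io
from typing import BinaryIO


class StreamRandom:
    """Hypothetical random source that just reads bytes off a caller-supplied stream."""

    def __init__(self, stream: BinaryIO) -> None:
        self._stream = stream

    @classmethod
    def from_bytes(cls, data: bytes) -> "StreamRandom":
        # Fixed bytes give a fully reproducible source, which is handy in tests.
        return cls(io.BytesIO(data))

    def read_bytes(self, n: int) -> bytes:
        data = self._stream.read(n)
        # Fail loudly instead of silently reusing or padding data.
        if data is None or len(data) != n:
            raise EOFError("random stream exhausted")
        return data

    def next_u64(self) -> int:
        # Interpret the next 8 bytes as an unsigned little-endian integer.
        return int.from_bytes(self.read_bytes(8), "little")
```

<div>In real use the stream would be something like open("/dev/urandom", "rb"); in a test, StreamRandom.from_bytes(...) with known bytes makes every draw predictable, which is the testability point above.</div>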
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jul 2, 2014 at 1:57 PM, JF Bastien <span dir="ltr">&lt;<a href="mailto:jfb@chromium.org" target="_blank">jfb@chromium.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>The RNG should have a seed in the module that we can use to create the RNG as needed. For predictability we agreed on MT19937 which has a lot of good properties (well-balanced, doesn't have a warmup period, etc). Module-level metadata works well for storing the seed.</div>
<div><br>We need to ensure that we can reproduce runs of passes -- imagine that opt passes used random numbers. We need to ensure that SimplifyCFG makes the same decisions in "opt -instcombine -simplifycfg" and "opt -simplifycfg". Consequently, each pass needs to create its own RNG from the seed provided in the module.</div>
<div><br></div><div>But we also want to avoid correlated random numbers. Each pass that creates an RNG should use a seed = strong-hash-function(in-module seed + per-pass salt). The in-module salt should probably also be generated in a manner that includes the initial file name. That will allow us to get different random numbers in a per-function pass that is processing the same function, for instance when included from the same header.</div>
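<div><br></div><div>The derivation seed = strong-hash-function(in-module seed + per-pass salt) could be sketched roughly as follows. This is an illustrative Python sketch, not LLVM code: derive_pass_seed and make_pass_rng are hypothetical names, SHA-256 is only one possible choice of strong hash, and mixing in the module identifier is an assumption based on the file-name suggestion above. Python's random.Random happens to be MT19937-based, but its seeding differs from other implementations, so the actual numbers would not match a C++ MT19937.</div>

```python
import hashlib
import random


def derive_pass_seed(module_seed: int, module_id: str, pass_salt: int) -> int:
    """Hash the module seed, per-pass salt and module id into a 64-bit seed.

    Assumes module_seed and pass_salt are unsigned 64-bit values.
    """
    h = hashlib.sha256()
    # Fixed-width fields go first so the concatenation is unambiguous.
    h.update(module_seed.to_bytes(8, "little"))
    h.update(pass_salt.to_bytes(8, "little"))
    h.update(module_id.encode("utf-8"))
    return int.from_bytes(h.digest()[:8], "little")


def make_pass_rng(module_seed: int, module_id: str, pass_salt: int) -> random.Random:
    # Each pass builds its own generator, so its numbers do not depend on
    # which other passes ran before it.
    return random.Random(derive_pass_seed(module_seed, module_id, pass_salt))
```

<div>Because every pass starts from its own derived seed, a pass would see the same sequence whether or not another pass ran first, and two passes (or two modules) with different salts would see unrelated sequences.</div>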
<div><br></div><div>Any consumer of these who needs properties like uniform distribution will need to wash the output of the RNG themselves.</div></div></div></div></blockquote><div><br></div><div>This plan sounds good to me overall. IIUC, it often (but not always) allows a user to get the same randomization with the same seed even when LLVM sources differ. It also keeps the desirable properties of the current approach.</div>
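<div><br></div><div>On the point that consumers needing a uniform distribution must wash the RNG output themselves: one common way to do that for bounded integers is rejection sampling. The Python sketch below is only an illustration under the assumption that the raw generator returns unsigned 64-bit values; uniform_below and next_u64 are hypothetical names, not part of any proposed API.</div>

```python
from typing import Callable


def uniform_below(next_u64: Callable[[], int], bound: int) -> int:
    """Return an unbiased integer in [0, bound) from raw 64-bit outputs."""
    if bound <= 0 or bound > 2**64:
        raise ValueError("bound must be in [1, 2**64]")
    # Largest multiple of bound that fits in the 64-bit output range.
    limit = (2**64 // bound) * bound
    while True:
        x = next_u64()
        # Values at or above limit would make small results slightly more
        # likely after the modulo, so they are thrown away and redrawn.
        if x < limit:
            return x % bound
```

<div>Doing this explicitly, rather than relying on a library-provided distribution, keeps the result identical across platforms as long as the raw 64-bit outputs are identical.</div>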
<div><br></div><div>I especially like the per-pass salt idea. How would it be specified? A 64-bit integer argument to INITIALIZE_PASS should work? I'd be wary of recycling the current INITIALIZE_PASS arguments like pass name or description.</div>
<div><br></div><div>What do you mean by "The in-module salt should probably also be generated in a manner that includes the initial file name."? IIUC, this is already the case in the current patch, where the Module's ID is used (only the base name).</div>
<div><br></div><div>Agreed on getting rid of distribution.</div><div><br></div><div>One design point that Geremy Condra had suggested early on was to allow him to supply his own RNG. He was specifically looking at using a standalone file with random numbers as a stream. I think this change doesn't make it impossible, even if each pass would reuse the same file from its beginning, because of the presence of the double salt (per-Module, and per-Pass). I'll ask him to chime in.</div>
</div></div></div>
</blockquote></div><br></div>