<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jul 9, 2014 at 12:17 AM, Nick Lewycky <span dir="ltr"><<a href="mailto:nicholas@mxc.ca" target="_blank">nicholas@mxc.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="">Stephen Crane wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On Tue, Jul 8, 2014 at 11:26 AM, Geremy Condra<<a href="mailto:gcondra@google.com" target="_blank">gcondra@google.com</a>>  wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It seems much better to me to use a CSPRNG than to rely on something like MT,<br>

which has significant weaknesses despite its long period. As long as I can<br>

hand it /dev/urandom or an equivalent seed file *and actually use that* I<br>

don't 1000% care, but using a non-cryptographic RNG on top of that is very<br>

smelly.<br>

</blockquote>

<br>

I completely agree with you that a CSPRNG would be best. However, we<br>

got so much pushback from the mailing list that I felt it was better<br>

to start small. Keeping the current interface and adding an optional<br>

better implementation underneath seems like the way to go here.<br>

</blockquote>

<br></div>

I'm not opposed to a CSPRNG here, but I am concerned. First, I don't see why we should need it, and I'd like the consumers of the random stream to ensure that they aren't relying on any particular strength of the random stream. If they want to do a hash on the RNG output to prevent correlation, the caller should do that. Second, I'm not sure I trust us LLVMers to maintain a cryptographically strong RNG. I don't know that we have the skill set for that.<br>

</blockquote><div><br></div><div>Thus my suggestion to use an external stream of randomness, which requires essentially zero cryptographic skill to audit and reduces the amount of code to boot. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


If it's critical to have a CSPRNG to make your feature useful then you should argue for it. As it is, the plan is to permit upgrading to a newer RNG by using a different NamedMDNode name which includes the algorithm name.<div class="">

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

At least for our use cases, we couldn't use /dev/{u}random directly<br>

because we needed reproducibility. However, the workflow I plan to use<br>

with this is to grab a seed from /dev/random at the beginning of the<br>

build process, note that down somewhere, and use that seed for the<br>

rest of the build. We could certainly do something similar with a<br>

slightly modified RNG impl class which uses a random buffer or<br>

separate process to generate better randomness with a larger seed.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It also simplifies the code (since you don't need to add in a new RNG, just<br>

read off of a stream) and makes it more testable (since RNGs are notoriously<br>

easy to get wrong and hard to prove right).<br>

</blockquote>

<br>

Yes, as long as that stream is reproducible somehow. I think we should<br>

preserve the option to recreate all random choices made by LLVM when<br>

bugs crop up or for generating patches.<br>

</blockquote>

<br></div>

The ability to reproduce the same decisions when debugging the compiler is critical. Even the proposal of re-keying our RNG on a per-pass basis is far from perfect: it lets us narrow down the passes but not the actual input source code. If we remove a few lines from the middle of a function, the RNG stream will get out of sync, and that may mask the bug. Solving that too would be fantastic. :) Realistically, I'm relying on random chance to allow us to reduce the code down to a reasonably sized testcase.</blockquote>

<div><br></div><div>Thus my suggestion of relying on an external stream of randomness. Something as simple as:</div><div><br></div><div>dd if=/dev/urandom of=/my/totes/random/data bs=1M count=100</div><div><br></div><div>

gets you a totally reproducible build. As an added bonus, maintaining counters for the RNG bytes consumed during the process would allow you to chop up the process simply by adding/removing/moving the corresponding bytes in the randomness source.</div>
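<div><br></div><div>To make that concrete, here is a rough sketch in Python of what I have in mind. It is not LLVM code: the class name FileRandomSource, its methods, and the demo helper are all made up for illustration, and a small temporary file stands in for the dd output. It only shows that reading from a fixed file gives identical choices on every run and that counting consumed bytes is trivial.</div>

```python
import os
import tempfile


class FileRandomSource:
    """Hypothetical helper: hands out bytes from a pre-generated randomness file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self.consumed = 0  # running count of bytes handed out so far

    def read(self, n):
        data = self._file.read(n)
        if len(data) != n:
            # Fail loudly instead of silently reusing or inventing randomness.
            raise EOFError("randomness file exhausted")
        self.consumed += n
        return data

    def next_u32(self):
        # Interpret the next 4 bytes as an unsigned little-endian integer.
        return int.from_bytes(self.read(4), "little")

    def close(self):
        self._file.close()


def demo():
    # A small temporary file stands in for the file produced by dd.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(64))
        path = f.name
    try:
        first = FileRandomSource(path)
        a = [first.next_u32() for _ in range(4)]
        first.close()

        second = FileRandomSource(path)
        b = [second.next_u32() for _ in range(4)]
        second.close()

        # Same file, same sequence; the counter records how much was used.
        return a == b, first.consumed
    finally:
        os.remove(path)
```

<div>Recording the value of consumed at, say, pass boundaries is what would let you splice the randomness file to match when the input changes. Where exactly to record it is a design choice I'm leaving open here.</div>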

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888"><br>

<br>

Nick<br>

</font></span></blockquote></div><br></div></div>