[llvm] r211705 - Random Number Generator (llvm)

Geremy Condra gcondra at google.com
Tue Jul 8 11:26:28 PDT 2014


It seems much better to me to use a CSPRNG than to rely on something like
MT, which has significant weaknesses despite its long period. As long as I
can hand it /dev/urandom or an equivalent seed file *and actually use that*
I don't 1000% care, but using a non-cryptographic RNG on top of that is
very smelly.

It also simplifies the code (since you don't need to add in a new RNG, just
read off of a stream) and makes it more testable (since RNGs are
notoriously easy to get wrong and hard to prove right).
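
As a rough illustration of the "just read off of a stream" idea, here is a
minimal sketch. StreamRNG and its interface are hypothetical names made up for
this example, not anything in the patch or in LLVM. It assumes the caller
passes a path such as /dev/urandom or a pre-generated seed file, and it treats
running out of bytes as a hard error rather than silently falling back to
another generator.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an LLVM API): hands out 64-bit words read
// sequentially from a byte stream such as /dev/urandom or a seed file.
class StreamRNG {
public:
  explicit StreamRNG(const std::string &Path)
      : In(Path, std::ios::in | std::ios::binary) {
    assert(!Path.empty() && "random source path must not be empty");
    if (!In)
      throw std::runtime_error("cannot open random source: " + Path);
  }

  // Returns the next 8 bytes of the stream as a big-endian 64-bit value.
  // Throws if the source cannot supply 8 more bytes.
  uint64_t next() {
    unsigned char Buf[8];
    In.read(reinterpret_cast<char *>(Buf), sizeof(Buf));
    if (In.gcount() != static_cast<std::streamsize>(sizeof(Buf)))
      throw std::runtime_error("random source exhausted");
    uint64_t V = 0;
    for (unsigned char B : Buf)
      V = (V << 8) | B;
    return V;
  }

private:
  std::ifstream In;
};
```

With a fixed seed file the output is trivially reproducible, which is what
makes this easy to test: the expected values are just the bytes in the file.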


On Wed, Jul 2, 2014 at 1:57 PM, JF Bastien <jfb at chromium.org> wrote:

>> The RNG should have a seed in the module that we can use to create the RNG
>> as needed. For predictability we agreed on MT19937 which has a lot of good
>> properties (well-balanced, doesn't have a warmup period, etc). Module-level
>> metadata works well for storing the seed.
>>
>> We need to ensure that we can reproduce runs of passes -- imagine that
>> opt passes used random numbers. We need to ensure that SimplifyCFG makes
>> the same decisions in "opt -instcombine -simplifycfg" and "opt
>> -simplifycfg". Consequently, each pass needs to create its own RNG from the
>> seed provided in the module.
>>
>> But we also want to avoid correlated random numbers. Each pass that
>> creates an RNG should use a seed = strong-hash-function(in-module seed +
>> per-pass salt). The in-module salt should probably also be generated in a
>> manner that includes the initial file name. That will allow us to get
>> different random numbers in a per-function pass that is processing the same
>> function, for instance when included from the same header.
>>
>> Any consumer of these who needs properties like uniform distribution will
>> need to wash the output of the RNG themselves.
>>
>
> This plan sounds good to me overall. IIUC it often (but not always) allows
> a user to get the same randomization with the same seed even when LLVM
> sources differ. It also keeps the desirable properties of the current
> approach.
>
> I especially like the per-pass salt idea. How would it be specified? A
> 64-bit integer argument to INITIALIZE_PASS should work? I'd be wary of
> recycling the current INITIALIZE_PASS arguments like pass name or
> description.
>
> What do you mean by "The in-module salt should probably also be generated
> in a manner that includes the initial file name."? IIUC what you mean then
> this is already the case in the current patch, where the Module's ID is
> used (only the base name).
>
> Agreed on getting rid of distribution.
>
> One design point that Geremy Condra had suggested early on was to allow
> him to supply his own RNG. He was specifically looking at using a
> standalone file with random numbers as a stream. I think this change
> doesn't make it impossible, even if each pass would reuse the same file
> from its beginning, because of the presence of the double salt (per-Module,
> and per-Pass). I'll ask him to chime in.
>
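
For reference, the seeding scheme quoted above (seed = strong-hash-function(
in-module seed + per-pass salt)) could look roughly like the sketch below.
Everything here is an assumption for illustration: derivePassSeed, makePassRNG
and mix64 are hypothetical names, the salt is taken to be a 64-bit integer as
suggested for INITIALIZE_PASS, and the nonzero-salt check is only this
sketch's convention. The mixer is the SplitMix64 finalizer, used as a
stand-in; it is not a cryptographically strong hash, so a real implementation
would need to pick its own.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Stand-in mixer (SplitMix64 finalizer). NOT cryptographically strong;
// it only illustrates where the "strong hash function" would go.
inline uint64_t mix64(uint64_t X) {
  X += 0x9e3779b97f4a7c15ULL;
  X = (X ^ (X >> 30)) * 0xbf58476d1ce4e5b9ULL;
  X = (X ^ (X >> 27)) * 0x94d049bb133111ebULL;
  return X ^ (X >> 31);
}

// Hypothetical: combine the module-level seed with a per-pass salt so that
// each pass gets its own seed regardless of which other passes ran.
inline uint64_t derivePassSeed(uint64_t ModuleSeed, uint64_t PassSalt) {
  assert(PassSalt != 0 && "sketch convention: each pass picks a nonzero salt");
  return mix64(ModuleSeed ^ mix64(PassSalt));
}

// Hypothetical: build an MT19937 generator for one pass from the derived
// seed, feeding both 32-bit halves through std::seed_seq.
inline std::mt19937 makePassRNG(uint64_t ModuleSeed, uint64_t PassSalt) {
  uint64_t Seed = derivePassSeed(ModuleSeed, PassSalt);
  std::seed_seq Seq{static_cast<uint32_t>(Seed),
                    static_cast<uint32_t>(Seed >> 32)};
  return std::mt19937(Seq);
}
```

Because the derived seed depends only on the module seed and the pass's own
salt, a pass would draw the same sequence whether or not other passes ran
before it, which is the reproducibility property asked for in the thread.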