[PATCH] D35221: [scudo] PRNG makeover

Tue Jul 11 10:08:08 PDT 2017

cryptoad requested review of this revision.
cryptoad added inline comments.

================
Comment at: lib/scudo/scudo_utils.h:84
+    // size of the random data left in the cache.
+    CachedBytes = next() | (1ULL << 63);
+  }
----------------
cryptoad wrote:
> cryptoad wrote:
> > alekseyshl wrote:
> > > Does this trick really help with performance?
> > The numbers do not differ enough to go one way or the other, but the u8 was taking up to 8 extra bytes depending on architecture. It feels like it should be faster, but it definitely saves space.
> Actually scratch that. My machine was too loaded to give correct results.
> The initial version appears to be faster (with the u8):
> 
> ```
> kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
> kostyak@kostyak-linux:~$ ./rand
> [?] duration: 4009332558ns
> kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand
> kostyak@kostyak-linux:~$ ./rand
> [?] duration: 4788913046ns
> ```
> This is for 1<<32 iterations of `getU8`, and it is quite stable over multiple runs.
> 
> I am going to reintroduce it.
> 
It's actually a lot more nuanced and tricky than I was expecting.
Seeding through /dev/urandom 1<<12 times, and iterating 1<<20 times per seeding, we get the numbers below (stable over multiple runs).
With clang, 32-bit is roughly equivalent for both versions, while 64-bit favors the CachedBytesAvailable version.
With gcc, 32-bit favors CachedBytesAvailable, while 64-bit favors the other version (and gcc is slower than clang overall in either mode).
```
kostyak@kostyak-linux:~$ clang++ -m32 -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4814670294ns
kostyak@kostyak-linux:~$ clang++ -m32 -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4830693788ns
kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
kostyak@kostyak-linux:~$ ./rand
[?] duration: 3115400364ns
kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4394574294ns
kostyak@kostyak-linux:~$ g++ -m32 -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
kostyak@kostyak-linux:~$ ./rand
[?] duration: 8782558601ns
kostyak@kostyak-linux:~$ g++ -m32 -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 9332069877ns
kostyak@kostyak-linux:~$ g++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
kostyak@kostyak-linux:~$ ./rand
[?] duration: 5651244009ns
kostyak@kostyak-linux:~$ g++ -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4407575998ns
```
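For anyone wanting to reproduce this kind of measurement, the methodology above (reseed from /dev/urandom, then time a tight `getU8` loop) could look roughly like the sketch below. This is not the actual rand.cc, which is not included here: `readSeed`, `ByteSource` and `benchGetU8` are hypothetical names, and `ByteSource` is a plain xorshift64 placeholder to be swapped for whichever `getU8` variant is under test.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Reads a 64-bit seed from /dev/urandom. Falls back to a fixed constant if
// the device cannot be read (assumption: good enough for a timing sketch).
static uint64_t readSeed() {
  uint64_t Seed = 0x9e3779b97f4a7c15ULL;
  if (FILE *F = std::fopen("/dev/urandom", "rb")) {
    uint64_t Tmp;
    if (std::fread(&Tmp, sizeof(Tmp), 1, F) == 1)
      Seed = Tmp;
    std::fclose(F);
  }
  return Seed;
}

// Hypothetical placeholder generator (xorshift64, no byte caching).
// Replace it with the getU8 implementation being measured.
struct ByteSource {
  uint64_t State;
  // xorshift64 requires a non-zero state, hence the | 1.
  explicit ByteSource(uint64_t Seed) : State(Seed | 1) {}
  uint8_t getU8() {
    State ^= State << 13;
    State ^= State >> 7;
    State ^= State << 17;
    return static_cast<uint8_t>(State);
  }
};

// Reseeds Gen `Seedings` times and calls getU8 `Iterations` times per seeding.
// Returns the elapsed time in nanoseconds. The byte sum is written to *Sink so
// that the optimizer cannot discard the loop.
template <typename Gen>
uint64_t benchGetU8(unsigned Seedings, uint64_t Iterations, uint64_t *Sink) {
  uint64_t Sum = 0;
  auto Start = std::chrono::steady_clock::now();
  for (unsigned S = 0; S < Seedings; ++S) {
    Gen G(readSeed());
    for (uint64_t I = 0; I < Iterations; ++I)
      Sum += G.getU8();
  }
  auto End = std::chrono::steady_clock::now();
  *Sink = Sum;
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(End - Start)
          .count());
}
```

Calling `benchGetU8<ByteSource>(1u << 12, 1ULL << 20, &Sink)` would match the seeding and iteration counts quoted above. Note that the time spent reading /dev/urandom is included in the measurement in this sketch.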
I am also running some ARM & AArch64 tests.
At this point I still feel that reintroducing CachedBytesAvailable might provide the most benefit. Let me know what you think.
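
To make the comparison concrete, here is a minimal sketch of the two byte-caching schemes being discussed. It is an illustration under assumptions, not the code from the patch: `SplitMix64` is a stand-in generator, the struct names are hypothetical, and the `< 0x100` refill test in the sentinel variant is my guess at how the `(1ULL << 63)` marker would be consumed (only the refill statement itself appears in the diff).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in 64-bit generator (SplitMix64); an assumption for this sketch,
// not the generator used in the patch. Draws counts the calls to next().
struct SplitMix64 {
  uint64_t State;
  uint64_t Draws = 0;
  explicit SplitMix64(uint64_t Seed) : State(Seed) {}
  uint64_t next() {
    ++Draws;
    uint64_t Z = (State += 0x9e3779b97f4a7c15ULL);
    Z = (Z ^ (Z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    Z = (Z ^ (Z >> 27)) * 0x94d049bb133111ebULL;
    return Z ^ (Z >> 31);
  }
};

// Variant with an explicit u8 counter: all 8 bytes of each draw are used,
// at the cost of one extra member (plus padding on some architectures).
struct CounterCache {
  SplitMix64 Rng;
  uint64_t CachedBytes = 0;
  uint8_t CachedBytesAvailable = 0;
  explicit CounterCache(uint64_t Seed) : Rng(Seed) {}
  uint8_t getU8() {
    if (CachedBytesAvailable == 0) {
      CachedBytes = Rng.next();
      CachedBytesAvailable = sizeof(CachedBytes);
    }
    uint8_t Result = static_cast<uint8_t>(CachedBytes);
    CachedBytes >>= 8;
    --CachedBytesAvailable;
    return Result;
  }
};

// Sentinel variant: bit 63 marks how much of the cache is left, so no
// counter is needed. The top byte is not fully random and is discarded,
// which leaves 7 usable bytes per draw in this sketch.
struct SentinelCache {
  SplitMix64 Rng;
  uint64_t CachedBytes = 0;
  explicit SentinelCache(uint64_t Seed) : Rng(Seed) {}
  uint8_t getU8() {
    // Only the sentinel byte (or nothing at all) is left: refill.
    if (CachedBytes < 0x100)
      CachedBytes = Rng.next() | (1ULL << 63);
    uint8_t Result = static_cast<uint8_t>(CachedBytes);
    CachedBytes >>= 8;
    return Result;
  }
};

// Sanity check: 56 bytes cost 7 draws with the counter, 8 with the sentinel.
void checkCaches() {
  CounterCache A(42);
  SentinelCache B(42);
  for (int I = 0; I < 56; ++I) {
    A.getU8();
    B.getU8();
  }
  assert(A.Rng.Draws == 7);
  assert(B.Rng.Draws == 8);
}
```

In this sketch the sentinel variant trades one byte of each 64-bit draw for a smaller object, so it refills slightly more often. That may or may not be what drives the timing differences above.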

https://reviews.llvm.org/D35221