<div dir="ltr">Actually I just came up with a different and better idea. I'll upload a patch shortly.</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 28, 2017 at 7:02 PM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The results are attached.<br>

<br>

scylla is probably the most interesting. Without threads it gets 1.11x<br>

slower. With 8 cores it is 1.45x faster.<br>

<br>

So this is probably fine, but what do you think of the idea of sorting<br>

by hash to have a reproducible output with any number of shards?<br>

<br>

Cheers,<br>

Rafael<br>

<br>

On 28 September 2017 at 15:42, Rafael Avila de Espindola<br>

<div class="HOEnZb"><div class="h5"><<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

> Rafael Avila de Espindola <<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> writes:<br>

><br>

>>> +  // Parallelism. Changing this number causes benign changes in the<br>

>>> +  // order of output section pieces. For build reproducibility, we<br>

>>> +  // always use the same number.<br>

>>> +  static constexpr size_t NumShards = 8;<br>

>><br>

>> One idea that might allow using a variable number of shards:<br>

>><br>

>> Sort the strings once they are known to be unique. The sort order can be<br>

>> based on the hash and look at the string itself only if two hashes are<br>

>> identical, so it shouldn't be too slow.<br>

><br>

> In fact, we could use the N most significant bits of the hash to do the<br>

> sharding. That way we only need to sort inside each shard.<br>

><br>

> Cheers,<br>

> Rafael<br>

</div></div></blockquote></div><br></div>
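<div dir="ltr">A minimal sketch of the idea quoted above: pick the shard from the N most significant bits of the hash, then sort inside each shard by hash and compare the strings only when two hashes are equal. Everything named here is an assumption for illustration and not code from the patch: <code>hashString</code> (a plain FNV-1a) stands in for whatever hash the linker uses, <code>orderUniqueStrings</code> and <code>ShardBits</code> are hypothetical names, and the per-shard deduplication is included only so the example is self-contained.</div>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real string hash: 64-bit FNV-1a.
inline uint64_t hashString(const std::string &S) {
  uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Returns the unique strings of In, ordered by (hash, string).
// ShardBits is the number of most significant hash bits used to pick a
// shard; this sketch assumes it is small (for example 0 to 16).
inline std::vector<std::string>
orderUniqueStrings(const std::vector<std::string> &In, unsigned ShardBits) {
  using Entry = std::pair<uint64_t, std::string>;
  std::vector<std::vector<Entry>> Shards(size_t(1) << ShardBits);

  for (const std::string &S : In) {
    uint64_t H = hashString(S);
    // Shifting a 64-bit value by 64 is undefined, so handle 0 bits apart.
    size_t Idx = ShardBits == 0 ? 0 : size_t(H >> (64 - ShardBits));
    Shards[Idx].emplace_back(H, S);
  }

  std::vector<std::string> Out;
  for (auto &Shard : Shards) {
    // Shards are independent, so this loop body could run in parallel.
    // std::pair compares the hash first and the string only on a tie.
    std::sort(Shard.begin(), Shard.end());
    Shard.erase(std::unique(Shard.begin(), Shard.end()), Shard.end());
    for (Entry &E : Shard)
      Out.push_back(std::move(E.second));
  }
  return Out;
}
```

<div dir="ltr">Because shard <code>i</code> holds only hashes whose top bits equal <code>i</code>, concatenating the sorted shards in index order gives the same sequence as one global sort by (hash, string). Under these assumptions the output should therefore not depend on the number of shards.</div>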