<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">I’ve been looking at Chandler Carruth talks about internal data structures for LLVM in order to implement my own library. That’s how I ended up looking at the internals of DenseMap and the default hash functions defined in DenseMapInfo.</div><div class=""><br class=""></div><div class="">I have been very surprised by the hash function used for integers which is hash(k) = 37 * k. This kind of hashing function does not change the lowest bits of k, and if your keys happen to all be a multiple of 8, you end up not using about 90% of the slot as a direct access in an open addressing hash table. By the way, does anyone know the rationale behind using 37?</div><div class=""><br class=""></div><div class="">In order to improve this, it could be nice to use a better hash function such as the Knuth hash which is defined as:</div><div class=""><br class=""></div><div class="">std::size_t hash_value(std::size_t val, int p) {</div><div class=""> assert(p >= 0 && p <= 64);</div><div class=""><br class=""></div><div class=""> const std::size_t knuth = 11133510565745311;</div><div class=""> </div><div class=""> return (val * knuth) >> (64 - p);</div><div class="">}</div><div class=""><br class=""></div><div class="">The big advantage of this function is that is shuffles all the bits and does not suffer from the problems given above. It is also quite cheap to compute. Unfortunately, one needs to change the signature of the hash_value function as the hash function in order to take an integer k and hash it into a integer in {0, 1, 2, …, 2^p} is no longer in the form hash(k) % 2^p. It needs to be in the form hash(k, p).</div><div class=""><br class=""></div><div class="">As I don’t know anything about the LLVM source code, here are a few questions:</div><div class="">- Is DenseMap used a lot with integers as a key in LLVM? Do you think it would benefit from a better default hash function?</div><div class="">- Is there any benchmark to run that makes a heavy use of those hash function so I can see if my changes would improve performance?</div><div class=""><br class=""></div><div class="">Best regards,</div><div class=""><br class=""><div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">François Fayard<br class="">Founder & Consultant - Inside Loop</div><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Applied Mathematics & High Performance Computing<br class="">Tel: +33 (0)6 01 44 06 93<br class="">Web: <a href="http://www.insideloop.io" class="">www.insideloop.io</a><br class=""></div></div>
</div>
<br class=""></div></body></html>