[PATCH] D113688: [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.

Hongtao Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 15 08:53:05 PST 2021


hoy added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:183-185
+  uint64_t NameHash = std::hash<std::string>{}(ChildName.str());
+  uint64_t LocId = (Callsite.LineOffset << 16) | Callsite.Discriminator;
   return NameHash + (LocId << 5) + LocId;
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > This `nodeHash` is the blend function similar to DJB and it's assuming 32-bit length, see the shift 16. If we are fully switching to 64 bit hash, we should change the blend function too.
> > > 
> > > However, does the collision come from the blend function (`nodeHash`), or lower part of `std::hash`? I guess it's the later, then I'm curious what strings actually lead to collision on the lower 32 bit? 
> > > 
> > > 
> > Good point! Is `(Callsite.LineOffset << 32) | Callsite.Discriminator;` a good form for 64-bit? 
> > 
> > Yes, the collision came from a colliding lower-32-bit of std::hash
> > 
> > 
> > ```
> > Same lower 32-bit: 2086825821
> > Name1:          _ZNSt8__detail15_Hash_code_baseINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS6_PmENS_10_Select1stESt4hashIS6_ENS_18_Mod_range_hashingENS_20_Default_ranged_hashELb1EE10_M_extractEv
> >                   _
> > Name2: ZSt12__niter_baseIPN4onnx3UseESt6vectorIS1_SaIS1_EEET_N9__gnu_cxx17__normal_iteratorIS6_T0_EE
> > ```
> > 
> > 
> > 
> > 
> > Is (Callsite.LineOffset << 32) | Callsite.Discriminator; a good form for 64-bit? 
> 
> I think that should be good.
> 
> Though I'd be surprised if we really need 64-bit hash to avoid collision. We just want to avoid collision among all callees of a function, which is a small space.
> 
> Not sure what's the hash function used by std::hash, but perhaps that is not a hash with great distribution when truncated. Can you try `djbHash` from `llvm-project/llvm/include/llvm/Support/DJB.h` to see if we can make it work without increasing to 64-bit?
> 
>  
Looks like 32-bit hash isn't working well:


```
uint64_t NameHash1 = djbHash("_ZN8facebook18storage_foundation15detail_gen_"
                              "avx226t10x1_95cc5b80e2a6436ca08eEPKPKhPKPhm");
  uint64_t NameHash2 = djbHash(
      "_ZSt4moveIRPN5caffe5LayerIdEEEONSt16remove_referenceIT_E4typeEOS6_");
  outs() << NameHash1 << " " << NameHash2 << "\n";

output:
309151195 309151195
```



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113688/new/

https://reviews.llvm.org/D113688



More information about the llvm-commits mailing list