[PATCH] D113688: [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 15 22:51:31 PST 2021


wenlei accepted this revision.
wenlei added a comment.

lgtm with a nit, thanks.



================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:35
 
-  uint32_t Hash = nodeHash(CalleeName, CallSite);
+  auto Hash = nodeHash(CalleeName, CallSite);
   auto It = AllChildContext.find(Hash);
----------------
nit: the convention is to use auto when full type name is too long. uint64_t is an unambiguous case for explicit type.  same for other places.


================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:183-185
+  uint64_t NameHash = std::hash<std::string>{}(ChildName.str());
+  uint64_t LocId = (Callsite.LineOffset << 16) | Callsite.Discriminator;
   return NameHash + (LocId << 5) + LocId;
----------------
hoy wrote:
> wenlei wrote:
> > hoy wrote:
> > > wenlei wrote:
> > > > This `nodeHash` is the blend function similar to DJB and it's assuming 32-bit length, see the shift 16. If we are fully switching to 64 bit hash, we should change the blend function too.
> > > > 
> > > > However, does the collision come from the blend function (`nodeHash`), or lower part of `std::hash`? I guess it's the later, then I'm curious what strings actually lead to collision on the lower 32 bit? 
> > > > 
> > > > 
> > > Good point! Is `(Callsite.LineOffset << 32) | Callsite.Discriminator;` a good form for 64-bit? 
> > > 
> > > Yes, the collision came from a colliding lower-32-bit of std::hash
> > > 
> > > 
> > > ```
> > > Same lower 32-bit: 2086825821
> > > Name1:          _ZNSt8__detail15_Hash_code_baseINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS6_PmENS_10_Select1stESt4hashIS6_ENS_18_Mod_range_hashingENS_20_Default_ranged_hashELb1EE10_M_extractEv
> > >                   _
> > > Name2: ZSt12__niter_baseIPN4onnx3UseESt6vectorIS1_SaIS1_EEET_N9__gnu_cxx17__normal_iteratorIS6_T0_EE
> > > ```
> > > 
> > > 
> > > 
> > > 
> > > Is (Callsite.LineOffset << 32) | Callsite.Discriminator; a good form for 64-bit? 
> > 
> > I think that should be good.
> > 
> > Though I'd be surprised if we really need 64-bit hash to avoid collision. We just want to avoid collision among all callees of a function, which is a small space.
> > 
> > Not sure what's the hash function used by std::hash, but perhaps that is not a hash with great distribution when truncated. Can you try `djbHash` from `llvm-project/llvm/include/llvm/Support/DJB.h` to see if we can make it work without increasing to 64-bit?
> > 
> >  
> Looks like 32-bit hash isn't working well:
> 
> 
> ```
> uint64_t NameHash1 = djbHash("_ZN8facebook18storage_foundation15detail_gen_"
>                               "avx226t10x1_95cc5b80e2a6436ca08eEPKPKhPKPhm");
>   uint64_t NameHash2 = djbHash(
>       "_ZSt4moveIRPN5caffe5LayerIdEEEONSt16remove_referenceIT_E4typeEOS6_");
>   outs() << NameHash1 << " " << NameHash2 << "\n";
> 
> output:
> 309151195 309151195
> ```
> 
okay, thanks for checking. Using 64-bit hash makes sense then. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113688/new/

https://reviews.llvm.org/D113688



More information about the llvm-commits mailing list