[PATCH] D145171: [FSAFDO] Improve FS discriminator encoding

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 7 00:21:36 PST 2023


wenlei added a comment.

> This reduces the hash conflicts.

Curious how did you check/detect conflicts/collisions?

> one should expect a performance gain over the binary without this change.

How big is the gain you saw?



================
Comment at: llvm/lib/CodeGen/MIRFSDiscriminator.cpp:67
   for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
     Ret ^= updateHash(std::to_string(DIL->getLine()));
     Ret ^= updateHash(DIL->getScope()->getSubprogram()->getLinkageName());
----------------
same here, converting to string first seem unnecessary. 


================
Comment at: llvm/lib/CodeGen/MIRFSDiscriminator.cpp:77
+      return 0;
+    return MD5Hash(Str);
+  };
----------------
MD5 as cryptographic hash is expensive, `xxHash64` should be good enough in terms of distribution and collision avoidance. Now that we're changing discriminator algorithm, wondering if we should take the opportunity to move to xxhash for the new version. (For compilation with compact-binary profile where MD5 is used a lot, MD5 shows up quite hot in perf profile)


================
Comment at: llvm/lib/CodeGen/MIRFSDiscriminator.cpp:81
+  for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
+    Ret ^= updateHash(std::to_string(DIL->getLine()));
+    Ret ^= updateHash(DIL->getName());
----------------
If collision is a concern and we're working to reduce collision, `^=` is a weak blend/combine function (symmetric among other problems.) 

Something like this below is a safer combine function. We have a number of similar `hashCombine` in LLVM code base. 

```
inline int64_t hashCombine(const int64_t Seed, const int64_t Val) {
  std::hash<int64_t> Hasher;
  return Seed ^ (Hasher(Val) + 0x9e3779b9 + (Seed << 6) + (Seed >> 2));
}
```


================
Comment at: llvm/lib/CodeGen/MIRFSDiscriminator.cpp:139
+    if (ImprovedFSDiscriminator)
+      BBSizeHash = MD5Hash(std::to_string(BBSize(BB)));
+
----------------
converting int to string just to get an hash seems like an overkill. `xxHash64(ArrayRef<uint8_t> Data)` should be fast and good enough.  


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145171/new/

https://reviews.llvm.org/D145171



More information about the llvm-commits mailing list