[llvm] [memprof] Use BLAKE for FrameId (PR #109501)

Kazu Hirata via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 24 15:25:01 PDT 2024


https://github.com/kazutakahirata updated https://github.com/llvm/llvm-project/pull/109501

>From eccc91be73d7926806bb4a3cddc1858890d787c9 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Fri, 20 Sep 2024 15:52:22 -0700
Subject: [PATCH 1/2] [memprof] Use BLAKE for FrameId

This patch uses a stronger hash function for FrameId.

Without this patch, I've observed hash collisions in fairly common
scenarios.  Specifically, for a given GUID, I see three pairs of hash
collisions in the two-dimensional range 1 <= LineOffset <= 70 and
1 <= Column <= 50, which may be a bit too frequent.  With the new
function, I don't see collisions at all even for a large profile.

Impact on serialization/deserialization should be as follows:

- Up to indexed memprof format Version 2, inclusive: The FrameIds
  computed with the new hash function will show up as part of the
  profile file.  However, the deserializer only treats FrameIds as
  keys (but not hash values) to retrieve Frames from the on-disk hash
  table, so a change of the hash function shouldn't matter.

- For indexed memprof format Version 3, FrameIds do not show up at all
  in the resulting profile file.  FrameIds are only used to break ties
  when we sort Frames in writeMemProfFrameArray.
---
 llvm/include/llvm/ProfileData/MemProf.h | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 892865a0a26fea..25e183ee4c1327 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -7,8 +7,10 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/ProfileData/MemProfData.inc"
+#include "llvm/Support/BLAKE3.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/EndianStream.h"
+#include "llvm/Support/HashBuilder.h"
 #include "llvm/Support/raw_ostream.h"
 
 #include <bitset>
@@ -312,20 +314,13 @@ struct Frame {
   // hashing from llvm ADT since we are going to persist the hash id, the hash
   // combine algorithm in ADT uses a new randomized seed each time.
   inline FrameId hash() const {
-    auto HashCombine = [](auto Value, size_t Seed) {
-      std::hash<decltype(Value)> Hasher;
-      // The constant used below is the 64 bit representation of the fractional
-      // part of the golden ratio. Used here for the randomness in their bit
-      // pattern.
-      return Hasher(Value) + 0x9e3779b97f4a7c15 + (Seed << 6) + (Seed >> 2);
-    };
-
-    size_t Result = 0;
-    Result ^= HashCombine(Function, Result);
-    Result ^= HashCombine(LineOffset, Result);
-    Result ^= HashCombine(Column, Result);
-    Result ^= HashCombine(IsInlineFrame, Result);
-    return static_cast<FrameId>(Result);
+    llvm::HashBuilder<llvm::TruncatedBLAKE3<8>, llvm::endianness::little>
+        HashBuilder;
+    HashBuilder.add(Function, LineOffset, Column, IsInlineFrame);
+    llvm::BLAKE3Result<8> Hash = HashBuilder.final();
+    FrameId Id;
+    std::memcpy(&Id, Hash.data(), sizeof(Hash));
+    return Id;
   }
 };
 

>From 0e63141111945c521a90446d02a10150cadabd56 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Tue, 24 Sep 2024 15:24:23 -0700
Subject: [PATCH 2/2] Update a comment.

---
 llvm/include/llvm/ProfileData/MemProf.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 25e183ee4c1327..f8121d35732518 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -310,9 +310,11 @@ struct Frame {
        << "        Inline: " << IsInlineFrame << "\n";
   }
 
-  // Return a hash value based on the contents of the frame. Here we don't use
-  // hashing from llvm ADT since we are going to persist the hash id, the hash
-  // combine algorithm in ADT uses a new randomized seed each time.
+  // Return a hash value based on the contents of the frame. Here we use a
+  // cryptographic hash function to minimize the chance of hash collisions.  We
+  // do persist FrameIds as part of memprof formats up to Version 2, inclusive.
+  // However, the deserializer never calls this function; it uses FrameIds
+  // merely as keys to look up Frames proper.
   inline FrameId hash() const {
     llvm::HashBuilder<llvm::TruncatedBLAKE3<8>, llvm::endianness::little>
         HashBuilder;



More information about the llvm-commits mailing list