[PATCH] D147740: [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map

William Junda Huang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 13 11:57:24 PDT 2023


huangjd marked an inline comment as done.
huangjd added inline comments.


================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:324
+  uint64_t Hash;
+  if (F.getAsInteger(10, Hash))
+    Hash = MD5Hash(F);
----------------
snehasish wrote:
> Is there a chance that the base 10 encoding may change? Is this the only place where we generate the hashes?
No, and doesn't matter. Base-10 encoding is used only because the existing implementation wanted to make it compatible to represent both strings (function names)  and integers (MD5), and the solution was to convert MD5 into a base-10 string. This seems inefficient and I am working on a subsequent patch to deal with it. 


================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:1288-1289
+  // rather than returning a reference to an irrelevant entry, and pretend the
+  // old entry never existed. Dropping a sample should not affect compilation
+  // correctness.
+  template <typename... Ts>
----------------
snehasish wrote:
> Can this lead to non-deterministic builds?
Why? There's no non-determinism here, the existing entry always gets erased first. SampleProfWriter does sort the profile before writing so it's ok


================
Comment at: llvm/lib/ProfileData/SampleProf.cpp:403
 
-void SampleContextTrimmer::canonicalizeContextProfiles() {
-  std::vector<SampleContext> ProfilesToBeRemoved;
----------------
davidxl wrote:
> is this a dead function? should it be removed in a separate patch?
After refactoring, the invariant  key == value.getContext() is moot, so this function is dead


================
Comment at: llvm/unittests/tools/llvm-profdata/MD5CollisionTest.cpp:84
+
+    /// Samples
+    /// String1:10:1
----------------
snehasish wrote:
> Can we use the text format (with some additional helper functions) here instead of the binary data? It would be hard to update in case of changes in the future.
See comments 
```
...  A unit test is required because the function
/// names are not printable ASCII characters.
```


================
Comment at: llvm/unittests/tools/llvm-profdata/MD5CollisionTest.cpp:84
+
+    /// Samples
+    /// String1:10:1
----------------
huangjd wrote:
> snehasish wrote:
> > Can we use the text format (with some additional helper functions) here instead of the binary data? It would be hard to update in case of changes in the future.
> See comments 
> ```
> ...  A unit test is required because the function
> /// names are not printable ASCII characters.
> ```
This test case is very theoretical, as I can't find two printable ASCII strings with colliding MD5 (there exists for sure, but none is known yet) In this case I have to use a unit test because I cannot validate the output using llvm-lit which requires printable characters. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147740/new/

https://reviews.llvm.org/D147740



More information about the llvm-commits mailing list