[llvm] [CSSPGO] Error out if the checksum mismatch is high (PR #84097)

Fri Mar 15 13:16:44 PDT 2024

================
@@ -2184,6 +2204,61 @@ bool SampleProfileLoader::doInitialization(Module &M,
   return true;
 }
 
+// Note that this is a module-level check. Even if one module is errored out,
+// the entire build will be errored out. However, the user could make big
+// changes to functions in single module but those changes might not be
+// performance significant to the whole binary. Therefore, we use a conservative
+// approach to make sure we only error out if it globally impacts the binary
+// performance. To achieve this, we use heuristics to select a reasonable
+// big set of functions that are supposed to be globally performance
+// significant, only compute and check the mismatch within those functions. The
+// function selection is based on two criteria: 1) The function is "hot" enough,
+// which is tuned by a hotness-based flag(ChecksumMismatchFuncHotBlockSkip). 2)
+// The num of function is large enough which is tuned by the
+// ChecksumMismatchNumFuncSkip flag.
+bool SampleProfileLoader::errorIfHighChecksumMismatch(
+    Module &M, ProfileSummaryInfo *PSI, const SampleProfileMap &Profiles) {
+  assert(FunctionSamples::ProfileIsProbeBased &&
+         "Only support for probe-based profile");
+  uint64_t TotalSelectedFunc = 0;
+  uint64_t NumMismatchedFunc = 0;
+  for (const auto &I : Profiles) {
+    const auto &FS = I.second;
+    const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
+    if (!FuncDesc)
+      continue;
+
+    // We want to select a set of functions that are globally performance
+    // significant, in other words, if those functions profiles are
+    // checksum-mismatched and dropped, the whole binary will likely be
+    // impacted, so here we use a hotness-based threshold to control the
+    // selection.
+    if (FS.getTotalSamples() <
+        ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
----------------
wlei-llvm wrote:

> 1. The multiplier doesn't really has to do with "block" of the function, so the name can be confusing?
> 2. What do you think of using `isFunctionHotInCallGraphNthPercentile` or `isHotCountNthPercentile` with a percentile cutoff? We can tun the percentile to get what we need.

Let me try to clarify my initial thought: IIUC,  all the "HotCount" stuffs in PSI are "block-level"(or body-sample level) concept,  the `isFunctionHotXXX` function are used to check the **entry_count** of the function which is also "block-level"(the first block), however what I used here is the `getTotalSamples()` which I intend to include all the callee/inlinee samples(it could be the entry count is cold, but it does a lot of hot inlinings), as total_samples is kind of sum of all the block counts of this function, so I used a multiplier which I want to mean how many blocks on average is hot in this function.   

So if we pass  a sum of blocks(`getTotalSamples`) to this `isFunctionHotInCallGraphNthPercentile ` or `isHotCountNthPercentile ` function, might still be confusing? 

I feet we actually don't have a function-level thing for this hotness check. so I was also thinking to add a new function to PSI, like:(but no sure if it's generally useful)
```
int getNumHotBlocks(const FunctionSamples &FS) {
   int num = 0; 
   for(samples in all body samples) 
      if(samples > isHotCount())
         num++;
    return num;
}
```
Any thought? thanks!


https://github.com/llvm/llvm-project/pull/84097