[llvm] [CSSPGO] Error out if the checksum mismatch is high (PR #84097)

Lei Wang via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 18 22:01:18 PDT 2024


================
@@ -2184,6 +2204,61 @@ bool SampleProfileLoader::doInitialization(Module &M,
   return true;
 }
 
+// Note that this is a module-level check. Even if only one module fails the
+// check, the entire build is errored out. However, the user could make big
+// changes to functions in a single module while those changes are not
+// performance-significant for the whole binary. Therefore, we use a
+// conservative approach to make sure we only error out if the mismatch
+// globally impacts the binary's performance. To achieve this, we use
+// heuristics to select a reasonably large set of functions that are supposed
+// to be globally performance-significant, and only compute and check the
+// mismatch within those functions. The function selection is based on two
+// criteria: 1) The function is hot enough, which is tuned by a hotness-based
+// flag (ChecksumMismatchFuncHotBlockSkip). 2) The number of selected functions
+// is large enough, which is tuned by the ChecksumMismatchNumFuncSkip flag.
+bool SampleProfileLoader::errorIfHighChecksumMismatch(
+    Module &M, ProfileSummaryInfo *PSI, const SampleProfileMap &Profiles) {
+  assert(FunctionSamples::ProfileIsProbeBased &&
+         "Only support for probe-based profile");
+  uint64_t TotalSelectedFunc = 0;
+  uint64_t NumMismatchedFunc = 0;
+  for (const auto &I : Profiles) {
+    const auto &FS = I.second;
+    const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
+    if (!FuncDesc)
+      continue;
+
+    // We want to select a set of functions that are globally performance
+    // significant, in other words, if those functions profiles are
+    // checksum-mismatched and dropped, the whole binary will likely be
+    // impacted, so here we use a hotness-based threshold to control the
+    // selection.
+    if (FS.getTotalSamples() <
+        ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
----------------
wlei-llvm wrote:

I see, sounds good to use `total_samples` to decide whether the function is hot.

I took a look at `isFunctionHotInCallGraphNthPercentile`; it looks like the function takes a `Function` parameter (`isFunctionHotOrColdInCallGraphNthPercentile(int PercentileCutoff, const FuncT *F, BFIT &FI)`, see https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/ProfileSummaryInfo.h#L285), and it looks for the annotations on the IR to check hotness. However, here our function runs before the sample annotation, so we can't use it directly.
Actually, I found that `isFunctionHotOrColdInCallGraphNthPercentile` also uses `isHotCountNthPercentile` on `TotalCallCount`, which is not a block-level count either. We probably don't have an existing helper for total_samples right now, so to keep it simple, maybe we can just use `isHotCountNthPercentile` on total_samples? (Or maybe add an overload function like `isFunctionHotInCallGraphNthPercentile(... , total_samples)` to accept total_samples, but I think it's really just a wrapper around `isHotCountNthPercentile`.)
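
For reference, here is a minimal sketch of that simple option, calling `isHotCountNthPercentile` directly on total_samples. The `isGloballySignificant` helper name and the `PercentileCutoff` parameter are hypothetical, just to illustrate the shape, not part of the patch:

```cpp
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/ProfileData/SampleProf.h"

using namespace llvm;
using namespace llvm::sampleprof;

// Hypothetical helper: decide whether a profiled function is "hot" enough to
// be included in the checksum-mismatch check, by comparing its total_samples
// against the profile summary's Nth-percentile hot-count threshold.
static bool isGloballySignificant(const FunctionSamples &FS,
                                  ProfileSummaryInfo *PSI,
                                  int PercentileCutoff) {
  // isHotCountNthPercentile compares a raw count against the summary's
  // percentile threshold, so total_samples can be passed in directly; no IR
  // annotations are needed, which works before sample annotation runs.
  return PSI->isHotCountNthPercentile(PercentileCutoff, FS.getTotalSamples());
}
```

An overload of `isFunctionHotInCallGraphNthPercentile` would presumably just forward to the same call, so this keeps the change minimal.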

https://github.com/llvm/llvm-project/pull/84097


More information about the llvm-commits mailing list