[llvm] [CSSPGO] Compute and report profile matching recovered callsites and samples (PR #79090)

Fri Feb 16 08:52:32 PST 2024

================
@@ -478,20 +505,31 @@ class SampleProfileMatcher {
                      std::map<LineLocation, StringRef> &IRAnchors);
   void findProfileAnchors(
       const FunctionSamples &FS,
-      std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countMismatchedSamples(const FunctionSamples &FS);
-  void countProfileMismatches(
-      const Function &F, const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countProfileCallsiteMismatches(
-      const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
+      std::map<LineLocation, std::unordered_set<FunctionId>> &ProfileAnchors);
+  // Record the callsite match states for profile staleness report, the result
+  // is saved in FuncCallsiteMatchStates.
+  void recordCallsiteMatchStates(
+      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
       const std::map<LineLocation, std::unordered_set<FunctionId>>
           &ProfileAnchors,
-      uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
+      const LocToLocMap *IRToProfileLocationMap);
+
+  bool isMismatchState(const enum MatchState &State) {
----------------
wlei-llvm wrote:

> If checksum actually matches, do we expect any InitialMismatch from such functions?
> 
> Or in other words, if checksum matches, there is no state change to track, do we actually still need to run `recordCallsiteMatchStates`? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?

It can happen. On our dashboard, even when the checksum-mismatched samples are effectively zero, there are still a small number of callsite mismatches (e.g. Stale function samples: 0.00%(773327/65978370982), Stale functions: 0.01%(6/42787), Stale callsite samples: 0.29%(194149265/65978370982), Stale callsite samples after matching: 0.29%(194318114/65978370982), Stale callsites: 5.95%(20780/349307), Stale callsites after matching: 5.95%(20781/349307)).

I checked, and it mostly comes from the ctor alias optimization, which is one kind of "function rename" (in this case not caused by a user change). Function renames caused by user changes fall into the same category. Running the callsite check on checksum-matched functions can help detect this issue.

That said, these ctor alias functions are all very cold, so it may be fine to ignore them if these "false positives" turn out to be noisy (this is more for future support).
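
For illustration only, here is a minimal self-contained sketch of how a ctor alias rename can show up as a callsite mismatch even though the function checksum still matches. The type aliases, `CallsiteStats`, `countInitialMismatches`, and the mangled names in the note below are hypothetical stand-ins, not the code in this PR, and the real matcher handles more cases (e.g. indirect calls):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins for the LLVM types (LineLocation, FunctionId);
// these are NOT the real definitions.
using Loc = uint32_t;
using IRAnchorMap = std::map<Loc, std::string>;
using ProfileAnchorMap = std::map<Loc, std::unordered_set<std::string>>;

struct CallsiteStats {
  uint64_t Total = 0;      // profiled callsites seen
  uint64_t Mismatched = 0; // callsites whose callee differs from the IR
};

// Counts profile callsites whose callee set does not contain the callee
// that the IR has at the same location. A checksum match on the enclosing
// function says nothing about callee names, so this can be non-zero even
// when the checksum-based staleness is zero.
CallsiteStats countInitialMismatches(const IRAnchorMap &IR,
                                     const ProfileAnchorMap &Profile) {
  CallsiteStats Stats;
  for (const auto &Entry : Profile) {
    ++Stats.Total;
    auto It = IR.find(Entry.first);
    if (It == IR.end() || Entry.second.count(It->second) == 0)
      ++Stats.Mismatched;
  }
  return Stats;
}
```

For example, with IR anchors `{1: foo, 2: _ZN3BarC2Ev}` and profile anchors `{1: {foo}, 2: {_ZN3BarC1Ev}}`, the sketch reports 2 callsites, 1 of them mismatched, purely because of the C1/C2 name difference.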

Any thoughts?


https://github.com/llvm/llvm-project/pull/79090