[PATCH] D147545: [CSSPGO] Stale profile matching(part 2)

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 17 10:55:08 PDT 2023


wlei added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2237-2243
+    if (!IsMatched) {
+      uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
+      LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
+      InsertMatching(Loc, Candidate);
+      LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
+                        << Candidate << "\n");
+    }
----------------
wlei wrote:
> wenlei wrote:
> > wlei wrote:
> > > wenlei wrote:
> > > > This doesn't seem to do what you intended to achieve, or at least what the comment says
> > > > 
> > > > > non-anchor location match is based on the offset to the last matched anchor.
> > > > 
> > > > If last matched anchor doesn't exist, should we still match anything? 
> > > > 
> > > > Consider a case where a function is completely changed, and there's no matched callsite at call, in which case we should probably still throw away its profile instead of matching everything with LocationDelta == 0?
> > > > 
> > > > 
> > > Sorry if my description is unclear.
> > > 
> > > My implementation assumes a special non-callsite anchor: `0`, i.e. the beginning of the function. Without the fuzzy matching, it's just this only one anchor matched, and all the others(non-anchor) use the  `LocationDelta = 0` to do the matching, which  means the match source is equal to the target.
> > > 
> > > >Consider a case where a function is completely changed, and there's no matched callsite at call, in which case we should probably still throw away its profile instead of matching everything with LocationDelta == 0?
> > > 
> > > Say:
> > > ```
> > > IR locations:      [1...100 101(foo)]
> > > Profile locations: [1...100 101(bar)]
> > > ```
> > > So my intuition is even without any callsite anchor, matching the leading location would be better than dropping them. (remembering current pseudo-probe implementation, the bb ids all come first, it's not uncommon)
> > > 
> > > 
> > > 
> > Hmm.. I thought if there's no matching call site, that would mean the function has completely changed, so we can't match probes/blocks in a reasonable way. I can see what you have can help matching leaf functions without call sites.
> > 
> > But there needs to be something to make sure when the change is too big, we bail out instead of doing low confidence matching. Maybe we should bail out on zero matched call site + different number of probes/blocks? 
> Good point, I will add some check(like a threshold) to make sure the very bad match to be bailed out and drop the profiles.
I found to do this we need to get some metrics to define how big the change is, also need to test the perf number and tune the threshold, I will handle this in a separate diff.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2262-2264
+  std::set<LineLocation> AllLocations;
+  // Direct call's callee name will be used as the matching anchor.
+  std::unordered_map<LineLocation, StringRef, LineLocationHash> DirectCallSites;
----------------
wenlei wrote:
> Wondering if we can simplify by merging the two containers, i.e use `std::map<LineLocation, StringRef>`, and empty StringRef value for non-direct-callsite, then we avoid the 2nd map, and also avoid the map look up in `runStaleProfileMatching`. It may use slightly more memory, but feels a bit cleaner. 
> 
> Or even something like `std::map<LineLocation, std::pair<StringRef, bool>>` to have `MatchedCallsiteLocs` merged in as well. These containers feel a bit duplicated.
Good point!  going to remove the `MatchedCallsiteLocs `, so changed to use `std::map<LineLocation, StringRef>` 


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:449
+  // For each function, the matcher generates a map, of which each entry is a
+  // mapping from the location of current build to the location in the profile.
+  StringMap<LocToLocMap> FuncToMatchingsMap;
----------------
davidxl wrote:
> nit: location -> source location
Done.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2167
+    const auto &CTM = I.second.getCallTargets();
+    if (CTM.size() == 1) {
+      StringRef CalleeName = CTM.begin()->first();
----------------
wenlei wrote:
> davidxl wrote:
> > hoy wrote:
> > > Please add a comment for this. Basically we are filtering out possible indirect calls.
> > indirect call sites are good anchors too. Why not use sequential id based anchoring?
> The call site matching is based on callee names, indirect call from IR doesn't have callee name, and we opt to be conservative here (using name, instead of sequential id). 
Commnets added


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2192
+// Matching heuristic: Match the anchor(with same name) from left to right,
+// first come first matched, non-anchor location match is based on the offset to
+// the last matched anchor.
----------------
hoy wrote:
> May be worth mentioning the matching is based on sequential lexical order. "First come" means the anchors with lexically early locations.
Thanks for the suggestion, refined the comments


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2220
+      StringRef CalleeName = CallIter->second;
+      auto RA = CalleeToCallsitesMap.find(CalleeName);
+      if (RA != CalleeToCallsitesMap.end()) {
----------------
davidxl wrote:
> What does RA stand for?
I wanted to say "Result of the Anchor", changed to a more meaningful name `ProfileAnchors`


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2268
     for (auto &I : BB) {
+      if (std::optional<PseudoProbe> Probe = extractProbe(I)) {
+        AllLocations.insert({Probe->Id, 0});
----------------
hoy wrote:
> Should this be conditioned under `FunctionSamples::ProfileIsProbeBased` if the same is going to be used for non-CS profile?
> 
> Also if profile does match, there's no need to populate `AllLocations`?
> Should this be conditioned under FunctionSamples::ProfileIsProbeBased if the same is going to be used for non-CS profile?

Yes, added the ` FunctionSamples::ProfileIsProbeBased ` check.

> Also if profile does match, there's no need to populate AllLocations?
Yes, going the remove the `MatchedCallsiteLocs `, so here only leave one structure IRLocations.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2269
+      if (std::optional<PseudoProbe> Probe = extractProbe(I)) {
+        AllLocations.insert({Probe->Id, 0});
+      }
----------------
hoy wrote:
> nit: use emplace
Done.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:1844
       LLVM_DEBUG(
           dbgs() << "Profile is invalid due to CFG mismatch for Function "
+                 << F.getName() << "\n");
----------------
wenlei wrote:
> Unrelated to this change, but it may be useful to have a way for users to print out functions with stale profile without needing debug compiler. 
> 
> Often times users ask this: is this change going to invalidate profile for this function? having a switch may help them self-diagnose to get an answer.
Sounds good, I will add a switch in a separate diff.


================
Comment at: llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof:7
+ 5: 0
+ 7: 124 bar:124
+ 9: 126 bar:126
----------------
hoy wrote:
> Add a case for multiple call targets at same location?
Sounds good, added a line including multiple call targets


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147545/new/

https://reviews.llvm.org/D147545



More information about the llvm-commits mailing list