[PATCH] D147545: [CSSPGO] Stale profile matching(part 2)

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 20 12:02:26 PDT 2023


wlei added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2289
+        if (std::optional<PseudoProbe> Probe = extractProbe(I))
+          IRLocations.emplace(LineLocation(Probe->Id, 0), StringRef());
+      }
----------------
wenlei wrote:
> wlei wrote:
> > hoy wrote:
> > > I guess `IRLocations` should also be populated for non-CS. Maybe add a TODO?
> > TODO added, Yes, currently only support pseudo probe.
> Looks like we're not populating non-call locations for AutoFDO? In that case, should we assert on CSSPGO to make sure we don't accidentally run this for AutoFDO before its support is complete?
Yes, now it's not supported for AutoFDO. It's already under `FunctionSamples::ProfileIsProbeBased`, so this code won't be run in AutoFDO.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2304
+        // Force to overwrite the callee name in case any non-call location was
+        // wrote before.
+        IRLocations[IRCallsite] = CalleeName;
----------------
hoy wrote:
> wenlei wrote:
> > hoy wrote:
> > > wlei wrote:
> > > > hoy wrote:
> > > > > typo: wrote -> written
> > > > > 
> > > > > BTW, have you seen such collision? With pseudo probes this shouldn't happen. It should not happen with dwarf discriminators either.
> > > > Before I didn't add `isa<IntrinsicInst>(&I)` condition to `(std::optional<PseudoProbe> Probe = extractProbe(I))`
> > > > I saw some callee name become empty due to the overwriting by the above emplace.
> > > > 
> > > > So I was thinking this is to record all the anchors, the anchor should always be higher priority than non-anchor, so I changed like here to force the writing. Also in case any changes when we support AutoFDO or post-link time matching. Or we can use the assertion here.
> > > > 
> > > I see. Yeah, `extractProbe` on call instruction gets the call probe. If you want block probes only, you can change the check `isa<IntrinsicInst>` to `isa<PseudoProbeInst>`.
> > > 
> > > 
> > For non-call location, the name should be empty. Maybe assert that we never overwrite non-empty name? 
> An assert sounds good for pseudo probes. The overwrite is possible for autofdo without enabling dwarf discriminator. 
Sounds good, changed to use `isa<PseudoProbeInst>. and added the assert for pseudo probe.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2334
+
+  if (IsFuncHashMismatch && SalvageStaleProfile) {
+    LLVM_DEBUG(dbgs() << "Run stale profile matching for " << F.getName()
----------------
hoy wrote:
> hoy wrote:
> > wenlei wrote:
> > > wlei wrote:
> > > > hoy wrote:
> > > > > BTW, wondering if you've ever seen mismatched callsites when function hash matches. The hash counts number of callsites but not their orders.
> > > > Good question. Yes, there are many mismatched callsites even hash is matched, current work only support when a checksum mismatch is detected. 
> > > > 
> > > > There is a general issue whether we can turn it on for all the functions. That is whether the matching algorithm can handle perfectly with the non-stale profile.  the current heuristic is "first come first match", but not all the functions are in the profile(supposing there are functions doesn't hit any samples), it could give inconsistent anchors for the non-stale profile then cause a mismatch.
> > > > 
> > > > In order to solve it, I think we can try:
> > > > 
> > > > - Use a more strict checksum, like also count the orders.
> > > > - find a threshold from the mismatch metrics to control it.
> > > > -  Use a different heuristic, like search the closest location which can handle well with non-stale profile, but need more measuring for the mismatched function.
> > > > 
> > > > This is also an issue blocking AutoFDO, since AutoFDO doesn't have the checksum.
> > > > 
> > > > 
> > > > 
> > > it might be useful to have a debug print, or `STATISTIC` so we know when and how often this happens. It's essentially hash collision. 
> > Good point. Counting how many direct callsites having mismatched targets in the profile and on the IR would be helpful.
> Also as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites are not able to get a match. If it's non-trivial, adding zero-count callsites in the profile may allow for a more precise match. Of course that's going to increase profile size, but just curious if it is worth.
> it might be useful to have a debug print, or STATISTIC so we know when and how often this happens. It's essentially hash collision.
We already have the global statistic report for the mismatched callsite(the `TotalProfiledCallsites` under `ReportProfileStaleness`), added a debug print for the function level debugging.

>Also as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites are not able to get a match. If it's non-trivial, adding zero-count callsites in the profile may allow for a more precise match. Of course that's going to increase profile size, but just curious if it is worth.
Yeah, we could do a offline analysis for how many callsites whose name can be found in the profile but still remain mismatched after matching. We could do add the zero-count calliste in llvm-profgen, only for top-level is enough, I guess that size should be ok with extbinary.

Alternatively, we could use "search closest location" heuristic so that for a fresh profile where the profile and the IR should be the same location, the IR location can always be matched to the same location in the Profile.





================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2262-2264
+  std::set<LineLocation> AllLocations;
+  // Direct call's callee name will be used as the matching anchor.
+  std::unordered_map<LineLocation, StringRef, LineLocationHash> DirectCallSites;
----------------
wenlei wrote:
> wlei wrote:
> > wenlei wrote:
> > > wlei wrote:
> > > > wenlei wrote:
> > > > > Wondering if we can simplify by merging the two containers, i.e use `std::map<LineLocation, StringRef>`, and empty StringRef value for non-direct-callsite, then we avoid the 2nd map, and also avoid the map look up in `runStaleProfileMatching`. It may use slightly more memory, but feels a bit cleaner. 
> > > > > 
> > > > > Or even something like `std::map<LineLocation, std::pair<StringRef, bool>>` to have `MatchedCallsiteLocs` merged in as well. These containers feel a bit duplicated.
> > > > Good point!  going to remove the `MatchedCallsiteLocs `, so changed to use `std::map<LineLocation, StringRef>` 
> > > Looks like MatchedCallsiteLocs is still there? 
> > Sorry for confusion, it will also be in a separate diff, same diff with checking big profile change. I plan to use the IRLocations to directly match profile locations, there will be no this `MatchedCallsiteLocs` and use it to guid if the mismatch is big to drop or not. 
> Sounds good to deal with it in a separate patch. For `IRLocations`, would be good to have a comment mentioning the use of empty StringRef non-direct-call site. 
comment added


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147545/new/

https://reviews.llvm.org/D147545



More information about the llvm-commits mailing list