[PATCH] D94001: [CSSPGO] Call site prioritized BFS inlining for sample PGO

Tue Jan 12 22:58:43 PST 2021

wenlei added a comment.

In D94001#2494594 <https://reviews.llvm.org/D94001#2494594>, @wmi wrote:

> Sorry for the delay in review. It makes a lot of sense to have a priority based inliner for CSSPGO since its profile annotation doesn't rely on replaying the inlining. But I don't understand why we rely on BFS/DFS strategy to expose the hottest callsite for priority based inliner. In my mind, CSSPGO can know the hotness of each callsite with full context information. Can we just sort all the callsites  based on CSSPGO profile then try to inline the hottest callsite under a specific context first in a top down manner? We may actually need to inline some parent callsites before we can inline the hottest callsite, but it is all driven by the target to inline the hottest callsite first. If we worry some callsite is too deep and we need to inline through a deep path before we can inline the specific callsite, we may add the depth into priority computation.  What do you think?

Yeah, that's one step further than what I have in this patch. The key here is the prioritized work list, and BFS is just a natural by product of using the priority queue. I called out BFS to emphasize the more balanced inlining, but it's not super accurate because it is only a true BFS when priorities are all identical. What you suggested is essentially increasing the scope of call sites to prioritize - currently it's the immediate callees. I agreed that increasing the scope to multiple levels or the entire sub call tree may be beneficial. But I don't think it conflicts the approach in the patch, instead it can be added leveraging this implementation.

With context profiles organized in a trie, it's relatively easy to get the hottest call sites in sub call tree (assuming no stale profile). I think we can assign total order/priority for these hot call sites as well as the call sites in the middle leading to those hot ones (call sites that lead to hottest call site need to have their priority bumped up), then the priority queue will lead us to these hottest call sites first even if they're several levels away..   I can give it a try to increase the scope of call site for prioritization as a follow up.

We need to be able to priority call site before its IR is available though. Originally I was thinking about also taking cost/size into account in priority in addition to call site hotness (don't have that implemented yet), to do that we need the actual call site IR in addition to profile. I was also hoping to eventually unify the early inliner for AFDO and CSSPGO. This implementation didn't diverge from AFDO too far, and we should be able to merge the two in the future with just different priority assignment (equal priority for AFDO). But I don't think these two are critical.

================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:426
+  bool getInlineCandidate(InlineCandidate *NewCandidate, CallBase *CB);
+  bool inlineCandidate(InlineCandidate &Candidate,
+                       SmallVector<CallBase *, 8> &InlinedCallSites);
----------------
wmi wrote:
> Since there is another Struct named as InlineCandidate, can we rename it to doInline or executeInline?
Good point, renamed to `tryInlineCandidate` to be consistent with `tryPromoteIndirectCall` and `shouldInlineCandidate`. This function may decided not to inline. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94001/new/

https://reviews.llvm.org/D94001