[PATCH] D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Sat Jun 10 17:55:31 PDT 2023

wenlei added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/Passes.h:67
+  MachineFunctionPass *
+  createMachineFunctionSplitterPass(bool HasAccurateSampleProfile = false);

----------------
nit: name it `HasAccurateProfile` since it covers IRPGO too. 

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:90
+private:
+  // Whether this pass is using FSAFDO profile or not.
+  bool HasAccurateSampleProfile = false;
----------------
nit: update the comment

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:165
+          new BranchProbabilityInfo(MF.getFunction(), LI);
+      BlockFrequencyInfo BFI(MF.getFunction(), *BPI, LI);
+      CandidateForSplit =
----------------
We don't need to go through BPI->BFI all over again just to use `PSI->isFunctionHotInCallGraph`, which is more expensive. Instead, we can implement this directly with MBFI. 

MachineSizeOpts has the same need and has implemented `machine_size_opts_detail::isFunctionColdInCallGraph`, `machine_size_opts_detail::isFunctionHotInCallGraphNthPercentile` already. Now that we have a second use, it makes sense to refactor these into common MIR hotness helpers. 

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
shenhan wrote:
> shenhan wrote:
> > wenlei wrote:
> > > snehasish wrote:
> > > > Can we use `< ColdCountThreshold` instead of `== 0`?
> > > > 
> > > > `ColdCountThreshold` defaults to 1 so it should have the same effect.
> > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 
> > > 
> > > IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
> > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?
> > 
> > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO.  The latter could be a miss sampling or inference / propagation error.  Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.
> > 
> > > IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?
> > You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.
> Done.
> Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error.  

But checking `has_value` isn't going to be able to differentiate inaccurate counts. 

> You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. 

In this case, we don't need to guard it under `HasAccurateProfile` as it applies to IRPGO as well.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:155
   ProfileSummaryInfo *PSI = nullptr;
+  bool candidate_for_split = true;
   if (UseProfileData) {
----------------
snehasish wrote:
> shenhan wrote:
> > snehasish wrote:
> > > This should be `CandidateForSplit`.
> > > 
> > > Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?
> > > This should be CandidateForSplit.
> > 
> > Ah, yes.
> > 
> > > Then you can remove the use of this var from the pass altogether?
> > Maybe no, because this pass later also splits ALL eh code later.
> > 
> > 
> > ```
> > if (SplitAllEHCode)
> >     setDescendantEHBlocksCold(MF);
> > ```
> Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something.
Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make the code cleaner.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:159
     PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
+    if (IsFSAFDOFlavor) {
+      const LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
----------------
shenhan wrote:
> wenlei wrote:
> > Why this heuristic isn't applicable to IRPGO profile? 
> Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot functions where there are plenty of samples, not so if a function only has a few samples. And to be conservative, we only choose to apply split for the former hot functions. Whereas for IRPGO, we trust it when it says this function is cold, and we can proceed to split that function. (Although doing split in cold functions for IRPGO would not  make a lot difference, but still, it may have some benefits.)
Makes sense, would be good to add a comment to explain.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152399/new/

https://reviews.llvm.org/D152399