[PATCH] D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Tue Jun 20 15:40:24 PDT 2023

shenhan added a comment.

In D152399#4410073 <https://reviews.llvm.org/D152399#4410073>, @snehasish wrote:

>> I'll remove the testcase from this CL
>
> Actually I would prefer the foo4 based test case in this patch and in the same file too (`test/CodeGen/X86/machine-function-splitter.ll`). Also added a comment in the other patch D152577 <https://reviews.llvm.org/D152577>. Thanks!

Thanks. I see. Done merging my test into machine-function-splitter.ll.

================
Comment at: llvm/include/llvm/CodeGen/Passes.h:67
+  MachineFunctionPass *
+  createMachineFunctionSplitterPass(bool HasAccurateSampleProfile = false);

----------------
wenlei wrote:
> nit: name it `HasAccurateProfile` since it covers IRPGO too. 
Done.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:118

   if (PercentileCutoff > 0) {
     return PSI->isColdCountNthPercentile(PercentileCutoff, *Count);
----------------
hoy wrote:
> Looks like this is the logic the `HasAccurateSampleProfile` check is bypassing. Is the `PercentileCutoff` count is too high in your case? IIUC, you want more blocks to be treated as cold to lower the splitting cost. A higher cutoff count should help. Am I missing anything?
> 
> BTW, a more accurate profile should be more trustful, and thus the existing logic should work better. The term `HasAccurateSampleProfile` to bypass the existing logic here is a bit confusing to me.
Yes, you are right. Sorry, this is a bit confusing. The origin name was IsUsingFSAFDO, and was later renamed to HasAccurateSampleProfile per reviewer's suggestion. And it should equal to the negated value of IsUsingFSAFDO.

So replaced HasAccurateProfile with !HasAccurateProfile and negated the initial value of HasAccurateProfile and updated the comment.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:165
+          new BranchProbabilityInfo(MF.getFunction(), LI);
+      BlockFrequencyInfo BFI(MF.getFunction(), *BPI, LI);
+      CandidateForSplit =
----------------
wenlei wrote:
> We don't need to go through BPI->BFI all over again just to use `PSI->isFunctionHotInCallGraph`, which is more expensive. Instead, we can implement this directly with MBFI. 
> 
> MachineSizeOpts has the same need and has implemented `machine_size_opts_detail::isFunctionColdInCallGraph`, `machine_size_opts_detail::isFunctionHotInCallGraphNthPercentile` already. Now that we have a second use, it makes sense to refactor these into common MIR hotness helpers. 
This is being addressed in D152758. I'll keep it as is for now (after addressing Snehasish's comments). Once D152758 is submitted, I'll change this accordingly.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
wenlei wrote:
> shenhan wrote:
> > shenhan wrote:
> > > wenlei wrote:
> > > > snehasish wrote:
> > > > > Can we use `< ColdCountThreshold` instead of `== 0`?
> > > > > 
> > > > > `ColdCountThreshold` defaults to 1 so it should have the same effect.
> > > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 
> > > > 
> > > > IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
> > > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?
> > > 
> > > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO.  The latter could be a miss sampling or inference / propagation error.  Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.
> > > 
> > > > IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?
> > > You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.
> > Done.
> > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error.  
> 
> But checking `has_value` isn't going to be able to differentiate inaccurate counts. 
> 
> > You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. 
> 
> In this case, we don't need to guard it under `HasAccurateProfile` as it applies to IRPGO as well.
Thanks. I made the following change:

Firstly, per hoy's comment, I replaced "if (HasAccurateProfile) {" with "if (!HasAccurateProfile) {", noticing that HasAccurateProfile should be False when using FSAFDO or CSSPGO, and True when using IRPGO. Previously the value is negated.

Secondly, when !HasAccurateProfile && !Count.has_value()  (although the latter is unlikely), this needs to return false. Whereas when HasAccurateProfile  && !Count.has_value(), this returns true. So I have to guard it with HasAccureateProfile.

Thirdly, the next line "if (!Count)" is not testing "if (Count !=0)", it is testing "if(!Count.has_value())", it is little bit misleading, so I changed it to if(!Count.has_value()), so it is clear why this is guarded under HasAccurateProfile.

Hope this is clearer.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:155
   ProfileSummaryInfo *PSI = nullptr;
+  bool candidate_for_split = true;
   if (UseProfileData) {
----------------
wenlei wrote:
> snehasish wrote:
> > shenhan wrote:
> > > snehasish wrote:
> > > > This should be `CandidateForSplit`.
> > > > 
> > > > Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?
> > > > This should be CandidateForSplit.
> > > 
> > > Ah, yes.
> > > 
> > > > Then you can remove the use of this var from the pass altogether?
> > > Maybe no, because this pass later also splits ALL eh code later.
> > > 
> > > 
> > > ```
> > > if (SplitAllEHCode)
> > >     setDescendantEHBlocksCold(MF);
> > > ```
> > Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something.
> Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make the code cleaner.
Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:155
   ProfileSummaryInfo *PSI = nullptr;
+  bool candidate_for_split = true;
   if (UseProfileData) {
----------------
shenhan wrote:
> wenlei wrote:
> > snehasish wrote:
> > > shenhan wrote:
> > > > snehasish wrote:
> > > > > This should be `CandidateForSplit`.
> > > > > 
> > > > > Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?
> > > > > This should be CandidateForSplit.
> > > > 
> > > > Ah, yes.
> > > > 
> > > > > Then you can remove the use of this var from the pass altogether?
> > > > Maybe no, because this pass later also splits ALL eh code later.
> > > > 
> > > > 
> > > > ```
> > > > if (SplitAllEHCode)
> > > >     setDescendantEHBlocksCold(MF);
> > > > ```
> > > Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something.
> > Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make the code cleaner.
> Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".
Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152399/new/

https://reviews.llvm.org/D152399