[PATCH] D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Thu Jun 8 15:31:30 PDT 2023

shenhan added a comment.

> What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

Thanks. I was comparing "1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS". And FS-AFDO profile is collected via -improved-fs-discriminator=true.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:90
+  // Whether this pass is using FSAFDO profile or not.
+  bool IsFSAFDOFlavor = false;
 };
----------------
wenlei wrote:
> It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO isn't really special other than having better profile quality comparing to baseline AFDO. If we want to introduce different heuristic based on profile quality, make it explicit, i.e. use something like `HasAccurateSampleProfile` instead of `IsFSAFDOFlavor`. 
> 
> Ideally it'd be best if we can unify FS-AFDO case with IRPGO, so we can use a single flag `HasAccurateProfile` for all heuristic tweaks for FS-AFDO, IRPGO and CSSPGO.. Maybe this should all be incorporated into PSI instead of having flags spread around.
> 
> Note that we also see performance improvements from MFS using CSSPGO, and we're going to turn it on for internal use (all with CSSPGO).  
Thanks, this is sensible suggestion. 

> Ideally it'd be best if we can unify FS-AFDO case with IRPGO
Yes, agree. However, currently, there is still a gap between FSAFDO and IRPGO quality, so the need for a flag to separate these 2.

Replaced with HasAccurateSampleProfile.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
wenlei wrote:
> snehasish wrote:
> > Can we use `< ColdCountThreshold` instead of `== 0`?
> > 
> > `ColdCountThreshold` defaults to 1 so it should have the same effect.
> So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 
> 
> IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
> So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?

Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO.  The latter could be a miss sampling or inference / propagation error.  Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.

> IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?
You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
shenhan wrote:
> wenlei wrote:
> > snehasish wrote:
> > > Can we use `< ColdCountThreshold` instead of `== 0`?
> > > 
> > > `ColdCountThreshold` defaults to 1 so it should have the same effect.
> > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 
> > 
> > IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
> > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?
> 
> Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO.  The latter could be a miss sampling or inference / propagation error.  Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.
> 
> > IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?
> You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.
Done.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:155
   ProfileSummaryInfo *PSI = nullptr;
+  bool candidate_for_split = true;
   if (UseProfileData) {
----------------
snehasish wrote:
> This should be `CandidateForSplit`.
> 
> Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?
> This should be CandidateForSplit.

Ah, yes.

> Then you can remove the use of this var from the pass altogether?
Maybe no, because this pass later also splits ALL eh code later.

```
if (SplitAllEHCode)
    setDescendantEHBlocksCold(MF);
```

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:159
     PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
+    if (IsFSAFDOFlavor) {
+      const LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
----------------
wenlei wrote:
> Why this heuristic isn't applicable to IRPGO profile? 
Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot functions where there are plenty of samples, not so if a function only has a few samples. And to be conservative, we only choose to apply split for the former hot functions. Whereas for IRPGO, we trust it when it says this function is cold, and we can proceed to split that function. (Although doing split in cold functions for IRPGO would not  make a lot difference, but still, it may have some benefits.)

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:160
+    if (IsFSAFDOFlavor) {
+      const LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+      const BranchProbabilityInfo *BPI =
----------------
snehasish wrote:
> `const LoopInfo& LI`?
> No need to take the address and then deref it in each use below.
Done.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:162
+      const BranchProbabilityInfo *BPI =
+          new BranchProbabilityInfo(MF.getFunction(), *LI);
+      BlockFrequencyInfo *BFI =
----------------
snehasish wrote:
> I think this new (and the one below) is leaking memory. 
> 
> Maybe just use vars on the stack like --
> ```
> const BranchProbabilityInfo BPI(MF.getFunction(), LI);
> BlockFrequencyInfo BFI(MF.getFunction(), BPI, LI);
> ```
Done.

================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1253

+  if (EnableFSDiscriminator)
+    addPass(createMIRAddFSDiscriminatorsPass(
----------------
snehasish wrote:
> I think the changes to TargetPassConfig should be split out into a separate patch. Also needs a test to ensure that the additional discriminator passes are run where we expect them to be run.
OK. Removed the file from this CL.

================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1288
+    addPass(createMIRAddFSDiscriminatorsPass(
+        sampleprof::FSDiscriminatorPass::PassLast));
+
----------------
wenlei wrote:
> Pass3 discriminator is already later than PassLast discriminator before this change, so all code duplication that was covered before the change should already be covered with Pass3 discriminator, do we need another even later PassLast discriminator here?
You are right, no need for another round of discriminator here once we hve Pass3. Removed. 

Also, per Snehasish's suggest, I'll move the file to another CL.

================
Comment at: llvm/test/CodeGen/X86/fsafdo_mfs_test.ll:1
+; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true -split-machine-functions < %s 2>&1 | FileCheck %s --check-prefixes=MFS
+; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s 2>&1 | FileCheck %s --check-prefixes=NOMFS
----------------
snehasish wrote:
> I think we can write a smaller test by reusing the code in test/CodeGen/X86/machine-function-splitter.ll. The only thing we need to check there is that if `-enable-fs-discriminator` and `-improved-fs-discriminator=true` then the cold count which has value and that value is 0 is split. 
> 
> The closest test case is foo4 on L65 in machine-function-splitter.ll. I would suggest copying that and adding the necessary check.
> 
> 
> 
Thanks. That's good suggestion. I'll remove the testcase from this CL because it has to be together with changes to TargetPassConfig.cpp (so the flag "UsingAccurateSampleProfile" can be passed to MFS pass). And include it with TargetPassConfig.cpp in a new CL.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152399/new/

https://reviews.llvm.org/D152399