[PATCH] D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Wed Jun 7 16:37:34 PDT 2023

wenlei added a subscriber: hoy.
wenlei added a comment.

> we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This looks promising. We saw improvements from MFS when used with CSSPGO too, but the improvements were smaller than the numbers here. You are not using Propeller or BOLT in your perf testing above, right?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

+ at hoy @wlei

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:90
+  // Whether this pass is using FSAFDO profile or not.
+  bool IsFSAFDOFlavor = false;
 };
----------------
It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO isn't really special other than having better profile quality comparing to baseline AFDO. If we want to introduce different heuristic based on profile quality, make it explicit, i.e. use something like `HasAccurateSampleProfile` instead of `IsFSAFDOFlavor`. 

Ideally it'd be best if we can unify FS-AFDO case with IRPGO, so we can use a single flag `HasAccurateProfile` for all heuristic tweaks for FS-AFDO, IRPGO and CSSPGO.. Maybe this should all be incorporated into PSI instead of having flags spread around.

Note that we also see performance improvements from MFS using CSSPGO, and we're going to turn it on for internal use (all with CSSPGO).  

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
snehasish wrote:
> Can we use `< ColdCountThreshold` instead of `== 0`?
> 
> `ColdCountThreshold` defaults to 1 so it should have the same effect.
So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 

IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:159
     PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
+    if (IsFSAFDOFlavor) {
+      const LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
----------------
Why this heuristic isn't applicable to IRPGO profile? 

================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1288
+    addPass(createMIRAddFSDiscriminatorsPass(
+        sampleprof::FSDiscriminatorPass::PassLast));
+
----------------
Pass3 discriminator is already later than PassLast discriminator before this change, so all code duplication that was covered before the change should already be covered with Pass3 discriminator, do we need another even later PassLast discriminator here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152399/new/

https://reviews.llvm.org/D152399