[PATCH] D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Sat Jul 8 19:05:58 PDT 2023

shenhan marked 2 inline comments as done.
shenhan added inline comments.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:111
+  if (IsFSAFDOFlavor)
+    return Count.has_value() && Count.value() == 0;
+
----------------
wenlei wrote:
> xur wrote:
> > wenlei wrote:
> > > shenhan wrote:
> > > > davidxl wrote:
> > > > > wenlei wrote:
> > > > > > shenhan wrote:
> > > > > > > wenlei wrote:
> > > > > > > > shenhan wrote:
> > > > > > > > > shenhan wrote:
> > > > > > > > > > wenlei wrote:
> > > > > > > > > > > snehasish wrote:
> > > > > > > > > > > > Can we use `< ColdCountThreshold` instead of `== 0`?
> > > > > > > > > > > > 
> > > > > > > > > > > > `ColdCountThreshold` defaults to 1 so it should have the same effect.
> > > > > > > > > > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? 
> > > > > > > > > > > 
> > > > > > > > > > > IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
> > > > > > > > > > > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?
> > > > > > > > > > 
> > > > > > > > > > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO.  The latter could be a miss sampling or inference / propagation error.  Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.
> > > > > > > > > > 
> > > > > > > > > > > IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?
> > > > > > > > > > You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.
> > > > > > > > > Done.
> > > > > > > > > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error.  
> > > > > > > > 
> > > > > > > > But checking `has_value` isn't going to be able to differentiate inaccurate counts. 
> > > > > > > > 
> > > > > > > > > You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. 
> > > > > > > > 
> > > > > > > > In this case, we don't need to guard it under `HasAccurateProfile` as it applies to IRPGO as well.
> > > > > > > Thanks. I made the following change:
> > > > > > > 
> > > > > > > Firstly, per hoy's comment, I replaced "if (HasAccurateProfile) {" with "if (!HasAccurateProfile) {", noticing that HasAccurateProfile should be False when using FSAFDO or CSSPGO, and True when using IRPGO. Previously the value is negated.
> > > > > > > 
> > > > > > > Secondly, when !HasAccurateProfile && !Count.has_value()  (although the latter is unlikely), this needs to return false. Whereas when HasAccurateProfile  && !Count.has_value(), this returns true. So I have to guard it with HasAccureateProfile.
> > > > > > > 
> > > > > > > Thirdly, the next line "if (!Count)" is not testing "if (Count !=0)", it is testing "if(!Count.has_value())", it is little bit misleading, so I changed it to if(!Count.has_value()), so it is clear why this is guarded under HasAccurateProfile.
> > > > > > > 
> > > > > > > Hope this is clearer.
> > > > > > If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit.
> > > > > > 
> > > > > > Would it work if we just make a simple and explicit change:
> > > > > > 
> > > > > > ```
> > > > > > if (!Count)
> > > > > >   return true;
> > > > > > ```
> > > > > > -->
> > > > > > ```
> > > > > > // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
> > > > > > if (!Count.has_value()) {
> > > > > >   return HasAccurateProfile;
> > > > > > }
> > > > > > ```
> > > > > > If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit.
> > > > > > 
> > > > > > Would it work if we just make a simple and explicit change:
> > > > > > 
> > > > > > ```
> > > > > > if (!Count)
> > > > > >   return true;
> > > > > > ```
> > > > > > -->
> > > > > > ```
> > > > > > // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
> > > > > > if (!Count.has_value()) {
> > > > > >   return HasAccurateProfile;
> > > > > > }
> > > > > > ```
> > > > > 
> > > > > 
> > > > > Sounds reasonable.
> > > > > 
> > > > > Missing count in IRPGO does mean very cold (module not even linked into the binary), but for sample PGO, it means unknown -- e.g. new code added which is not sampled. In this sense, their handling should be different.
> > > > ```
> > > > // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
> > > > if (!Count.has_value()) {
> > > >   return HasAccurateProfile;
> > > > }
> > > > ```
> > > > I see the point here. However for NonAccurateProfile, we test that Count < ColdCountThreshold, whereas for AccurateProfile, we test that using "PSI->isColdCountNthPercentile(PercentileCutoff, *Count);",the former is more rigorous, resulting in less cold block being split than the HasAccurateProfile. With this difference, we still need to set differentiate these 2 cases by "if (HasAccureateProfile) { } else {}" like below:
> > > > 
> > > > ```
> > > > if (!Count.has_value()) {
> > > >   return HasAccurateProfile;
> > > > }
> > > > 
> > > > if (!HasAccureateProfile) {
> > > >   return *Count < ColdCountThreshold;
> > > > }
> > > > 
> > > > if (PercentileCutoff > 0) {
> > > >   return PSI->isColdCountNthPercentile(PercentileCutoff, *Count);
> > > > }
> > > > return (*Count < ColdCountThreshold);
> > > > // end of function
> > > > ```
> > > > 
> > > > Do you think we shall use the above version or keep the current one?
> > > > 
> > > > 
> > > Can we unify `*Count < ColdCountThreshold` and `PSI->isColdCountNthPercentile(PercentileCutoff, *Count)` between the two cases? Is the different handling really necessary - i.e. without the differentiation, sample PGO with MFS would not perform as good? I would keep it simple/unified unless the different handling is perf critical. 
> > They are just two different parameters used in performance tuning: one local to this file and one from PSI.
> > ColdCountThreshold is local to this file and will not affect other passes.
> > 
> > I think it's harmless to keep as it is just for perf turning point of view.
> For long term maintainability I still hope we can keep things as simple as possible, as unified as possible. 
> 
> It just seems arbitrary to use `*Count < ColdCountThreshold` for sample PGO, and `PSI->isColdCountNthPercentile(PercentileCutoff, *Count)` for IRPGO. 
> 
> If we really want to use different value, we could also use `PSI->iisColdCountNthPercentile` for both, but with different percentile cutoff. 
Revised the code so now it unifies both "HasAccurateProfile" and "no-HasAccurateProfile".

The usage of ColdCountThreshold and PSI under both cases are now combined (keep unchanged).

And now for MFS-on-FSAFDO, we need to pass -mllvm -mfs-percentile-cutoff=0 to keep the original logic unchanged.

================
Comment at: llvm/lib/CodeGen/MachineFunctionSplitter.cpp:116
+  if (!Count) {
+    // Early return when Count has no value.
     return true;
----------------
snehasish wrote:
> Maybe merge this comment with the one above and drop the braces?
Revised the code and now it is more compact, and both "HasAccurateProfile" and no-AccurateProfile are using the same logic.
Comment is also revised.

================
Comment at: llvm/test/CodeGen/X86/machine-function-splitter.ll:16
 ; MFS-DEFAULTS-NEXT:  callq   baz
+
   br i1 %0, label %2, label %4, !prof !17
----------------
snehasish wrote:
> Leftover empty line after moving the code?
Deleted line.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152399/new/

https://reviews.llvm.org/D152399