[PATCH] D125023: [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

Fri May 6 10:13:12 PDT 2022

hoy added inline comments.

================
Comment at: llvm/tools/llvm-profgen/CSPreInliner.cpp:173
+    // Linearly adjust threshold based on call site hotness
+    auto Range = Summary->getMaxCount() - ColdCountThreshold;
+    double Position =
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > hoy wrote:
> > > > wenlei wrote:
> > > > > One concern about using `getMaxCount` -- that will have more instability and variation run to run. How about using HotCountThreshold (or N * HotCountThresold, or a set percentile) to stabilize?
> > > > > 
> > > > > The `NormalizedHotness` doesn't have to be between 0 and 1. Or if we want it to be between 0 and 1, we can also do `min(NormalizedHotness, 1)`.
> > > > > (or N * HotCountThresold, or a set percentile) to stabilize?
> > > > 
> > > > You mean something like Position = N* HotCountThresold ? BTW, what is N? 
> > > > 
> > > > Using a percentile should have a better stability. Position is similar to (1-percentile).  
> > > > 
> > > > 
> > > > BTW, what is N?
> > > 
> > > N can be arbitrary constant, depending on the range we want to normalize. Some random example: (3 * 99% count), or (2 * 95% count)
> > > 
> > > > You mean something like Position = N* HotCountThresold ? 
> > > 
> > > No, just replace getMaxCount with HotCountThresold or a precentile count.
> > > 
> > > > Using a percentile should have a better stability. Position is similar to (1-percentile).
> > > 
> > > If you use (3 * 99% count) as the upper bound of range, then for count = (5 * 99% count), normalized hotness will be larger than 1, unless we manually cap it. 
> > I see. Yeah, choosing a value smaller than getMaxCount may need caping to avoid threshold going ridiculous. But capping means values above the cap won't be treated linearly.
> > 
> > Also positioning based on HotCountThresold makes it service-dependent again. The value at a given cutoff could be very different.  
> > But capping means values above the cap won't be treated linearly. 
> 
> That's why we can use a factor N, or simply use a different percentile, say 20%, which should be very large, and still much more stable than max. 
> 
> > Also positioning based on HotCountThresold makes it service-dependent again. The value at a given cutoff could be very different. 
> 
> How is getMaxCount not service dependent then? 
I see, making it very large makes sense. 20% or 10% or should be close to getMaxCount while be more stable.

As long as the value is big enough to cover most samples, it's not service-dependent. I was concerning about the samples that are not covered or not getting a linear threshold.

================
Comment at: llvm/tools/llvm-profgen/CSPreInliner.cpp:176-179
+    double Multiplier = PreInlineHotCallsiteThresholdMultiplier * Position;
+    SampleThreshold = SampleHotCallSiteThreshold * Multiplier +
+                      SampleColdCallSiteThreshold +
+                      PreInlineHotCallsiteThresholdConstant;
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > hoy wrote:
> > > > wenlei wrote:
> > > > > How about we simplify the knobs, i.e. remove flag `preinline-hot-callsite-threshold-multiplier` and `preinline-hot-callsite-threshold-constant`. 
> > > > > 
> > > > > Make `Multiplier` a constant, and we can still effectively tune the multiplier via setting `sample-profile-hot-inline-threshold`.
> > > > > Make `PreInlineHotCallsiteThresholdConstant` a constant and we can still effectively tune it via `sample-profile-cold-inline-threshold`. We could also consider setting `SampleColdCallSiteThreshold = 1` in PreInliner (current we set it to 0).
> > > > > 
> > > > > With the above, I think if we can simply switches (avoid adding new ones) without scarifying flexibility for tuning. 
> > > > > Make Multiplier a constant, and we can still effectively tune the multiplier via setting sample-profile-hot-inline-threshold.
> > > > 
> > > > This should work since SampleHotCallSiteThreshold is only used here. One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
> > > > 
> > > > > Make PreInlineHotCallsiteThresholdConstant a constant and we can still effectively tune it via sample-profile-cold-inline-threshold
> > > > 
> > > > This may also affect cold contexts. Enlarging `sample-profile-cold-inline-threshold` , say from 0 to 5, may result in a lot more cold contexts in the profile.  This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold. 
> > > > One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
> > > 
> > > I think it's ok for compiler and preinliner to have different thresholds, because one is based on IR, and the other is based on different proxy. 
> > > 
> > > > This may also affect cold contexts. Enlarging sample-profile-cold-inline-threshold , say from 0 to 5, may result in a lot more cold contexts in the profile. This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold.
> > > 
> > > Maybe you didn't understand what I said. What I was suggesting is basically this `+ SampleColdCallSiteThreshold + PreInlineHotCallsiteThresholdConstant` -> `+ SampleColdCallSiteThreshold`. Then we tune `SampleColdCallSiteThreshold` alone, which is equivalent. I didn't find other use of `SampleColdCallSiteThreshold` in llvm-profgen. 
> > `SampleColdCallSiteThreshold` is also used around line 170:
> > 
> >     if (Candidate.CallsiteCount <= ColdCountThreshold)
> >        SampleThreshold = SampleColdCallSiteThreshold;
> > 
> > I didn't choose to go linearly for every sample. Cold samples share the same threshold for now. I did this because cold samples are many and size-sensitive, changing `SampleColdCallSiteThreshold` from 0 to 1 can cause profile grow a lot. Linear threshold for cold samples probably doesn't make a lot sense.
> Ah, I see. I missed line 170. But the way you have it setup now creates a cliff for the heuristic:
> 
> - for Candidate.CallsiteCount == ColdCountThreshold, threshold is SampleColdCallSiteThreshold
> - for Candidate.CallsiteCount == ColdCountThreshold + 1, threshold is SampleColdCallSiteThreshold + PreInlineHotCallsiteThresholdConstant (can be large number from flag)
> 
> We don't need to have linear threshold for cold samples, but it makes more sense for the heuristic to be consecutive without cliffs. 
> 
> If all we need here is to avoid changing SampleColdCallSiteThreshold to1, perhaps making PreInlineHotCallsiteThresholdConstant a constant 1 is good enough -- it doesn't need to be tunable. Using 1 also makes the heuristic consecutive. 
> 
> I was hoping to avoid explosion of tuning knobs and only expose those that are really necessary. 
Making PreInlineHotCallsiteThresholdConstant sounds good. Actually I never tried other values during my tuning.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125023/new/

https://reviews.llvm.org/D125023