[PATCH] D125023: [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

Thu May 5 23:31:42 PDT 2022

hoy added inline comments.

================
Comment at: llvm/tools/llvm-profgen/CSPreInliner.cpp:173
+    // Linearly adjust threshold based on call site hotness
+    auto Range = Summary->getMaxCount() - ColdCountThreshold;
+    double Position =
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > One concern about using `getMaxCount` -- that will have more instability and variation run to run. How about using HotCountThreshold (or N * HotCountThresold, or a set percentile) to stabilize?
> > > 
> > > The `NormalizedHotness` doesn't have to be between 0 and 1. Or if we want it to be between 0 and 1, we can also do `min(NormalizedHotness, 1)`.
> > > (or N * HotCountThresold, or a set percentile) to stabilize?
> > 
> > You mean something like Position = N* HotCountThresold ? BTW, what is N? 
> > 
> > Using a percentile should have a better stability. Position is similar to (1-percentile).  
> > 
> > 
> > BTW, what is N?
> 
> N can be arbitrary constant, depending on the range we want to normalize. Some random example: (3 * 99% count), or (2 * 95% count)
> 
> > You mean something like Position = N* HotCountThresold ? 
> 
> No, just replace getMaxCount with HotCountThresold or a precentile count.
> 
> > Using a percentile should have a better stability. Position is similar to (1-percentile).
> 
> If you use (3 * 99% count) as the upper bound of range, then for count = (5 * 99% count), normalized hotness will be larger than 1, unless we manually cap it. 
I see. Yeah, choosing a value smaller than getMaxCount may need caping to avoid threshold going ridiculous. But capping means values above the cap won't be treated linearly.

Also positioning based on HotCountThresold makes it service-dependent again. The value at a given cutoff could be very different.  

================
Comment at: llvm/tools/llvm-profgen/CSPreInliner.cpp:176-179
+    double Multiplier = PreInlineHotCallsiteThresholdMultiplier * Position;
+    SampleThreshold = SampleHotCallSiteThreshold * Multiplier +
+                      SampleColdCallSiteThreshold +
+                      PreInlineHotCallsiteThresholdConstant;
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > How about we simplify the knobs, i.e. remove flag `preinline-hot-callsite-threshold-multiplier` and `preinline-hot-callsite-threshold-constant`. 
> > > 
> > > Make `Multiplier` a constant, and we can still effectively tune the multiplier via setting `sample-profile-hot-inline-threshold`.
> > > Make `PreInlineHotCallsiteThresholdConstant` a constant and we can still effectively tune it via `sample-profile-cold-inline-threshold`. We could also consider setting `SampleColdCallSiteThreshold = 1` in PreInliner (current we set it to 0).
> > > 
> > > With the above, I think if we can simply switches (avoid adding new ones) without scarifying flexibility for tuning. 
> > > Make Multiplier a constant, and we can still effectively tune the multiplier via setting sample-profile-hot-inline-threshold.
> > 
> > This should work since SampleHotCallSiteThreshold is only used here. One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
> > 
> > > Make PreInlineHotCallsiteThresholdConstant a constant and we can still effectively tune it via sample-profile-cold-inline-threshold
> > 
> > This may also affect cold contexts. Enlarging `sample-profile-cold-inline-threshold` , say from 0 to 5, may result in a lot more cold contexts in the profile.  This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold. 
> > One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
> 
> I think it's ok for compiler and preinliner to have different thresholds, because one is based on IR, and the other is based on different proxy. 
> 
> > This may also affect cold contexts. Enlarging sample-profile-cold-inline-threshold , say from 0 to 5, may result in a lot more cold contexts in the profile. This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold.
> 
> Maybe you didn't understand what I said. What I was suggesting is basically this `+ SampleColdCallSiteThreshold + PreInlineHotCallsiteThresholdConstant` -> `+ SampleColdCallSiteThreshold`. Then we tune `SampleColdCallSiteThreshold` alone, which is equivalent. I didn't find other use of `SampleColdCallSiteThreshold` in llvm-profgen. 
`SampleColdCallSiteThreshold` is also used around line 170:

    if (Candidate.CallsiteCount <= ColdCountThreshold)
       SampleThreshold = SampleColdCallSiteThreshold;

I didn't choose to go linearly for every sample. Cold samples share the same threshold for now. I did this because cold samples are many and size-sensitive, changing `SampleColdCallSiteThreshold` from 0 to 1 can cause profile grow a lot. Linear threshold for cold samples probably doesn't make a lot sense.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125023/new/

https://reviews.llvm.org/D125023