[PATCH] D134376: [ModuleInliner] Add a cost-benefit-based priority

Tue Sep 27 12:52:18 PDT 2022

kazu added a comment.

In D134376#3818545 <https://reviews.llvm.org/D134376#3818545>, @wenlei wrote:

>> a large internal benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes.
>
> This looks promising. The comparison is between `enable-module-inliner` on vs off, right?

Thanks.  Yes, `-mllvm -enable-module-inliner` is the only difference.  Both the baseline and the experiment use FDO, ThinLTO, and `-fsplit-machine-functions`.

One thing I might point out here is that I am not doing any cleanup beyond whatever basic cleanups `InlineFunction` performs.  With the bottom-up inliner, we diligently clean up after each SCC (see `PassBuilder::buildInlinerPipeline`), but that doesn't seem to matter with the module inliner.  My hypothesis is that once we inline those call sites that reduce the caller size, followed by ones with high benefit-to-cost ratios, we've captured the vast majority of the benefit from inlining.  That is, we don't really need the exact "tightest" instruction count after DCE, CSE, etc.  Even if the module inliner didn't give us additional performance, it could still simplify our lives -- no CGSCC maintenance or successive cleanups.
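To make the ordering in that hypothesis concrete, here is a minimal sketch of a priority queue that pops size-reducing call sites first and then the remaining ones by benefit-to-cost ratio.  The `Candidate` struct, its fields, and `inlineOrder` are hypothetical names made up for illustration; this is not the data structure or the priority used in `ModuleInliner.cpp`, and the numbers would come from whatever cost model is in use.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-call-site summary; all fields are assumptions for this sketch.
struct Candidate {
  std::string Name;
  int SizeDelta;   // estimated change in caller size if inlined (negative = shrinks)
  double Benefit;  // estimated cycles saved
  double Cost;     // estimated size cost, assumed > 0
};

// std::priority_queue pops the "largest" element, so this returns true
// when A should be inlined *after* B.
struct LowerPriority {
  bool operator()(const Candidate &A, const Candidate &B) const {
    bool AShrinks = A.SizeDelta < 0;
    bool BShrinks = B.SizeDelta < 0;
    if (AShrinks != BShrinks)
      return BShrinks;                   // size-reducing call sites go first
    if (AShrinks)
      return A.SizeDelta > B.SizeDelta;  // among those, the bigger reduction first
    // Otherwise compare Benefit/Cost ratios by cross-multiplying (Cost > 0).
    return A.Benefit * B.Cost < B.Benefit * A.Cost;
  }
};

// Returns the call-site names in the order they would be considered.
std::vector<std::string> inlineOrder(std::vector<Candidate> Cands) {
  std::priority_queue<Candidate, std::vector<Candidate>, LowerPriority> PQ(
      LowerPriority{}, std::move(Cands));
  std::vector<std::string> Order;
  while (!PQ.empty()) {
    Order.push_back(PQ.top().Name);
    PQ.pop();
  }
  return Order;
}
```

A real implementation would also have to re-evaluate candidates after each inlining step, since inlining changes the caller and exposes new call sites; the sketch ignores that.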

> Do you plan to tune and open up module inliner for sample PGO and non-PGO cases where cost-benefit analysis isn't available yet?

Yes, I'd like to do something for the sample PGO case.  I'd like to enable the cost-benefit analysis for sample PGO, but that's been my hardest project.  IIUC, the sample profile loader keeps inlining functions top down until the weight of the inlining subtree goes below the threshold.  As a result, by the time we get to the bottom-up inliner (`Inliner.cpp`), we don't have a lot of interesting decisions left to make.  For the sample PGO case, I think I have to depart from the top-down inlining and start inlining callees from where it matters according to some combination of context sensitivity, profile counts, and the usual metrics from trial inlining (`InlineCost.cpp`), etc.
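As a toy model of the top-down behavior described above (again, IIUC), the sketch below walks a tree of inlined-callee profiles and stops descending once a subtree's weight falls below a threshold.  `ProfileNode`, `SubtreeWeight`, and `inlineTopDown` are hypothetical names for this sketch only; the real sample profile loader is considerably more involved.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical node in a tree of inlined-callee profiles.
struct ProfileNode {
  std::string Callee;
  uint64_t SubtreeWeight;             // total sample weight of this subtree
  std::vector<ProfileNode> Children;  // callees that the profile saw inlined here
};

// Inline hot callees top down; a subtree colder than Threshold is skipped
// entirely, so nothing beneath it is considered either.
void inlineTopDown(const ProfileNode &Node, uint64_t Threshold,
                   std::vector<std::string> &Inlined) {
  for (const ProfileNode &Child : Node.Children) {
    if (Child.SubtreeWeight < Threshold)
      continue;
    Inlined.push_back(Child.Callee);
    inlineTopDown(Child, Threshold, Inlined);
  }
}
```

In this model every decision is made on the way down from the root, which is why a later inliner sees few hot call sites left to decide on.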

The non-PGO case isn't our primary interest.  That said, if I come up with reasonable heuristics, I might contribute that.  There may be a long-term benefit in steering the community toward the module inliner as the sole inliner as opposed to directing them to the two different inliners (`Inliner.cpp` and `ModuleInliner.cpp`) depending on whether they are using instrumentation FDO or not.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134376/new/

https://reviews.llvm.org/D134376