[PATCH] D30062: Estimate speedup due to inlining and use that to adjust threshold.

Thu Apr 20 17:10:35 PDT 2017

chandlerc added a comment.

In https://reviews.llvm.org/D30062#728588, @eraman wrote:

> Sorry for the delay in collecting performance numbers. I now have some data to share. First, some details on the methodology. I used ~400 microbenchmarks that are used internally at Google. I built them with the following percentage values of min-speedup-for-bonus: 0%, 5%, 10%, and 15%. I ran each benchmark 10 times in each configuration. Speedup/slowdown for a benchmark is calculated only when the p-value is <= 0.05 (and thus the results might include a different subset of benchmarks for different configs). The numbers presented below are the geomean across all benchmarks.
>
> Config | #Benchmarks | Geomean | #Slowdowns | #Speedups | Size increase percentage
>   0%   |     134     |  2.92%  |     51     |     83    |  2.45%
>   5%   |     121     |  1.05%  |     41     |     80    |  1.58%
>  10%   |     115     |  0.8%   |     51     |     64    |  1.32%
>  15%   |     160     |  1.03%  |     44     |    116    |  1.02%
>
> Some observations:
>
> - The best geomean performance comes when the min-speedup-for-bonus is set to 0%. I interpret this to mean that it is generally a performance win to increase the threshold for hot callsites, and the speedup estimation is a way to control the size growth.
> - The performance when the min-speedup-for-bonus is set to 10% sits in between that of 5% and 15%. As I mentioned above, these are not apples-to-apples comparisons because we compute the geomean on a different set of benchmarks. Even for the same benchmark, it is possible (and it does happen) that the performance numbers are not monotonically decreasing as the min-speedup-for-bonus is increased.
> - For comparison, I calculated the size growth if we simply apply a 3X multiplier to the threshold irrespective of the callsite frequency. The size increase is 9.7%.
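
For readers following along, the aggregation described in the methodology above can be sketched roughly as follows. This is only an illustration, not the script that produced the table: the function name, the input shape (a per-benchmark speedup ratio paired with an already-computed p-value), and the choice to express speedup as a ratio are all assumptions. The source does not say which statistical test was used.

```python
import math

def geomean_speedup(results, alpha=0.05):
    """Geomean of speedup ratios over statistically significant benchmarks.

    `results` is a hypothetical list of (speedup_ratio, p_value) pairs,
    where a ratio above 1.0 means the benchmark got faster. Benchmarks
    whose p-value exceeds `alpha` are dropped, so different configs may
    end up averaging over different subsets.
    """
    significant = [ratio for ratio, p in results if p <= alpha]
    if not significant:
        return None, 0
    # Average in log space, then exponentiate: the geometric mean.
    log_sum = sum(math.log(ratio) for ratio in significant)
    return math.exp(log_sum / len(significant)), len(significant)

# Made-up numbers: the second benchmark is filtered out (p > 0.05).
geomean, count = geomean_speedup([(1.10, 0.01), (0.95, 0.20), (1.02, 0.04)])
```

Because the filter runs per configuration, the #Benchmarks column differs between rows, which is why the geomeans are not directly comparable.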

The data here is really interesting, but I'm not sure about using the 0% threshold...

What I mean is that if we use a 0% min-speedup-for-bonus, then we essentially aren't using the speedup computation at all, are we? It seems like this would be roughly the same as just applying a similar multiplier to the threshold based on call site hotness. Maybe I just don't understand what the result of this is (sorry if I'm just failing to page back in all of the details)? If my understanding is correct, though, then I would focus on that first and get it in, and then return to the speedup heuristic to see if there are wins to be found by doing a speedup analysis to bonus less hot call sites, or by using it to give an *even higher* threshold when a call site is both hot *and* gives a speedup on inlining.
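
To make the concern concrete, here is a minimal sketch of how I understand the gating, under stated assumptions: the function and parameter names are hypothetical, the base threshold of 100 is arbitrary, the 3x multiplier is borrowed from the comparison in the quoted message, and the estimated speedup is assumed to be non-negative. It is not the patch's actual code.

```python
def adjusted_threshold(base_threshold, is_hot, estimated_speedup_pct,
                       min_speedup_for_bonus_pct, hot_multiplier=3):
    """Hypothetical model of the bonus: hot call sites get a larger
    threshold only if the estimated speedup clears the minimum."""
    if is_hot and estimated_speedup_pct >= min_speedup_for_bonus_pct:
        return base_threshold * hot_multiplier
    return base_threshold

# With a 0% minimum, every hot call site gets the bonus regardless of
# its estimate, so the speedup analysis no longer influences the result.
no_gain_hot = adjusted_threshold(100, True, 0.0, 0.0)
big_gain_hot = adjusted_threshold(100, True, 25.0, 0.0)
```

Under this model, a 0% setting collapses to a pure hotness-based multiplier, which is the simpler change I'd suggest landing first.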

> I'm collecting SPEC numbers now. I've also fixed a bug in the code and will update the patch shortly.

Please also collect LLVM test suite numbers with the SPEC numbers.

One thing that would be particularly important, though, is to collect *size* numbers for larger applications. I don't think the size growth numbers from microbenchmarks are really going to tell us what we need to know to make good threshold decisions where size is a factor (especially O2 vs. O3).

https://reviews.llvm.org/D30062