[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info

Wed Sep 9 21:50:13 PDT 2020

1%+ overhead is indeed interesting. If you use lld as the linker (together with the new pass manager), you should be able to get a good profile-guided function-level layout, so that dead (zero-count) functions are moved out of the hot pages.
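For comparison, a hot-first layout can also be requested by hand. lld accepts `--symbol-ordering-file`, and compiling with `-ffunction-sections` gives each function its own section that the linker can reorder. The sketch below is a simplified hot/cold split, not the call-graph-profile-based ordering lld derives from the compiler's profile data. The profile counts and function names are hypothetical, and for C++ the file would need mangled symbol names.

```python
from typing import Dict, List


def hot_first_order(entry_counts: Dict[str, int]) -> List[str]:
    """Return function symbols with nonzero profile counts, hottest first.

    Zero-count (cold) functions are deliberately left out, so a linker that
    places listed symbols first will put them after every hot function.
    """
    hot = [name for name, count in entry_counts.items() if count > 0]
    # Descending count; ties broken by name so the file is deterministic.
    return sorted(hot, key=lambda name: (-entry_counts[name], name))


def write_ordering_file(path: str, entry_counts: Dict[str, int]) -> None:
    """Write one symbol per line, the format an ordering file expects."""
    with open(path, "w") as f:
        for name in hot_first_order(entry_counts):
            f.write(name + "\n")


if __name__ == "__main__":
    # Hypothetical profile entry counts.
    counts = {"main": 1, "parse": 5000, "error_path": 0, "eval": 12000}
    print(hot_first_order(counts))  # ['eval', 'parse', 'main']
```

The resulting file would then be passed at link time, e.g. `-Wl,--symbol-ordering-file=order.txt` (the file name is only an example).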

This may also be related to a subtle pass-ordering issue. Pre-inline counts may not be super accurate, but we can’t use post-inline counts either, given that the CGSCC inliner runs halfway through the opt pipeline. Looking at the patch, it seems the decision is made at PGO annotation time, which falls between the pre-instrumentation inliner and the CGSCC inliner.
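To make the pass-ordering concern concrete: an entry count gathered before CGSCC inlining can overstate how often a function's out-of-line body actually runs, because calls from callers that later inline it stop reaching that body. A minimal model, assuming only direct calls and ignoring everything the inliner does besides removing the call; the functions `f`, `g` and `h` and their counts are hypothetical.

```python
from typing import Dict, Set


def residual_entry_count(callsite_counts: Dict[str, int], inlined_into: Set[str]) -> int:
    """Entry count left on a callee's out-of-line body after inlining.

    `callsite_counts` maps each caller to how many times it called the
    callee in the profile. Calls from callers listed in `inlined_into`
    now execute an inlined copy, so they no longer enter the standalone
    function.
    """
    return sum(count for caller, count in callsite_counts.items()
               if caller not in inlined_into)


if __name__ == "__main__":
    calls_to_f = {"g": 990, "h": 10}  # hypothetical profile for callee f
    print(residual_entry_count(calls_to_f, set()))  # 1000: the pre-inline view
    print(residual_entry_count(calls_to_f, {"g"}))  # 10: after inlining f into g
```

Under this model `f` looks hot at annotation time, yet its out-of-line body ends up nearly cold once the hot call site is inlined, which is the kind of mismatch a pre-inline coldness decision cannot see.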

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Modi Mo via llvm-dev <llvm-dev at lists.llvm.org>
Reply-To: Modi Mo <modimo at fb.com>
Date: Wednesday, September 9, 2020 at 6:18 PM
To: Min-Yih Hsu <minyihh at uci.edu>, llvm-dev <llvm-dev at lists.llvm.org>, "cfe-dev (cfe-dev at lists.llvm.org)" <cfe-dev at lists.llvm.org>, Hongtao Yu <hoy at fb.com>
Subject: Re: [llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info

The 1.29% overhead is pretty considerable for functions that should never be hit according to the profile information. This could indicate that something is amiss with the profile quality and that certain hot functions are not getting caught. Alternatively, given the ~5% code size increase you mention in the other thread, the cold code may not be getting moved out to a cold page, so i-cache pollution ends up being a factor. I think it would be worthwhile to dig deeper into why there’s any performance degradation at all from de-optimizing functions that should never be called.
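The "moved out to a cold page" point can be made concrete by counting how many distinct pages the hot functions span with and without cold code interleaved between them. A rough sketch, assuming 4 KiB pages and functions packed back to back from address 0 with no alignment padding; the function sizes are hypothetical.

```python
from typing import List, Tuple


def touched_pages(layout: List[Tuple[str, int, bool]], page_size: int = 4096) -> int:
    """Count distinct pages touched by hot functions.

    `layout` lists (name, size_in_bytes, is_hot) in address order; functions
    are assumed to be packed back to back starting at address 0.
    """
    pages = set()
    addr = 0
    for _name, size, is_hot in layout:
        if is_hot and size > 0:
            first = addr // page_size
            last = (addr + size - 1) // page_size
            pages.update(range(first, last + 1))
        addr += size
    return len(pages)


if __name__ == "__main__":
    hot = [(f"hot{i}", 1024, True) for i in range(4)]
    cold = [(f"cold{i}", 3072, False) for i in range(4)]
    interleaved = [fn for pair in zip(hot, cold) for fn in pair]
    grouped = hot + cold
    print(touched_pages(interleaved), touched_pages(grouped))  # 4 1
```

With the cold code interleaved, the four hot functions are spread across four pages; grouped together they fit in one.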

Also, if you’re curious about how to build clang itself with PGO, the documentation is here: https://llvm.org/docs/HowToBuildWithPGO.html

On 9/8/20, 5:21 PM, "llvm-dev on behalf of Min-Yih Hsu via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:

We also ran evaluations with FullLTO; here are the numbers:

Experiment Name          Link Time Speedup   Target Overhead
DeOpt Cold Zero Count    10.87%              1.29%
DeOpt Cold 25%           18.76%              1.50%
DeOpt Cold 50%           30.16%              3.94%
DeOpt Cold 75%           38.71%              8.97%
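The experiment names suggest two kinds of selection policy: de-optimizing only functions with a zero entry count, and de-optimizing some percentage of the coldest functions. The thread does not spell out how the percentage is defined, so the sketch below assumes "DeOpt Cold N%" means the N% of profiled functions with the lowest entry counts. Both helper functions and the profile data are hypothetical.

```python
from typing import Dict, List


def zero_count_functions(entry_counts: Dict[str, int]) -> List[str]:
    """Functions the profile never saw entered ("DeOpt Cold Zero Count")."""
    return sorted(name for name, count in entry_counts.items() if count == 0)


def coldest_fraction(entry_counts: Dict[str, int], fraction: float) -> List[str]:
    """The `fraction` of profiled functions with the lowest entry counts.

    ASSUMPTION: this is only one possible reading of "DeOpt Cold N%".
    Ties are broken by name so the selection is deterministic.
    """
    ranked = sorted(entry_counts, key=lambda name: (entry_counts[name], name))
    k = int(len(ranked) * fraction)
    return sorted(ranked[:k])


if __name__ == "__main__":
    counts = {"a": 0, "b": 0, "c": 5, "d": 40, "e": 900, "f": 10, "g": 3, "h": 7000}
    print(zero_count_functions(counts))     # ['a', 'b']
    print(coldest_fraction(counts, 0.50))   # ['a', 'b', 'c', 'g']
    print(coldest_fraction(counts, 0.75))   # ['a', 'b', 'c', 'd', 'f', 'g']
```

Read this way, each larger percentage selects a superset of the smaller ones, which would be consistent with both the link-time speedup and the target overhead growing down the table.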
