[PATCH] D93838: [SCCP] Add Function Specialization pass

Sun Jun 6 14:37:17 PDT 2021

davidxl added a comment.

In D93838#2801375 <https://reviews.llvm.org/D93838#2801375>, @wenlei wrote:

> We also saw ipa-cp-clone being a very noticeable difference between gcc and llvm when we tried to move workloads from gcc to llvm. Thanks for working on this for llvm optimization parity with gcc.
>
> I think specialization can be beneficial when we can't do very selective inlining - it can realize part of the benefit of inlining with a more moderate size increase. I suspect the benefit of ICP from specialization will be much lower when PGO is used though, because PGO's speculative ICP is quite effective. For our use cases, it's more interesting to see how much benefit we could get from non-function-pointer constants, on top of PGO.

Agree.

> When this is ready, I'd be happy to give it a try on our larger workload to see if there's any interesting hit and how it compares against gcc's ipa-cp-clone for the same workload, all with PGO.
>
>>> Also with PGO, do we see similar improvement?
>>
>> Good question, we need to experiment to answer it.
>
> @ChuanqiXu FWIW, PGO always does speculative ICP from spec_qsort to cost_compare and arc_compare in spec2017/mcf.
>
>> I agree function specialization has its place when size is the concern (with -Os), or instruction working set is too large (to hurt performance). We don't have a mechanism to analyze the latter yet.
>
> @davidxl you're right that we can't model the latter, but the same working set concern is true for inlining too, and yet we still need to control size bloat from inlining (through inline cost etc) even if we can't model it accurately. I think specialization can be another layer that allows compiler to fine tune that balance when needed. I'm curious have you tried to turn off ipa-cp-clone for gcc for your workloads and whether that leads to noticeable perf difference?

My recollection is that it did not result in much perf difference with FDO with our internal workload -- otherwise it would have been further tuned.

>> In theory, function specialization based on interprocedural const/range/assertion propagation does not depend on inlining, so it should/can be done before the inliner.
>
> Agreed. Though from compile time perspective, I'm wondering if we have specialization before inlining, would we be paying the cost for specialization for functions that will eventually be inlined in which case the extra cost may yield nothing in perf. But if we do specialization after inlining, we'll be targeting those that deemed not worth inlining only.

Makes sense -- this is a the reason for iterative optimizations. Partial inlining is in the same situation too.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93838/new/

https://reviews.llvm.org/D93838