[PATCH] D107897: [FuncSpec] Don't specialize functions which are easy to inline
Chuanqi Xu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 12 23:29:33 PDT 2021
ChuanqiXu added a comment.
In D107897#2943003 <https://reviews.llvm.org/D107897#2943003>, @snehasish wrote:
>> So here is the decision point:
>>
>> - Put function specialization before the inliner.
>>   - We get the chance to convert indirect calls into direct calls.
>>   - There may be useless specializations that the inliner would have covered anyway; this is what this patch tries to mitigate.
>> - Put function specialization after the inliner.
>>   - We would miss the opportunity to convert indirect calls into direct calls.
>>   - We could avoid useless specializations entirely.
>
> Thanks for elaborating on the motivation. At a high level it makes sense to me for plain builds which do not use PGO. With PGO, the situation is a little different: there is no longer much need to convert indirect calls to direct calls for subsequent inlining in this pass, since indirect call promotion is very effective with profile information. However, even for plain builds we should not underestimate the negative impact of useless specializations, particularly for large binaries. I think we may move towards different pass timing for plain vs. PGO optimization pipelines.
Yes, that's a good and key question: with PGO, how much benefit can function specialization still provide? According to the discussion in D93838 <https://reviews.llvm.org/D93838>, most of the profit from function specialization may already be covered by PGO. Although it is easy to imagine cases where function specialization would still help after applying PGO, it is hard for me to give a realistic example and numbers right now. But I think we can at least agree on the following:
- Function specialization can be valuable without PGO.
- With PGO, the effect of function specialization is questionable; we need to experiment more.
The interaction between PGO and function specialization goes well beyond the scope of this patch, though. It is a good question, and it may be better to discuss it on llvm-dev.
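To make the pass-ordering point above more concrete, here is a small, hypothetical C++ example (not taken from the patch or from SPEC) of the pattern where specializing on a function-pointer argument turns an indirect call into a direct one that a later inliner run can then handle:

```
#include <cstdio>

static int square(int x) { return x * x; }

// `compute` takes a function pointer, so the call through `op` is indirect
// and the callee cannot be inlined without knowing `op`.
static int compute(int (*op)(int), int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += op(i); // indirect call
  return sum;
}

int main() {
  // Specializing `compute` on `op = square` turns the indirect call into a
  // direct call to `square`, which the inliner can then consider as usual.
  printf("%d\n", compute(square, 100));
  return 0;
}
```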
>> From my perspective, it looks **not bad** after we add heuristics (no matter whether they use simple constants or the inline cost) to avoid specializing functions that would end up being inlined anyway, since we keep the optimization opportunity and avoid many useless specializations.
>>
>> On the topic of pass order, I have a crazier idea:
>> inliner --> function specializer --> inliner (maybe a new inliner pass)
>> To avoid wasting time, we could add a special attribute in the function specializer and let the second inliner decide whether or not to inline these functions. The second inliner would also be responsible for removing the attribute.
>
> This seems related to the approach you proposed in https://reviews.llvm.org/D105524 (incl. generalizing across to ThinLTO)?
Not exactly. What I mean is to run function specialization after the inliner and write another inline pass that considers inlining the specialized functions. I think that could solve the useless-specialization problem completely. The reason I didn't implement it is that it would be much more complex than this patch.
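As a rough sketch of that idea (assuming LLVM's C++ API; the attribute name "func-spec-created" is made up here purely for illustration), the specializer could tag its clones and the second inline pass could look for and strip the tag:

```
#include "llvm/IR/Function.h"

using namespace llvm;

// Called by the specializer when it creates a clone.
static void markSpecializedClone(Function &Clone) {
  Clone.addFnAttr("func-spec-created");
}

// Queried by the hypothetical second inline pass to find specialized clones.
static bool isSpecializedClone(const Function &F) {
  return F.hasFnAttribute("func-spec-created");
}

// The second inline pass removes the marker once it has made its decision.
static void clearSpecializedMarker(Function &F) {
  F.removeFnAttr("func-spec-created");
}
```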
>> But the question that kept me from implementing it is: would it really be much better than this simple patch? On the one hand, we would need to write much more code, which generally also means more compile time. On the other hand, would the generated code really be that different?
>> The reason I prefer this patch is its simplicity and, to a degree, its effectiveness.
>
> Can you share what the compile-time degradation on SPEC was without this patch?
> Also, we would be interested in any data you may have on code-size increases and performance improvements on SPEC.
Let me make sure I understand you correctly. You are asking: with function specialization, how do compile time, code size, and performance change?
There are results in `D93838`. In short, the compile-time and code-size changes are modest, and the main performance change is an improvement of about 10% on 500.mcf_r.
>>> To me it seems that either the cost modeling should leverage the InlineCost heuristics (rather than arbitrary constants tuned on SPEC)
>>
>> The InlineCost heuristics work on call sites rather than functions. If we decided to use InlineCost as a filter, we would need to run it for every call site with a constant parameter. That looks expensive to me, since computing InlineCost is not actually cheap, while a per-function estimate is much cheaper.
>
> That's true. However, again, with PGO information not all functions need to be considered. For large binaries with a highly skewed ratio of hot to cold functions, it's possible a callsite-oriented approach is less expensive while generalizing better? I am not fundamentally opposed to this patch for now, as it benefits plain builds. However, in the longer run, with a goal to enable this by default, we are interested in how this will generalize to PGO builds as well as leveraging ThinLTO.
Yeah, the decision in this patch should eventually use profile information; I think that is the future direction. For example, we could give up specializing a function that is too hot, as in the sketch below.
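A hedged sketch of what that could look like (this is not part of this patch; it simply assumes the specializer has access to ProfileSummaryInfo and BlockFrequencyInfo):

```
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Skip functions that the profile marks as hot: their call sites are likely
// inlining candidates already, so specializing them would mostly duplicate
// work the PGO-driven inliner will do anyway.
static bool worthSpecializing(Function &F, ProfileSummaryInfo &PSI,
                              BlockFrequencyInfo &BFI) {
  if (PSI.hasProfileSummary() && PSI.isFunctionHotInCallGraph(&F, BFI))
    return false;
  return true;
}
```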
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107897/new/
https://reviews.llvm.org/D107897