[PATCH] D107897: [FuncSpec] Don't specialize function which are easy to inline

Thu Aug 12 21:25:28 PDT 2021

snehasish added a comment.

> So here is the decision point:
>
> - Put function specialization in the front of the inliner.
>   - We have the chance to convert indirect call to direct call.
>   - There may be unuseful specialization which would be covered by the inliner. And this is what this patch tries to mitigate.
> - Put function specialization in the back of the inliner.
>   - We would miss the opportunity to convert indirect call to direct call.
>   - We could avoid unuseful specialization totally.

Thanks for elaborating on the motivation. At a high level it makes sense to me for plain builds which do not use PGO. With PGO, the situation is a little different - there is no longer any need to convert indirect call to direct call for subsequent inlining using this pass since indirect call promotion is very effective with profile information. However, even for plain builds we should not underestimate the negative impact of useless specializations particularly for large binaries. I think we may go towards a different pass timing for plain vs pgo optimization pipelines.

> From my perspective, it looks **not bad** after we add heuristics (no-matter simple constant or inline cost) to avoid specializing functions which would be inlined finally. Since we get the optimization opportunity and avoid many unused specialization.
>
> For this topic about pass order, I have a more crazy idea:
> inliner --> function specializer --> inliner (may be a new inliner pass)
> To avoid wasting the time, we could add special attribute in function specializer. And let the second inliner to decide whether or not to inline these functions. The second inliner is also responsible to remove the attributes.

This seems related to the approach you proposed in https://reviews.llvm.org/D105524 (incl. generalizing across to ThinLTO)?

> But the question why I didn't implement this is : is it really much better than this simple patch? On the one hand, we need write much more codes and it means more compile time generally. On the other hand, would the generated code be so different?
> The reason I prefer this patch is the simplicity and effectiveness in a degree.

Can you share what was the compile time degradation without this patch on SPEC?
Also we would be interested in any data you may have on code size increases and performance improvements on SPEC.

>> To me it seems that either the cost modeling should leverage the InlineCost heuristics (rather than arbitrary constants tuned on SPEC)
>
> The InlineCost heuristics is for callsites instead of functions. If we decide to use InlineCost as a filter, we need run InlineCost for every callsite with constant parameter. It looks expensive to me since the calculation of InlineCost is not simple actually. And the estimation for functions is much cheaper.

Thats true, however again with PGO information not all functions need to be considered. For large binaries with a highly skewed ratio of hot to cold functions, it's possible a callsite oriented approach is less expensive while generalizing better? I am not fundamentally opposed to this patch for now as it benefits plain builds. However, in the longer run with a goal to enable this by default we are interested in how this will generalize to PGO builds as well as leveraging ThinLTO.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107897/new/

https://reviews.llvm.org/D107897