[PATCH] D96197: [CSSPGO] Add switches to control prelink/postlink inline separately

Wed Feb 10 16:33:00 PST 2021

wenlei added a comment.

In D96197#2554808 <https://reviews.llvm.org/D96197#2554808>, @wmi wrote:

>> Yeah, the amount of inlining affects the importing due to call graph depth difference. The part you mentioned about turning off hot call site CGSCC inlining in prelto is the place where HotCallSiteThreshold is set to zero in buildInlinerPipeline, right?
>
> Right.
>
>> We would still have some inlining for small functions as they may end up with negative cost.
>
> If EnableRegularCGSCCInline and EnableSampleProfileInline are false in prelto, how would you have inlining for small functions?

In that case, small function inlining would also move to LTO, though that would require tweaking the importing instruction limit/threshold.

>> Using threshold is more flexible and can achieve the same thing, though we'd need to pass four zero or negative thresholds (hot|regular x fdo|cgscc), so I thought switch would be a bit easier. It's somewhat similar to -fno-inline - we could theoretically achieve the same thing by tweaking thresholds too.
>
> Passing multiple flags to set params may be ok for tuning but not for the default usage. I think we can hardcode the threshold value for CSSPGO after tuning is done.

Makes sense. We can use thresholds for now and change the defaults for CSSPGO once the tuning settles. I will skip this patch for now then.

>> we are experimenting with shifting more inlining from LTO prelink to postlink, from cgscc inlining to sample loader inlining.
>
> Talking about the shifting from cgscc inlining to sample loader inlining. One thing missing in sample loader inlining is it will be lack of iterative cleaning during inlining which cgscc inlining provides. Do you think whether it matters?

Yeah, this could potentially be a challenge. Without the iterative cleanup, the cost the inliner sees may not be accurate. We hope this can be mitigated by 1) tweaking the threshold for the sample loader inliner, and 2) potentially using the post-codegen size from a previous build to help estimate the cost, e.g. we could put the function size alongside the CFG checksum in the profile metadata, and reference that size to see through potential cleanup during sample loading. We always have cgscc passes in LTO, so the actual cleanup should still do a good job there.
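
To make idea 2) a bit more concrete, here is a rough sketch of how a recorded size could be consulted. Everything in it is hypothetical: `ProfileFuncMeta`, its size fields, and `estimateCalleeSize` are names made up for illustration and are not existing LLVM APIs, and it assumes the recorded size and the IR-based estimate are expressed in comparable units.

```cpp
#include <cstdint>

// Hypothetical per-function profile metadata. The size fields are the idea
// under discussion, not something the profile format carries today.
struct ProfileFuncMeta {
  uint64_t CFGChecksum;     // CFG checksum recorded when the profile was taken
  bool HasPostCodegenSize;  // whether a size from the previous build exists
  uint64_t PostCodegenSize; // post-codegen size from the previous build
};

// Pick the size the sample loader inliner would use for cost estimation.
// The recorded size is trusted only when the checksum still matches the
// current function, i.e. the profile is not stale. Otherwise fall back to
// the estimate computed on the not-yet-cleaned-up IR.
uint64_t estimateCalleeSize(const ProfileFuncMeta &Meta,
                            uint64_t CurrentChecksum,
                            uint64_t PreCleanupIRSize) {
  if (Meta.HasPostCodegenSize && Meta.CFGChecksum == CurrentChecksum)
    return Meta.PostCodegenSize;
  return PreCleanupIRSize;
}
```

Gating on the checksum is a design choice in this sketch: a size taken from a build of different code would be as misleading as the pre-cleanup estimate, so the fallback keeps current behavior in that case.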

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96197/new/

https://reviews.llvm.org/D96197