[PATCH] D60086: [SampleProfile] Check entry count instead of total count to decide if inlined callsite is hot.

Tue Apr 2 12:04:45 PDT 2019

twoh added a comment.

@wmi Thank you for the detailed explanation!

> Want to emphasize that the main purpose of inlining inside of AutoFDO is not to reduce call overhead or expose other optimization opportunities, but to enable context sensitive profile annotation. Suppose if there is an inline instance with cold entry but a hot spot inside of it, although we will not reduce call overhead by inlining it, inlining it is still important so the hot spot can get context sensitive profile during profile annotation.

I can understand the value of context-sensitive profile information, but wouldn't it be only valuable if the callsite is actually worth inlining? IIUC, LLVM doesn't propagate the context-sensitive profile information collected from the inlined callsite back to the original standalone function. For me it seems that the current implementation drives the wrong inlining decision to collect the information useful only under the wrong decision.

> As you said, one case we want to prevent is the inline instance is long and doesn't have any hot spot inside of it. The only reason the total sample count looks hot is because the body is large enough. For such case, we may be able to prevent it by adding some simple heuristic in callsiteIsHot. Like in addition to the total sample count being larger than hot threshold, the max sample count inside of the inline instance must also be larger than a certain threshold, i.e., there is at least one hot spot inside for the inline instance to be hot.

I think this will mitigate the problem, but still don't think it is the right way to solve the problem. My biggest concern with the current implementation is that we're comparing function total count against PSI threshold values. If we really cannot trust function entry count for sample profile data we may try function total count based heuristic, but still comparing total count against PSI threshold, which is computed from the instruction count histogram, seems wrong to me.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60086/new/

https://reviews.llvm.org/D60086