[PATCH] D60086: [SampleProfile] Check entry count instead of total count to decide if inlined callsite is hot.

Tue Apr 2 10:20:57 PDT 2019

wmi added a comment.

In D60086#1450766 <https://reviews.llvm.org/D60086#1450766>, @twoh wrote:

> @wmi Thanks for the reply! I can totally understand that entry count is not as precise as total count, but still don't think current implementation is the right way to address the issue. As I mentioned in the summary it compares two different things (instruction level counter vs function level counter), opens up a possibility for optimizing against wrong function (e.g. long and cold function), and makes it hard to find the root cause of the performance issue.
>
> If we can't have a precise entry count, the right way to address the issue would be not using PSI based heuristic but using a heuristic that actually considers a total count of the function.

Thanks Taewook. I understand your concern and I want to add some extra explanation to my last reply. In addition to the impreciseness of entry count of inline instance, another reason we choose total sample count instead of entry count is it is more conservative way to prevent missing inlining in sample profile load phase. Total sample count is always equal to or larger than the entry count, which means it is less likely we miss a function which better be inlined but not. Want to emphasize that the main purpose of inlining inside of AutoFDO is not to reduce call overhead or expose other optimization opportunities, but to enable context sensitive profile annotation. Suppose if there is an inline instance with cold entry but a hot spot inside of it, although we will not reduce call overhead by inlining it, inlining it is still important so the hot spot can get context sensitive profile during profile annotation.

As you said, one case we want to prevent is the inline instance is long and doesn't have any hot spot inside of it. The only reason the total sample count looks hot is because the body is large enough. For such case, we may be able to prevent it by adding some simple heuristic in callsiteIsHot. Like in addition to the total sample count being larger than hot threshold, the max sample count inside of the inline instance must also be larger than a certain threshold, i.e., there is at least one hot spot inside for the inline instance to be hot.

What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60086/new/

https://reviews.llvm.org/D60086