[PATCH] Indirect call target profiling related profile reader/writer changes

Wed Apr 15 14:46:54 PDT 2015

On Wed, Apr 15, 2015 at 10:13 AM, Bob Wilson <bob.wilson at apple.com> wrote:

>
> > On Apr 14, 2015, at 11:56 AM, betulb at codeaurora.org wrote:
> >
> >>
> >> On 04/10/2015 09:25 AM, betulb at codeaurora.org wrote:
> >>>> On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
> >>>>> In http://reviews.llvm.org/D8908#153838, @reames wrote:
> >>>>>
> >>>>>> Have the IR level construct patches made it up for review?  If so,
> >>>>>> can
> >>>>> So far I've posted two patches. These two patches should apply
> cleanly
> >>>>> to the tip, working with the present profile infrastructure. The next
> >>>>> set of patches will be the enabler ones: i.e. three more patches one
> >>>>> for
> >>>>> each of clang, llvm and compiler-rt. Clang patch will be up for
> review
> >>>>> later today.
> >>>>>
> >>>>>> you send me a link?  I managed to miss them.
> >>>>> So far there is this patch and the instrinsic instruction
> definitions:
> >>>>> http://reviews.llvm.org/D8877. All patches are necessary for getting
> >>>>> the
> >>>>> IC targets and having them displayed by the llvm-profdata.
> >>>> Ok, I'm really not convinced that the instrumentation code needs to be
> >>>> or should be an intrinsic.  This seems like something which should be
> >>>> emitted by the frontend and optimized like any other code.  To say
> this
> >>>> a different way, my instrumentation is going to be entirely different
> >>>> than your instrumentation.
> >>>>
> >>>> Having said that, I really don't care about this part of the proposed
> >>>> changes since they aren't going to impact me at all.  I'm am
> >>>> specifically not objecting to the changes, just commenting.  :)
> >>>>>> I'm assuming this will be some type of per call site metadata?
> >>>>> We do assign metadata at the indirect call sites. Format looks like
> as
> >>>>> follows:
> >>>>>
> >>>>> !33 = metadata !{metadata !"indirect_call_targets", i64
> >>>>> <total_exec_count>, metadata !"target_fn1â€ , i64 <target_fn1_count>,
> >>>>> metadata !"target_fn2â€ , i64 <target_fn2_count>, â€¦.}
> >>>>>
> >>>>> Currently, we're recording only the top most called five function
> >>>>> names
> >>>>> at each indirect call site. Following the string literal
> >>>>> â€œindirect_call_targetsâ€  are the fields  <total_exec_count> i.e. a
> >>>>> 64
> >>>>> bit value for the total number of times the indirect call is executed
> >>>>> followed by the function names and execution counts of each target.
> >>>> This was the part I was trying to ask about.  I really want to see
> >>>> where
> >>>> you're going with this optimization wise.  My naive guess is that this
> >>>> is going to be slightly off for what you actually want.
> >>>>
> >>>> Assuming you're going for profile guided devirtualization (and thus
> >>>> inlining), being able to check the type of the receiver (as opposed to
> >>>> the result of the virtual lookup) might be advantageous.  (Or, to say
> >>>> it
> >>>> differently, that's what I'm used to seeing.  Your approach might be
> >>>> completely reasonable, it's just not what I'm used to seeing.)  Have
> >>>> you
> >>>> thought about the tradeoffs here?
> >>> Not sure if I understood the problem here,
> >> First, I am not trying to say there is a problem with your approach; I
> >> am only saying that it's not what I would have expected based on past
> >> experience.  You may be entirely correct in your approach, you just need
> >> to convince me of that.  :)
> >>> however, we're recording both
> >>> the target address and the addresses/names of the instrumented
> functions
> >>> during the execution of the instrumented binary. During profile reading
> >>> these addresses are used to match the target addresses to corresponding
> >>> functions.
> >> Ok, let's start from the basics.  For profile guided devirtualization,
> >> you're constructing a cache from (something) to function pointer and
> >> using that cache lookup to enable inlining of the hot target.  You have
> >> two standard choices on what to use as your cache key: the result of the
> >> virtual lookup and the inputs to the virtual lookup.
> >>
> >> Option 1 - Inputs to virtual lookup
> >> if ((receiver, vtable index) == what I predicted)
> >>   tartget_I_predicted(); // inline me!!
> >> else {
> >>   target = full virtual dispatch();
> >>   target();
> >> }
> >>
> >> Option 2 - result of virtual lookup
> >> target = full virtual dispatch();
> >> if ('target' == what I predicted)
> >>   tartget_I_predicted(); // inline me!!
> >> else {
> >>   target();
> >> }
> >>
> >> You seem to be proposing option 2.  I'm saying that I'm used to seeing
> >> option 1 used.  Both approaches have their appeal, I'm just asking you
> >> to explain *why* you've chosen the one you apparently have.
> >
> > Not all indirect calls occur from C++ like codes. We're profiling and
> > optimizing out indirect calls from C codes as well. We're seeing up to 8%
> > gains on individual benchmarks in spec. This was measured on our
> platform.
>
> We could also consider a hybrid that uses option 1 for vtable calls and
> option 2 for general function pointer calls. For cases where the code calls
> several virtual functions on the same object, profiling for option 1 could
> be more efficient if we only record the type of the object once. I have no
> idea if that is worthwhile but it’s another possibility.
>

option 1 certainly has lower instrumentation overhead as well as more
efficient value profiling transformations but I consider it a separate
feature from function target profiling which can be more widely used.
Options can be introduced to pick the flavor for C++ programs.

David

>
> SPEC results are interesting, but I’d be much more interested to hear how
> it works for clang. If you build clang with this profiling enabled, what is
> the time and memory overhead? How much bigger are the profile data files?
> If you then rebuild clang with PGO, how much does it speed things up?
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150415/3682cc58/attachment.html>