[PATCH] Indirect call target profiling related profile reader/writer changes

Xinliang David Li xinliangli at gmail.com
Wed Apr 15 14:28:21 PDT 2015


On Mon, Apr 13, 2015 at 9:38 AM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 04/10/2015 09:25 AM, betulb at codeaurora.org wrote:
>
>> On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
>>>
>>>> In http://reviews.llvm.org/D8908#153838, @reames wrote:
>>>>
>>>>  Have the IR level construct patches made it up for review?  If so, can
>>>>>
>>>> So far I've posted two patches. These two patches should apply cleanly
>>>> to the tip, working with the present profile infrastructure. The next
>>>> set of patches will be the enabler ones: i.e. three more patches one for
>>>> each of clang, llvm and compiler-rt. Clang patch will be up for review
>>>> later today.
>>>>
>>>>  you send me a link?  I managed to miss them.
>>>>>
>>>> So far there is this patch and the instrinsic instruction definitions:
>>>> http://reviews.llvm.org/D8877. All patches are necessary for getting
>>>> the
>>>> IC targets and having them displayed by the llvm-profdata.
>>>>
>>> Ok, I'm really not convinced that the instrumentation code needs to be
>>> or should be an intrinsic.  This seems like something which should be
>>> emitted by the frontend and optimized like any other code.  To say this
>>> a different way, my instrumentation is going to be entirely different
>>> than your instrumentation.
>>>
>>> Having said that, I really don't care about this part of the proposed
>>> changes since they aren't going to impact me at all.  I'm am
>>> specifically not objecting to the changes, just commenting.  :)
>>>
>>>> I'm assuming this will be some type of per call site metadata?
>>>>>
>>>> We do assign metadata at the indirect call sites. Format looks like as
>>>> follows:
>>>>
>>>> !33 = metadata !{metadata !"indirect_call_targets", i64
>>>> <total_exec_count>, metadata !"target_fn1†, i64 <target_fn1_count>,
>>>> metadata !"target_fn2†, i64 <target_fn2_count>, ….}
>>>>
>>>> Currently, we're recording only the top most called five function names
>>>> at each indirect call site. Following the string literal
>>>> “indirect_call_targets† are the fields  <total_exec_count> i.e. a 64
>>>> bit value for the total number of times the indirect call is executed
>>>> followed by the function names and execution counts of each target.
>>>>
>>> This was the part I was trying to ask about.  I really want to see where
>>> you're going with this optimization wise.  My naive guess is that this
>>> is going to be slightly off for what you actually want.
>>>
>>> Assuming you're going for profile guided devirtualization (and thus
>>> inlining), being able to check the type of the receiver (as opposed to
>>> the result of the virtual lookup) might be advantageous.  (Or, to say it
>>> differently, that's what I'm used to seeing.  Your approach might be
>>> completely reasonable, it's just not what I'm used to seeing.)  Have you
>>> thought about the tradeoffs here?
>>>
>> Not sure if I understood the problem here,
>>
> First, I am not trying to say there is a problem with your approach; I am
> only saying that it's not what I would have expected based on past
> experience.  You may be entirely correct in your approach, you just need to
> convince me of that.  :)
>
>> however, we're recording both
>> the target address and the addresses/names of the instrumented functions
>> during the execution of the instrumented binary. During profile reading
>> these addresses are used to match the target addresses to corresponding
>> functions.
>>
> Ok, let's start from the basics.  For profile guided devirtualization,
> you're constructing a cache from (something) to function pointer and using
> that cache lookup to enable inlining of the hot target.  You have two
> standard choices on what to use as your cache key: the result of the
> virtual lookup and the inputs to the virtual lookup.
>
> Option 1 - Inputs to virtual lookup
> if ((receiver, vtable index) == what I predicted)
>   tartget_I_predicted(); // inline me!!
> else {
>   target = full virtual dispatch();
>   target();
> }
>
> Option 2 - result of virtual lookup
> target = full virtual dispatch();
> if ('target' == what I predicted)
>   tartget_I_predicted(); // inline me!!
> else {
>   target();
> }
>
> You seem to be proposing option 2.  I'm saying that I'm used to seeing
> option 1 used.  Both approaches have their appeal, I'm just asking you to
> explain *why* you've chosen the one you apparently have.
>
>  During optimization, target function name should be adequate to
>> retrieve the function type information which then can be used to compare
>> the types of the arguments as well as the return type w/ the call site to
>> ensure the match.
>>
> Er, I'm now really confused.  With a properly constructed check, there
> should be no runtime checking of types required.  Can you explain why you
> think this is needed?


This is compile time function signature matching.  In rare cases, I have
seen bad target functions get leaked into the value set (with another
compiler) which requires compile time pruning -- but it is mostly a side
effect of the implementation of the profiler (e.g., due to race condition
etc).

David


>
>
>> To clarify, our interest at this time is to upstream all the
>> infrastructure related changes. The optimizations using the profile data
>> will not be part of our initial set of patches. On the other hand,
>> agreeing on the metadata format at the call sites would be advantageous in
>> terms of supporting multiple profile methods.
>>
> See my other response to this thread.
>
>>
>> Thanks,
>> -Betul
>>
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150415/6aa25265/attachment.html>


More information about the llvm-commits mailing list