[PATCH] Indirect call target profiling related profile reader/writer changes

Mon Apr 13 11:20:16 PDT 2015

Indirect call promotion + inlining does have very large performance
benefit. IIRC, it improves eon, h264ref, povray, gap, ......etc. Many
requires cross module inlining too. For instance povray improves >20% (if
my memory does not betray me).  Our internal programs all show large
improvement with it.

David

On Mon, Apr 13, 2015 at 11:02 AM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 04/13/2015 10:57 AM, betulb at codeaurora.org wrote:
>
>> On Apr 10, 2015, at 9:25 AM, betulb at codeaurora.org wrote:
>>>>
>>>>  On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
>>>>>
>>>>>> In http://reviews.llvm.org/D8908#153838, @reames wrote:
>>>>>>
>>>>>>  Have the IR level construct patches made it up for review?  If so,
>>>>>>> can
>>>>>>>
>>>>>> So far I've posted two patches. These two patches should apply cleanly
>>>>>> to the tip, working with the present profile infrastructure. The next
>>>>>> set of patches will be the enabler ones: i.e. three more patches one
>>>>>> for
>>>>>> each of clang, llvm and compiler-rt. Clang patch will be up for review
>>>>>> later today.
>>>>>>
>>>>>>  you send me a link?  I managed to miss them.
>>>>>>>
>>>>>> So far there is this patch and the instrinsic instruction definitions:
>>>>>> http://reviews.llvm.org/D8877. All patches are necessary for getting
>>>>>> the
>>>>>> IC targets and having them displayed by the llvm-profdata.
>>>>>>
>>>>> Ok, I'm really not convinced that the instrumentation code needs to be
>>>>> or should be an intrinsic.  This seems like something which should be
>>>>> emitted by the frontend and optimized like any other code.  To say this
>>>>> a different way, my instrumentation is going to be entirely different
>>>>> than your instrumentation.
>>>>>
>>>>> Having said that, I really don't care about this part of the proposed
>>>>> changes since they aren't going to impact me at all.  I'm am
>>>>> specifically not objecting to the changes, just commenting.  :)
>>>>>
>>>>>> I'm assuming this will be some type of per call site metadata?
>>>>>>>
>>>>>> We do assign metadata at the indirect call sites. Format looks like as
>>>>>> follows:
>>>>>>
>>>>>> !33 = metadata !{metadata !"indirect_call_targets", i64
>>>>>> <total_exec_count>, metadata !"target_fn1Ã¢Â€Â , i64
>>>>>> <target_fn1_count>,
>>>>>> metadata !"target_fn2Ã¢Â€Â , i64 <target_fn2_count>, Ã¢Â€Â¦.}
>>>>>>
>>>>>> Currently, we're recording only the top most called five function
>>>>>> names
>>>>>> at each indirect call site. Following the string literal
>>>>>> Ã¢Â€Âœindirect_call_targetsÃ¢Â€Â  are the fields  <total_exec_count>
>>>>>> i.e. a 64
>>>>>> bit value for the total number of times the indirect call is executed
>>>>>> followed by the function names and execution counts of each target.
>>>>>>
>>>>> This was the part I was trying to ask about.  I really want to see
>>>>> where
>>>>> you're going with this optimization wise.  My naive guess is that this
>>>>> is going to be slightly off for what you actually want.
>>>>>
>>>>> Assuming you're going for profile guided devirtualization (and thus
>>>>> inlining), being able to check the type of the receiver (as opposed to
>>>>> the result of the virtual lookup) might be advantageous.  (Or, to say
>>>>> it
>>>>> differently, that's what I'm used to seeing.  Your approach might be
>>>>> completely reasonable, it's just not what I'm used to seeing.)  Have
>>>>> you
>>>>> thought about the tradeoffs here?
>>>>>
>>>> Not sure if I understood the problem here, however, we're recording both
>>>> the target address and the addresses/names of the instrumented functions
>>>> during the execution of the instrumented binary. During profile reading
>>>> these addresses are used to match the target addresses to corresponding
>>>> functions. During optimization, target function name should be adequate
>>>> to
>>>> retrieve the function type information which then can be used to compare
>>>> the types of the arguments as well as the return type w/ the call site
>>>> to
>>>> ensure the match.
>>>>
>>>> To clarify, our interest at this time is to upstream all the
>>>> infrastructure related changes. The optimizations using the profile data
>>>> will not be part of our initial set of patches. On the other hand,
>>>> agreeing on the metadata format at the call sites would be advantageous
>>>> in
>>>> terms of supporting multiple profile methods.
>>>>
>>>> Thanks,
>>>> -Betul
>>>>
>>> I have the same questions and concerns as Philip. If the goal of this is
>>> to support devirtualization, this is not how I would expect it to be
>>> done.
>>> If you really want something more general, can you provide data to show
>>> that it is worth the cost? Recording all that information when profiling
>>> will not be cheap, and it adds significant complexity as well.
>>>
>> There was an earlier email of mine that went to the llvm-dev mailing list
>> discussing about the implementation dependencies and gains/costs.
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
>>
> On that thread, you claim an 8% performance boost on spec.  What
> optimizations did you implement to get that?  Can you outline the structure?
>
>
>>  I also donâ€™t think we should consider taking all of your infrastructure
>>> changes without first seeing at least preliminary versions of the
>>> optimization patches that demonstrate the benefits.
>>>
>>>  Thanks.
>>>>>> -Betul
>>>>>>
>>>>>>
>>>>>> http://reviews.llvm.org/D8908 <http://reviews.llvm.org/D8908>
>>>>>>
>>>>>> EMAIL PREFERENCES
>>>>>>    http://reviews.llvm.org/settings/panel/emailpreferences/
>>>>>> <http://reviews.llvm.org/settings/panel/emailpreferences/>
>>>>>>
>>>>>
>>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150413/71207a70/attachment.html>