[PATCH] Indirect call target profiling related profile reader/writer changes

Mon Apr 13 14:00:57 PDT 2015

On Mon, Apr 13, 2015 at 1:51 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 04/13/2015 11:20 AM, Xinliang David Li wrote:
>
> Indirect call promotion + inlining does have very large performance
> benefit. IIRC, it improves eon, h264ref, povray, gap, ......etc. Many
> requires cross module inlining too. For instance povray improves >20% (if
> my memory does not betray me).  Our internal programs all show large
> improvement with it.
>
> I am not questioning the benefit of profile guided devirtualization and
> inlining.  Getting these optimizations implemented is a clear win if done
> right.
>
> I am seeking clarification on the approach being taken to implement them
> and the design choices being proposed.  That's a different matter.
>

Sure. However in order to demostrate the full benefit of the value
profiling transformations, many others may need to be in place, for
instance making inlining aware of profile data, which is currently missing.

For value profiling, I think it is more important to have metrics on the
instrumentation side -- the low memory overhead, low runtime overhead will
be obvious ones.

David

>
>  David
>
>
> On Mon, Apr 13, 2015 at 11:02 AM, Philip Reames <listmail at philipreames.com
> > wrote:
>
>>
>> On 04/13/2015 10:57 AM, betulb at codeaurora.org wrote:
>>
>>>  On Apr 10, 2015, at 9:25 AM, betulb at codeaurora.org wrote:
>>>>>
>>>>>  On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
>>>>>>
>>>>>>> In http://reviews.llvm.org/D8908#153838, @reames wrote:
>>>>>>>
>>>>>>>  Have the IR level construct patches made it up for review?  If so,
>>>>>>>> can
>>>>>>>>
>>>>>>> So far I've posted two patches. These two patches should apply
>>>>>>> cleanly
>>>>>>> to the tip, working with the present profile infrastructure. The next
>>>>>>> set of patches will be the enabler ones: i.e. three more patches one
>>>>>>> for
>>>>>>> each of clang, llvm and compiler-rt. Clang patch will be up for
>>>>>>> review
>>>>>>> later today.
>>>>>>>
>>>>>>>  you send me a link?  I managed to miss them.
>>>>>>>>
>>>>>>> So far there is this patch and the instrinsic instruction
>>>>>>> definitions:
>>>>>>> http://reviews.llvm.org/D8877. All patches are necessary for getting
>>>>>>> the
>>>>>>> IC targets and having them displayed by the llvm-profdata.
>>>>>>>
>>>>>> Ok, I'm really not convinced that the instrumentation code needs to be
>>>>>> or should be an intrinsic.  This seems like something which should be
>>>>>> emitted by the frontend and optimized like any other code.  To say
>>>>>> this
>>>>>> a different way, my instrumentation is going to be entirely different
>>>>>> than your instrumentation.
>>>>>>
>>>>>> Having said that, I really don't care about this part of the proposed
>>>>>> changes since they aren't going to impact me at all.  I'm am
>>>>>> specifically not objecting to the changes, just commenting.  :)
>>>>>>
>>>>>>>  I'm assuming this will be some type of per call site metadata?
>>>>>>>>
>>>>>>> We do assign metadata at the indirect call sites. Format looks like
>>>>>>> as
>>>>>>> follows:
>>>>>>>
>>>>>>> !33 = metadata !{metadata !"indirect_call_targets", i64
>>>>>>> <total_exec_count>, metadata !"target_fn1Ã¢Â€Â , i64
>>>>>>> <target_fn1_count>,
>>>>>>> metadata !"target_fn2Ã¢Â€Â , i64 <target_fn2_count>, Ã¢Â€Â¦.}
>>>>>>>
>>>>>>> Currently, we're recording only the top most called five function
>>>>>>> names
>>>>>>> at each indirect call site. Following the string literal
>>>>>>> Ã¢Â€Âœindirect_call_targetsÃ¢Â€Â  are the fields  <total_exec_count>
>>>>>>> i.e. a 64
>>>>>>> bit value for the total number of times the indirect call is executed
>>>>>>> followed by the function names and execution counts of each target.
>>>>>>>
>>>>>> This was the part I was trying to ask about.  I really want to see
>>>>>> where
>>>>>> you're going with this optimization wise.  My naive guess is that this
>>>>>> is going to be slightly off for what you actually want.
>>>>>>
>>>>>> Assuming you're going for profile guided devirtualization (and thus
>>>>>> inlining), being able to check the type of the receiver (as opposed to
>>>>>> the result of the virtual lookup) might be advantageous.  (Or, to say
>>>>>> it
>>>>>> differently, that's what I'm used to seeing.  Your approach might be
>>>>>> completely reasonable, it's just not what I'm used to seeing.)  Have
>>>>>> you
>>>>>> thought about the tradeoffs here?
>>>>>>
>>>>> Not sure if I understood the problem here, however, we're recording
>>>>> both
>>>>> the target address and the addresses/names of the instrumented
>>>>> functions
>>>>> during the execution of the instrumented binary. During profile reading
>>>>> these addresses are used to match the target addresses to corresponding
>>>>> functions. During optimization, target function name should be adequate
>>>>> to
>>>>> retrieve the function type information which then can be used to
>>>>> compare
>>>>> the types of the arguments as well as the return type w/ the call site
>>>>> to
>>>>> ensure the match.
>>>>>
>>>>> To clarify, our interest at this time is to upstream all the
>>>>> infrastructure related changes. The optimizations using the profile
>>>>> data
>>>>> will not be part of our initial set of patches. On the other hand,
>>>>> agreeing on the metadata format at the call sites would be advantageous
>>>>> in
>>>>> terms of supporting multiple profile methods.
>>>>>
>>>>> Thanks,
>>>>> -Betul
>>>>>
>>>> I have the same questions and concerns as Philip. If the goal of this is
>>>> to support devirtualization, this is not how I would expect it to be
>>>> done.
>>>> If you really want something more general, can you provide data to show
>>>> that it is worth the cost? Recording all that information when profiling
>>>> will not be cheap, and it adds significant complexity as well.
>>>>
>>> There was an earlier email of mine that went to the llvm-dev mailing list
>>> discussing about the implementation dependencies and gains/costs.
>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
>>>
>>  On that thread, you claim an 8% performance boost on spec.  What
>> optimizations did you implement to get that?  Can you outline the
>> structure?
>>
>>
>>>  I also donâ€™t think we should consider taking all of your
>>>> infrastructure
>>>> changes without first seeing at least preliminary versions of the
>>>> optimization patches that demonstrate the benefits.
>>>>
>>>>   Thanks.
>>>>>>> -Betul
>>>>>>>
>>>>>>>
>>>>>>> http://reviews.llvm.org/D8908 <http://reviews.llvm.org/D8908>
>>>>>>>
>>>>>>> EMAIL PREFERENCES
>>>>>>>    http://reviews.llvm.org/settings/panel/emailpreferences/
>>>>>>> <http://reviews.llvm.org/settings/panel/emailpreferences/>
>>>>>>>
>>>>>>
>>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150413/d07d0194/attachment.html>