[PATCH] Indirect call target profiling related profile reader/writer changes

Philip Reames listmail at philipreames.com
Mon Apr 13 11:02:48 PDT 2015


On 04/13/2015 10:57 AM, betulb at codeaurora.org wrote:
>>> On Apr 10, 2015, at 9:25 AM, betulb at codeaurora.org wrote:
>>>
>>>> On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
>>>>> In http://reviews.llvm.org/D8908#153838, @reames wrote:
>>>>>
>>>>>> Have the IR level construct patches made it up for review?  If so,
>>>>>> can
>>>>> So far I've posted two patches. These two patches should apply cleanly
>>>>> to the tip, working with the present profile infrastructure. The next
>>>>> set of patches will be the enabler ones: i.e. three more patches, one
>>>>> for each of clang, llvm and compiler-rt. The clang patch will be up for
>>>>> review later today.
>>>>>
>>>>>> you send me a link?  I managed to miss them.
>>>>> So far there are this patch and the intrinsic instruction definitions:
>>>>> http://reviews.llvm.org/D8877. All of these patches are necessary for
>>>>> getting the IC targets and having them displayed by llvm-profdata.
>>>> Ok, I'm really not convinced that the instrumentation code needs to be
>>>> or should be an intrinsic.  This seems like something which should be
>>>> emitted by the frontend and optimized like any other code.  To say this
>>>> a different way, my instrumentation is going to be entirely different
>>>> than your instrumentation.
>>>>
>>>> Having said that, I really don't care about this part of the proposed
>>>> changes since they aren't going to impact me at all.  I am
>>>> specifically not objecting to the changes, just commenting.  :)
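>>>>
>>>> (Purely as an illustrative sketch of what I mean by "emitted by the
>>>> frontend" -- the hook name below is made up, not part of any of these
>>>> patches -- the frontend could simply emit an ordinary call into a
>>>> profiling runtime before the indirect call and let it be optimized
>>>> like any other code:
>>>>
>>>>   %target.addr = bitcast i32 (i8*)* %target to i8*
>>>>   call void @__prof_record_icall_target(i8* %target.addr)
>>>>   %r = call i32 %target(i8* %obj)
>>>>
>>>> with @__prof_record_icall_target being a plain runtime function,
>>>> declare void @__prof_record_icall_target(i8*).)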
>>>>>> I'm assuming this will be some type of per call site metadata?
>>>>> We do assign metadata at the indirect call sites. The format looks as
>>>>> follows:
>>>>>
>>>>> !33 = metadata !{metadata !"indirect_call_targets",
>>>>>                  i64 <total_exec_count>,
>>>>>                  metadata !"target_fn1", i64 <target_fn1_count>,
>>>>>                  metadata !"target_fn2", i64 <target_fn2_count>, ...}
>>>>>
>>>>> Currently, we're recording only the five most frequently called target
>>>>> function names at each indirect call site. Following the string literal
>>>>> "indirect_call_targets" comes <total_exec_count>, a 64-bit value giving
>>>>> the total number of times the indirect call was executed, followed by
>>>>> the function name and execution count of each recorded target.
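>>>>>
>>>>> For instance (made-up names and counts), an indirect call site
>>>>> executed 1600 times in total whose two hottest recorded targets were
>>>>> foo (1200 executions) and bar (400 executions) would carry:
>>>>>
>>>>> !33 = metadata !{metadata !"indirect_call_targets", i64 1600,
>>>>>                  metadata !"foo", i64 1200, metadata !"bar", i64 400}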
>>>> This was the part I was trying to ask about.  I really want to see
>>>> where you're going with this optimization-wise.  My naive guess is
>>>> that this is going to be slightly off from what you actually want.
>>>>
>>>> Assuming you're going for profile-guided devirtualization (and thus
>>>> inlining), being able to check the type of the receiver (as opposed
>>>> to the result of the virtual lookup) might be advantageous.  (Or, to
>>>> say it differently, that's what I'm used to seeing.  Your approach
>>>> might be completely reasonable, it's just not what I'm used to
>>>> seeing.)  Have you thought about the tradeoffs here?
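>>>>
>>>> (To make the contrast concrete -- purely an illustrative sketch with
>>>> made-up names, not anything from the patches -- the per-target counts
>>>> above naturally feed a guard on the result of the virtual lookup:
>>>>
>>>>   %fp  = load i32 (i8*)** %vslot                ; virtual lookup
>>>>   %hit = icmp eq i32 (i8*)* %fp, @hot_target
>>>>   br i1 %hit, label %direct, label %indirect
>>>>
>>>> whereas the receiver-check style tests the object's dynamic type,
>>>> e.g. its vtable pointer, before the lookup happens:
>>>>
>>>>   %vptr = load i8** %obj.vptr
>>>>   %hit  = icmp eq i8* %vptr, bitcast ([4 x i8*]* @hot_class_vtable to i8*)
>>>>
>>>> Both are sketches only.)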
>>> I'm not sure I understood the problem here; however, we're recording
>>> both the target addresses and the addresses/names of the instrumented
>>> functions during execution of the instrumented binary. During profile
>>> reading, these addresses are used to match each target address to its
>>> corresponding function. During optimization, the target function name
>>> should be adequate to retrieve the function's type information, which
>>> can then be used to compare the argument types and the return type
>>> with those of the call site to ensure a match.
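>>>
>>> As a made-up illustration of that check: for a profiled call site
>>>
>>>   %r = call i32 %fp(i8* %obj)
>>>
>>> a recorded target declared as declare i32 @target_fn1(i8*) matches the
>>> argument and return types and can be used, while one declared as
>>> declare i32 @target_fn2(i8*, i32) does not match and would be skipped.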
>>>
>>> To clarify, our interest at this time is to upstream all of the
>>> infrastructure-related changes. The optimizations using the profile
>>> data will not be part of our initial set of patches. On the other
>>> hand, agreeing on the metadata format at the call sites would be
>>> advantageous in terms of supporting multiple profiling methods.
>>>
>>> Thanks,
>>> -Betul
>> I have the same questions and concerns as Philip. If the goal of this is
>> to support devirtualization, this is not how I would expect it to be done.
>> If you really want something more general, can you provide data to show
>> that it is worth the cost? Recording all that information when profiling
>> will not be cheap, and it adds significant complexity as well.
> There was an earlier email of mine that went to the llvm-dev mailing list
> discussing the implementation dependencies and gains/costs.
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
On that thread, you claim an 8% performance boost on SPEC.  What 
optimizations did you implement to get that?  Can you outline the structure?
>
>> I also don’t think we should consider taking all of your infrastructure
>> changes without first seeing at least preliminary versions of the
>> optimization patches that demonstrate the benefits.
>>
>>>>> Thanks.
>>>>> -Betul
>>>>>
>>>>>
>>>>> http://reviews.llvm.org/D8908
>>>>>
>

More information about the llvm-commits mailing list