[PATCH] D28964: [PGO] Value profile support for value ranges

Tue Jan 24 11:38:31 PST 2017

On Tue, Jan 24, 2017 at 11:20 AM, Rong Xu <xur at google.com> wrote:

>
>
> On Tue, Jan 24, 2017 at 11:15 AM, Xinliang David Li <davidxl at google.com>
> wrote:
>
>>
>>
>> On Tue, Jan 24, 2017 at 11:13 AM, Rong Xu <xur at google.com> wrote:
>>
>>>
>>>
>>> On Tue, Jan 24, 2017 at 11:02 AM, Xinliang David Li <davidxl at google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jan 24, 2017 at 10:57 AM, Rong Xu <xur at google.com> wrote:
>>>>
>>>>> Yes. that's another way to do this. I considered that when I
>>>>> implemented the code. I choose not do because:
>>>>> (1) it makes it the instrumentation code complex and increase the code
>>>>> size as basically you inline the logic to every instrumentation site.
>>>>>
>>>>
>>>> We can filter a lot of memcpy sites with constant sizes, so the number
>>>> of sites remaining should not be large (compared to virtual calls).
>>>>
>>> in real applications, the static call site for memcpy and memset is not
>>> small.
>>>
>>>>
>>>>
>>>>> (2) it affects the static counter allocation. This range profile
>>>>> potentially uses a lot more counters that current indirectly calls.
>>>>>
>>>>
>>>> Why is that?
>>>>
>>> we collect the precise size for small sized memcpy and memset. Most of
>>> the callsite has multiple size values.
>>>
>>
>> My question is how is this affected by the way it is implemented: 1)
>> using API call ; 2) generating inline sequence of interval profiling.
>>
>
> using API, I have the exact count of the range instrumentations (and the
> range) so I can adjust the counter allocation in lowering (different from
> indirect calls)
> On the other hand, if we do the inline sequence in instrumentation, we
> treat them the same as current indirect calls instrumentation.
>
>

Did I miss anything? I do not see any special handling of static counter
allocation in the patches. The default counters per-site is still unchanged?

David

>
>> Anyway, see my previous reply -- I think having a minimal wrapper API in
>> compiler rt is fine.
>>
>> David
>>
>>
>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>> The advantage of not change compiler_rt is that less information is
>>>>> needed to pass b/w runtime and instrumentation. This especially true is we
>>>>> want to have profile client to control it's own ranges.
>>>>>
>>>>> But I still think the change in compiler_rt is straight-forward way.
>>>>>
>>>>> On Tue, Jan 24, 2017 at 10:33 AM, David Li via Phabricator <
>>>>> reviews at reviews.llvm.org> wrote:
>>>>>
>>>>>> davidxl added a comment.
>>>>>>
>>>>>> In fact, I don't there is a need for compiler_rt change at all for
>>>>>> the interval/range profiling. The compiler_rt API should be kept simple.
>>>>>> In other words, interval profiling should already be 'lowered' into regular
>>>>>> value profiling in the instrumentation lowering phase. The instrumentation
>>>>>> lowering will general inline sequence that collapse ranges into single
>>>>>> value depending on the value profile kind.  Only the
>>>>>> __llvm_profile_instrument_target low level interface is needed.
>>>>>>
>>>>>> The only downside of this approach that we can not override the range
>>>>>> definitions at runtime -- but I don't think that is worth the complexity
>>>>>> introduced.
>>>>>>
>>>>>>
>>>>>> Repository:
>>>>>>   rL LLVM
>>>>>>
>>>>>> https://reviews.llvm.org/D28964
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170124/6c81777f/attachment-0001.html>