[LLVMdev] IC profiling infrastructure

Xinliang David Li davidxl at google.com
Sat May 16 11:33:32 PDT 2015


I have sent my review comments. I think most of my high-level concerns
have been addressed (except for the last few minor fix-ups).

Justin, have you had a chance to take a look?

thanks,

David

On Wed, May 13, 2015 at 10:49 AM, Betul Buyukkurt <betulb at codeaurora.org> wrote:
>> Xinliang David Li <davidxl at google.com> writes:
>>>> From: <betulb at codeaurora.org>
>>>> Date: Tue, Apr 7, 2015 at 12:44 PM
>>>> Subject: [LLVMdev] IC profiling infrastructure
>>>> To: llvmdev at cs.uiuc.edu
>>>>
>>>>
>>>>
>>>> Hi All,
>>>>
>>>> We sent out an RFC in October on indirect call target profiling. The
>>>> proposal was about profiling the target addresses seen at indirect
>>>> call sites. Using the profile data, we're seeing performance
>>>> improvements of up to 8% on individual SPEC benchmarks where indirect
>>>> call sites are present. We've already started uploading our patches to
>>>> Phabricator. I look forward to your reviews and comments on the code,
>>>> and I'm ready to respond to your design-related queries.
>>>>
>>>> A few questions posted on the RFC went unanswered. Here are the
>>>> much-delayed responses.
>>>>
>>>
>>> Hi Betul, thank you for your patience.  I have completed an initial
>>> comparison with a few alternative value profile designs. My conclusion
>>> is that your proposed approach should work well in practice. The study
>>> can be found here:
>>> https://docs.google.com/document/u/1/d/1k-_k_DLFBh8h3XMnPAi6za-XpmjOIPHX_x6UB6PULfw/pub
>>
>> Thanks for looking at this David.
>>
>> Betul: I also have some thoughts on the approach and implementation of
>> this, but haven't had a chance to go over it in detail. I hope to have
>> some feedback for you on all of this sometime next week, and I'll start
>> reviewing the individual patches after that.
>
> Hi All,
>
> I posted three more patches yesterday. They might be missing some
> cosmetic fixes, but support for profiling multiple value kinds has
> been added to the readers, writers and runtime. I'd appreciate your
> comments on the CLs.
>
> Thanks,
> -Betul
>
>>
>>>> 1) Added dependencies: Our implementation adds a dependency on
>>>> calloc/free, as we’re generating and maintaining a linked list at run
>>>> time.
>>>
>>> If it becomes a problem for some, there is a way to handle that -- but
>>> at the cost of more memory (to be conservative). One good feature of
>>> using dynamic memory is that it allows counter arrays to be allocated
>>> on the fly, which eliminates the need to allocate memory for lots of
>>> cold/unexecuted functions.
>>>
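A minimal sketch of the on-the-fly counter allocation mentioned above, assuming a per-function record whose counter array stays unallocated until the function first runs. The names `FuncProfile` and `profile_increment` are hypothetical and are not taken from the patches under review; the sketch is single-threaded and ignores the synchronization a real runtime would need.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-function profile record (not the layout in the patches). */
typedef struct {
  uint32_t num_sites;  /* number of instrumented sites in the function */
  uint64_t *counters;  /* stays NULL until the function first executes */
} FuncProfile;

/* Bump the counter for one site, allocating the array on first use.
   Returns 0 on success, -1 on a bad site index or allocation failure. */
int profile_increment(FuncProfile *fp, uint32_t site) {
  if (site >= fp->num_sites)
    return -1;
  if (fp->counters == NULL) {
    /* calloc zero-fills, so every counter starts at 0. */
    fp->counters = calloc(fp->num_sites, sizeof *fp->counters);
    if (fp->counters == NULL)
      return -1;
  }
  fp->counters[site]++;
  return 0;
}
```

A function that never executes never calls `profile_increment`, so its `counters` pointer stays NULL and it costs only the size of the record itself.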
>>>> We also added a dependency on mutexes to prevent memory leaks when
>>>> multiple threads try to insert a new target address for the same IC
>>>> site into the linked list. To minimize the performance impact, we only
>>>> added mutexes around the pointer assignment and kept all dynamic
>>>> memory allocation/free operations outside the mutex-protected code.
>>>
>>> This (using mutexes) should be and can be avoided -- see the above
>>> report.
>>>
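One common way to avoid the mutex is to publish new list nodes with an atomic compare-and-swap. The sketch below uses C11 `<stdatomic.h>` and is only an illustration under that assumption: it is not necessarily the scheme described in the linked report, and `TargetNode`, `CallSite` and `record_target` are hypothetical names. Nodes are appended at the tail and never removed, so a thread that loses the race simply re-reads the same link and frees its unused node if another thread has already inserted the same target.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one observed target address and its hit count. */
typedef struct TargetNode {
  uint64_t target;
  _Atomic uint64_t count;
  struct TargetNode *_Atomic next;
} TargetNode;

/* Hypothetical per-call-site list head. */
typedef struct {
  TargetNode *_Atomic head;
} CallSite;

/* Record one observed target without taking a lock.
   Returns 0 on success, -1 if allocation fails. */
int record_target(CallSite *site, uint64_t target) {
  TargetNode *fresh = NULL;               /* allocated at most once */
  TargetNode *_Atomic *link = &site->head;
  for (;;) {
    TargetNode *cur = atomic_load(link);
    if (cur == NULL) {
      if (fresh == NULL) {
        fresh = calloc(1, sizeof *fresh);
        if (fresh == NULL)
          return -1;
        fresh->target = target;
        atomic_init(&fresh->count, 1);
        atomic_init(&fresh->next, NULL);
      }
      TargetNode *expected = NULL;
      if (atomic_compare_exchange_strong(link, &expected, fresh))
        return 0;                         /* our node is now published */
      continue;                           /* lost the race: re-read this link */
    }
    if (cur->target == target) {
      atomic_fetch_add(&cur->count, 1);
      free(fresh);                        /* no leak if we allocated in vain */
      return 0;
    }
    link = &cur->next;
  }
}
```

As in the mutex-based description, allocation and free happen outside the critical step; here that step is a single compare-and-swap on the link pointer.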
>>>>
>>>> 2) Indirect call data being present in sampling profile output: This
>>>> unfortunately does not help in our case, because perf depends on LBR
>>>> support. To our knowledge, LBR support is not present on ARM platforms.
>>>>
>>>
>>> yes.
>>>
>>>> 3) Losing profiling support on targets not supporting malloc/mutexes:
>>>> The added dependency on calloc/free/mutexes could perhaps be eliminated
>>>> (although our current solution does not handle this) by having a
>>>> separate runtime library for value profiling. Instrumentation can
>>>> link in two runtime libraries when value profiling (of which indirect
>>>> call target profiling is one instance) is enabled on the command line.
>>>
>>> See above.
>>>
>>>>
>>>> 4) Performance of the instrumented code: Instrumentation with the IC
>>>> profiling patches resulted in a 7% degradation across the SPEC
>>>> benchmarks at -O2. For the benchmarks that did not have any IC sites,
>>>> no performance degradation was observed. This data was gathered using
>>>> the ref data set for SPEC.
>>>>
>>>
>>> I'd like the runtime part of the change to be shared and used as a
>>> general-purpose value profiler (not just for indirect call
>>> promotion), but this can be done as a follow-up.
>>>
>>> I will start with some reviews. Hopefully others will help with reviews
>>> too.
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>> Thanks,
>>>> -Betul Buyukkurt
>>>>
>>>> Qualcomm Innovation Center, Inc.
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>>>> Linux
>>>> Foundation Collaborative Project
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>>
>
>
