[LLVMdev] IC profiling infrastructure

betulb at codeaurora.org betulb at codeaurora.org
Wed Apr 29 10:19:29 PDT 2015


>> From: <betulb at codeaurora.org>
>> Date: Tue, Apr 7, 2015 at 12:44 PM
>> Subject: [LLVMdev] IC profiling infrastructure
>> To: llvmdev at cs.uiuc.edu
>>
>>
>>
>> Hi All,
>>
>> We sent out an RFC in October on indirect call target profiling. The
>> proposal was about profiling target addresses seen at indirect call
>> sites. Using the profile data, we're seeing up to 8% performance
>> improvements on individual SPEC benchmarks where indirect call sites are
>> present. We've already started uploading our patches to Phabricator. I'm
>> looking forward to your reviews and comments on the code, and I'm ready
>> to respond to your design-related queries.
>>
>> A few questions posted on the RFC went unanswered. Here are our
>> much-delayed responses.
>>
>
> Hi Betul, thank you for your patience.  I have completed an initial
> comparison with a few alternative value profile designs. My conclusion
> is that your proposed approach should work well in practice. The study
> can be found here:
> https://docs.google.com/document/u/1/d/1k-_k_DLFBh8h3XMnPAi6za-XpmjOIPHX_x6UB6PULfw/pub

Hi David,

Thanks for the detailed report and for working on this. We really
appreciate the feedback. We're looking forward to the comments and to
upstreaming the changes.

>
>> 1) Added dependencies: Our implementation adds a dependency on
>> calloc/free, as we’re generating and maintaining a linked list at run
>> time.
>
> If it becomes a problem for some, there is a way to handle that -- but
> at the cost of more memory (to be conservative). One good feature of
> using dynamic memory is that it allows counter arrays to be allocated
> on the fly, which eliminates the need to allocate memory for lots of
> cold/unexecuted functions.
>
>> We also added a dependency on mutexes to prevent memory leaks when
>> multiple threads try to insert a new target address for the same IC
>> site into the linked list. To minimize the performance impact, we only
>> added mutexes around the pointer assignment and kept all dynamic memory
>> allocation/free operations outside of the mutexed code.
>
> This (using mutexes) should be and can be avoided -- see the above report.
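
For readers following along, the locking scheme described above (the mutex
guards only the pointer assignment, while calloc/free stay outside it) could
look roughly like the sketch below. This is not the code from the patches:
the names TargetNode, ListLock, find_target and record_target are
hypothetical, and one global pthread mutex is assumed for simplicity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one entry per target address seen at an IC site. */
typedef struct TargetNode {
  uint64_t target;
  uint64_t count;
  struct TargetNode *next;
} TargetNode;

/* Assumption: a single global lock protects every list head. */
static pthread_mutex_t ListLock = PTHREAD_MUTEX_INITIALIZER;

static TargetNode *find_target(TargetNode *head, uint64_t target) {
  for (TargetNode *n = head; n != NULL; n = n->next)
    if (n->target == target)
      return n;
  return NULL;
}

/* Records one call to `target` at the IC site whose list head is `*site`.
 * Returns the updated count, or 0 if allocation failed. */
uint64_t record_target(TargetNode **site, uint64_t target) {
  assert(site != NULL);

  /* Fast path without the lock. Simplification: this read and the counter
   * bump are not synchronized, so concurrent updates may be lost. */
  TargetNode *n = find_target(*site, target);
  if (n != NULL)
    return ++n->count;

  /* Allocate outside the critical section. */
  TargetNode *fresh = calloc(1, sizeof(*fresh));
  if (fresh == NULL)
    return 0;
  fresh->target = target;
  fresh->count = 1;

  uint64_t result = 1;
  pthread_mutex_lock(&ListLock);
  /* Re-check: another thread may have inserted the same target meanwhile. */
  n = find_target(*site, target);
  if (n == NULL) {
    fresh->next = *site;
    *site = fresh; /* the guarded pointer assignment */
  } else {
    result = ++n->count;
  }
  pthread_mutex_unlock(&ListLock);

  /* Free the unused node outside the lock so it is not leaked. */
  if (n != NULL)
    free(fresh);
  return result;
}
```

The re-check under the lock is what prevents the leak mentioned above: a
thread that loses the race frees its speculative node instead of linking a
duplicate.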

I did read your report carefully. You suggested using an atomic update of
the linked-list link to avoid mutexes. Our runtime is written in C, so I
was not sure whether introducing C++11 features like std::atomic would be
OK. Also, some operations can be performed atomically on x86 platforms
(depending on the data being aligned at various bit-length/cache-line
boundaries), but ARM or other platforms would not support that.
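
To make the alternative concrete, here is a minimal sketch of a lock-free
insertion that stays in C, assuming the toolchain provides C11
<stdatomic.h> (compiler atomic builtins would be another option). ICNode and
ic_record are hypothetical names, not identifiers from the patches or from
the report. Nodes are only ever pushed at the head and never removed, which
keeps the compare-and-swap loop simple.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one entry per target address seen at an IC site. */
typedef struct ICNode {
  uint64_t target;
  _Atomic uint64_t count;
  struct ICNode *next; /* written before publication, read-only afterwards */
} ICNode;

/* Records one call to `target` in the list rooted at `*head`.
 * Returns the updated count, or 0 if allocation failed. */
uint64_t ic_record(ICNode *_Atomic *head, uint64_t target) {
  assert(head != NULL);
  ICNode *fresh = NULL;
  ICNode *seen = atomic_load(head);

  for (;;) {
    /* Look for an existing entry, starting from the head we last saw. */
    for (ICNode *n = seen; n != NULL; n = n->next) {
      if (n->target == target) {
        free(fresh); /* free(NULL) is a no-op */
        return atomic_fetch_add(&n->count, 1) + 1;
      }
    }

    /* Not found: allocate once, outside any retry-sensitive step. */
    if (fresh == NULL) {
      fresh = calloc(1, sizeof(*fresh));
      if (fresh == NULL)
        return 0;
      fresh->target = target;
      atomic_init(&fresh->count, 1);
    }

    /* Try to publish the node as the new head. */
    fresh->next = seen;
    if (atomic_compare_exchange_weak(head, &seen, fresh))
      return 1;
    /* CAS failed: `seen` now holds the current head, so rescan in case
     * another thread inserted the same target. */
  }
}
```

A thread that loses the race either retries with the new head or, if the
other thread inserted the same target, frees its own node and bumps the
existing counter, so no node is leaked and no mutex is needed.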

>>
>> 2) Indirect call data being present in sampling profile output:
>> Unfortunately this does not help in our case, because perf depends on
>> LBR support. To our knowledge, LBR support is not present on ARM
>> platforms.
>>
>
> Yes.
>
>> 3) Losing profiling support on targets not supporting malloc/mutexes:
>> The added dependency on calloc/free/mutexes may perhaps be eliminated
>> (although our current solution does not handle this) through having a
>> separate run time library for value profiling purposes. Instrumentation
>> can link in two run time libraries when value profiling (an instance of
>> it being indirect call target profiling) is enabled on the command line.
>
> See above.
>
>>
>> 4) Performance of the instrumented code: Instrumentation with IC
>> profiling patches resulted in 7% degradation across SPEC benchmarks at
>> -O2. For the benchmarks that did not have any IC sites, no performance
>> degradation was observed. This data was gathered using the ref data set
>> for SPEC.
>>
>
> I'd like the runtime part of the change to be shared and used as a
> general-purpose value profiler (not just for indirect call
> promotion), but this can be done as a follow-up.

My understanding of your analysis was that it covered only the run-time
library performance and did not look into whether instrumentation is
enabled at the right sites.

> I will start with some reviews. Hopefully others will help with reviews
> too.

Thanks very much. We'll be responding to the reviews diligently.

> thanks,
>
> David
>
>
>
>> Thanks,
>> -Betul Buyukkurt
>>
>> Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>> Linux Foundation Collaborative Project