[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Fri Mar 11 11:28:22 PST 2016

> On Mar 11, 2016, at 11:17 AM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> Vedant, are you concerned about instrumentation run performance for PGO or
>> for coverage testing? For coverage testing, is coverage information enough,
>> or is count information also needed? Depending on the answers to these
>> questions, the solution may be very different.
>> 
>> If the answer is for PGO, then a much better longer-term solution is to
>> migrate to IR based instrumentation. IR based instrumentation places
>> counter updates 'optimally', and the early optimization, including
>> pre-inlining of tiny funcs, is very effective at reducing instr
>> overhead. It also gives better profile quality due to context
>> sensitivity. For more details or performance numbers see Rong's RFC about
>> late instrumentation.
>> 
>> If the answer is PGO but, for various reasons, the FE based instrumentation
>> has to be used, then I think there is another, more effective way previously
>> suggested by Sean. Basically, you can skip single BB functions completely
>> during instrumentation. There are various reasons why profile data for
>> single BB functions is not useful:
>> 1) they are usually small and beneficial to be inlined regardless of
>> profile data
>> 2) the BB of the inline instance can get profile data from the caller
>> context
>> 3) the profile data for the out of line single BB func is also useless
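A minimal sketch of the single-BB heuristic described in the list above, written against a toy model rather than LLVM's real classes. `ToyFunction`, `ToyCallSite`, `shouldInstrument`, and `inferredEntryCount` are hypothetical names made up for illustration. It assumes each call site sits in a caller block whose count is already known, which is how the skipped function could still get a count "from the caller context".

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not LLVM classes.
struct ToyFunction {
  std::string name;
  unsigned num_basic_blocks;
};

// A call site sits in some caller block whose execution count is known.
struct ToyCallSite {
  std::string callee;
  std::uint64_t caller_block_count;
};

// Heuristic from the text: do not instrument single-BB functions.
bool shouldInstrument(const ToyFunction &f) {
  return f.num_basic_blocks > 1;
}

// For a skipped function, estimate its entry count by summing the counts
// of the caller blocks that contain a call to it.
std::uint64_t inferredEntryCount(const ToyFunction &f,
                                 const std::vector<ToyCallSite> &sites) {
  std::uint64_t total = 0;
  for (const ToyCallSite &s : sites)
    if (s.callee == f.name)
      total += s.caller_block_count;
  return total;
}
```

In this model a one-block getter called from blocks that ran 10 and 5 times would be skipped by `shouldInstrument` and still get an inferred count of 15.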
>> 
>> Rong has some data showing the effectiveness of this method -- not as good
>> as the pre-optimization approach, but IIRC also very good.
>> 
>> If the answer is for coverage but the actual count value does not really
>> matter, then a more effective way of reducing overhead is to teach the
>> instrumentation lowering pass to lower the counter update into a simple
>> store:
>> 
>>  counter_1 = 1;
> 
> This won't work. The FE based instrumentation has implicit counters for
> various AST constructs that can only be worked out by doing math with
> the explicit counters. If they don't have actual counts, this doesn't
> work.
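To make the arithmetic concrete, here is a minimal sketch of one implicit counter, assuming an `if`/`else` where only the parent region and the `then` branch carry explicit counters. The struct and function names are hypothetical and this is not the actual coverage mapping code. The `else` count is derived by subtraction, and that subtraction has nothing to work with once the counters are collapsed to "executed at least once" flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for `if (c) A; else B;`: only two explicit counters.
struct IfCounters {
  std::uint64_t parent;     // times the `if` statement was reached
  std::uint64_t then_taken; // times the `then` branch ran
};

// The `else` region has no counter of its own; its count is derived.
std::uint64_t elseCount(const IfCounters &c) {
  assert(c.then_taken <= c.parent && "a branch cannot outrun its parent");
  return c.parent - c.then_taken;
}

// The same two counters collapsed to "was it ever executed?" flags.
struct IfFlags {
  bool parent;
  bool then_taken;
};

IfFlags toFlags(const IfCounters &c) {
  return IfFlags{c.parent != 0, c.then_taken != 0};
}
```

With counters `{10, 10}` the `else` branch never ran, and with `{10, 3}` it ran 7 times, yet both collapse to the same pair of flags, so the flags alone cannot say whether the `else` region was covered.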
> 
>> Such stores from the inline instances of the same func can be easily
>> eliminated. It will also help multithreaded programs a lot when there is
>> heavy counter contention.
>> 
>> Another benefit is that the size of the counter can be effectively reduced
>> to 1 byte instead of 8 bytes.
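A rough sketch of the two lowerings being contrasted, assuming one 8-byte count and one 1-byte flag. The variable and function names are hypothetical, and the sketch is single-threaded, so it only shows the shape of each update and does not demonstrate the contention claim.

```cpp
#include <cstdint>

// Hypothetical counters, for illustration only.
std::uint64_t count_counter = 0; // 8 bytes, holds an execution count
std::uint8_t covered_flag = 0;   // 1 byte, holds "executed at least once"

// Counting update: a load, an add, and a store on every execution.
void updateCount() { count_counter += 1; }

// Coverage-only update: a single store of a constant. Repeating it changes
// nothing, which is why duplicate stores are easy to eliminate.
void updateFlag() { covered_flag = 1; }

// The flag is one eighth the size of the count.
static_assert(sizeof(count_counter) == 8 * sizeof(covered_flag),
              "expected an 8-byte count and a 1-byte flag");
```

After five calls to each, `count_counter` holds 5 while `covered_flag` still holds 1; the second through fifth stores to the flag were redundant.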
>> 
>> The tricky situation is when you want coverage data but count per line is
>> also important -- but I wonder whether having a special pass just to handle
>> this scenario is worth the effort.
> 
> Isn't "coverage with count-per-line" the whole point of a coverage
> feature?
> 
> It's also convenient to gather data for PGO and coverage in the same
> run, rather than having to build instrumented twice and run the same
> tests/training runs twice. Performance of the FE based instrumentation
> is pretty important for productivity reasons.

I wonder about that.

I'm interested in the coverage of a test suite, but I may not use it as a training set for PGO: why would I train on torture tests that exercise corner cases of the code base (which I want to be clearly "cold" in the PGO profile)?

-- 
Mehdi

> 
>> Also a side note, I think longer term we should unify the three
>> instrumentation mechanisms into one: FE based, IR based, and old gcda
>> instrumentation. IR based is the most efficient implementation -- when
>> combined with the gcda profiling runtime, it can be used to replace the
>> current gcda profiling, which is not efficient. It is also possible to use
>> IR based instr for coverage mapping (with the coverage map format mostly
>> unchanged), but the main challenge is passing source info etc.
> 
> I don't disagree, but I think that getting the same quality of coverage
> data from the IR based profiles as we do from the FE ones is a fairly
> large undertaking.
> 
>> thanks,
>> 
>> David
>> 
>> 
>> On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> 
>>> Hi,
>>> 
>>> I'd like to add a new pass to LLVM which removes redundant profile counter
>>> updates. The goal is to speed up code coverage testing and profile
>>> generation for PGO.
>>> 
>>> I'm sending this email out to describe my approach, share some early
>>> results, and gather feedback.
>>> 
>> 
>> From your example, it seems only one scenario is handled -- a local function
>> with a single callsite? This seems to be quite narrow in scope. Before
>> pursuing this further, it would be better to measure its impact on larger
>> benchmarks.
>> 
>> 
>> 
>>> thanks
>>> vedant
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 