[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org>
Fri Mar 11 11:17:37 PST 2016


Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Vedant, are you concerned about instrumentation run performance for PGO or
> for coverage testing? For coverage testing, is coverage information enough,
> or is count information also needed? Depending on the answers to those
> questions, the solution may be very different.
>
> If the answer is PGO, then a much better long-term solution is to migrate
> to IR-based instrumentation. Not only does IR-based instrumentation place
> counter updates 'optimally', but the early optimizations (including
> pre-inlining of tiny functions) are very effective in reducing
> instrumentation overhead, with the added benefit of better profile quality
> due to context sensitivity. For more details and performance numbers, see
> Rong's RFC about late instrumentation.
>
> If the answer is PGO, but for various reasons the FE-based instrumentation
> has to be used, then I think there is another, more effective way previously
> suggested by Sean: skip single-BB functions completely during
> instrumentation. There are several reasons why profile data for single-BB
> functions is not useful:
> 1) they are usually small and beneficial to inline regardless of profile
> data
> 2) the BB of the inline instance can get profile data from the caller
> context
> 3) the profile data for the out-of-line single-BB function is also useless
>
> Rong has some data showing the effectiveness of this method -- not as good
> as the pre-optimization approach, but IIRC still quite good.
>
> If the answer is coverage, but the actual count values do not really
> matter, then a more effective way of reducing overhead is to teach the
> instrumentation lowering pass to lower the counter update into a simple
> store:
>
>   counter_1 = 1;

This won't work as-is. The FE-based instrumentation has implicit counters
for various AST constructs whose values can only be worked out by doing
arithmetic on the explicit counters. If the explicit counters don't hold
actual counts, that arithmetic falls apart.
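
For example, here's a rough sketch of the kind of arithmetic involved (the
counter names and placement are only illustrative, not exactly what clang
emits):

  void do_then(void);
  void do_else(void);

  void f(int c) {
    /* counter_0: counts entries into f() */
    if (c) {
      /* counter_1: counts the "then" branch */
      do_then();
    } else {
      /* No separate counter here: the coverage mapping derives this
         region's count as the expression (counter_0 - counter_1). */
      do_else();
    }
  }

With real counts, a run that enters f() 10 times and takes the "then" branch
7 times gives the else region a count of 10 - 7 = 3. If the counter updates
are lowered to plain "= 1" stores, both counters end up as 1 and the else
region is reported as never executed, even though it ran.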

> Such stores from the inlined instances of the same function can be easily
> eliminated. It will also help multithreaded programs a lot when there is
> heavy counter contention.
>
> Another benefit is that the size of each counter can effectively be reduced
> to 1 byte instead of 8 bytes.
>
> The tricky situation is when you want coverage data but the count per line
> is also important -- though I wonder whether having a special pass just to
> handle this scenario is worth the effort.

Isn't "coverage with count-per-line" the whole point of a coverage
feature?

It's also convenient to gather data for PGO and coverage in the same run,
rather than having to build an instrumented binary twice and run the same
tests/training runs twice. Performance of the FE-based instrumentation is
pretty important for productivity reasons.

> As a side note, I think that in the longer term we should unify the three
> instrumentation mechanisms into one: FE-based, IR-based, and the old gcda
> instrumentation. IR-based is the most efficient implementation -- combined
> with the gcda profiling runtime, it could replace the current gcda
> profiling, which is not efficient. It is also possible to use IR-based
> instrumentation for coverage mapping (with the coverage map format mostly
> unchanged), but the main challenge is passing source info, etc.

I don't disagree, but I think that getting the same quality of coverage data
from the IR-based profiles as we do from the FE-based ones is a fairly large
undertaking.

> thanks,
>
> David
>
>
> On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I'd like to add a new pass to LLVM which removes redundant profile counter
>> updates. The goal is to speed up code coverage testing and profile
>> generation
>> for PGO.
>>
>> I'm sending this email out to describe my approach, share some early
>> results,
>> and gather feedback.
>>
>
> From your example, it seems only one scenario is handled -- a local
> function with a single callsite? This seems quite narrow in scope. Before
> pursuing this further, it would be better to measure its impact on larger
> benchmarks.
>
>
>
>> thanks
>> vedant
>>

