[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Thu Mar 10 21:34:22 PST 2016

Vedant, are you concerned about instrumentation run performance for PGO or
for coverage testing? For coverage testing, is coverage information enough,
or is count information also needed? Depending on the answers, the solution
may be very different.

If the answer is PGO, then a much better longer-term solution is to
migrate to IR based instrumentation. Not only does IR based instrumentation
place counter updates 'optimally', but the early optimization, including
pre-inlining of tiny funcs, is very effective in reducing instrumentation
overhead, with the added benefit of better profile quality due to context
sensitivity. For more details and performance numbers, see Rong's RFC about
late instrumentation.

If the answer is PGO but, for various reasons, the FE based instrumentation
has to be used, then I think there is another, more effective way previously
suggested by Sean. Basically you can skip single BB functions completely
during instrumentation. There are various reasons why profile data for
single BB functions is not useful:
1) they are usually small and beneficial to inline regardless of
profile data
2) the BB of the inline instance can get profile data from the caller
context
3) the profile data for the out of line single BB func is also useless

Rong has some data showing the effectiveness of this method -- not as good
as the pre-optimization approach, but IIRC also very good.
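To make the single BB filter concrete, here is a minimal sketch in C. It is a toy model, not the actual FE instrumentation code: the struct func_info type and the should_instrument / count_instrumented helpers are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a function as the instrumenter might see it. */
struct func_info {
  const char *name;
  size_t num_basic_blocks;
};

/* Skip functions that consist of a single basic block: their profile data
 * is either recoverable from the caller context or not useful. */
static bool should_instrument(const struct func_info *f) {
  return f->num_basic_blocks > 1;
}

/* Count how many functions in a module would still receive counters. */
static size_t count_instrumented(const struct func_info *funcs, size_t n) {
  size_t instrumented = 0;
  for (size_t i = 0; i < n; ++i) {
    if (should_instrument(&funcs[i]))
      ++instrumented;
  }
  return instrumented;
}
```

In this model a one-block getter gets no counter at all, while multi-block functions are instrumented as before.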

If the answer is coverage, and the actual count value does not really
matter, then a more effective way of reducing overhead is to teach the
instrumentation lowering pass to lower the counter update into a simple
store:

  counter_1 = 1;

Such stores from the inline instances of the same func can be easily
eliminated. It will also help multithreaded programs a lot when there is
heavy counter contention.

Another benefit is that the size of the counter can be effectively reduced
to 1 byte instead of 8 bytes.
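A rough sketch of the difference between the two lowerings, written as plain C rather than the IR the lowering pass would emit. The array names, NUM_REGIONS, and the helper functions are assumptions made up for this illustration; they are not the profiling runtime's symbols.

```c
#include <stddef.h>
#include <stdint.h>

enum { NUM_REGIONS = 4 };

/* Count-style counters: 8 bytes per region. */
static uint64_t count_counters[NUM_REGIONS];
/* Coverage-style flags: 1 byte per region. */
static uint8_t covered_flags[NUM_REGIONS];

/* Count-style update: a load, an add, and a store. */
static void hit_count(size_t region) { count_counters[region] += 1; }

/* Coverage-style update: a plain store of 1. The store is idempotent, so
 * repeated stores to the same flag are redundant and can be dropped. */
static void hit_covered(size_t region) { covered_flags[region] = 1; }

/* Models a callee whose body (region 0) was inlined at three call sites. */
static void run_inlined_copies(void) {
  for (int copy = 0; copy < 3; ++copy) {
    hit_count(0);
    hit_covered(0);
  }
}
```

After run_inlined_copies(), the count-style counter holds 3 while the coverage flag holds 1 no matter how many copies ran, which is why all but one of the stores are removable.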

The tricky situation is when you want coverage data but the count per line
is also important -- but I wonder whether having a special pass just to
handle this scenario is worth the effort.

Also, as a side note, I think longer term we should unify the three
instrumentation mechanisms into one: FE based, IR based, and old gcda
instrumentation. IR based is the most efficient implementation -- when
combined with the gcda profiling runtime, it can be used to replace the
current gcda profiling, which is not efficient. It is also possible to use
IR based instr for coverage mapping (with the coverage map format mostly
unchanged), but the main challenge is passing source info etc.

thanks,

David

On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi,
>
> I'd like to add a new pass to LLVM which removes redundant profile counter
> updates. The goal is to speed up code coverage testing and profile
> generation for PGO.
>
> I'm sending this email out to describe my approach, share some early
> results, and gather feedback.
>

From your example, it seems only one scenario is handled -- a local function
with a single callsite? This seems to be quite narrow in scope. Before
pursuing this further, it is better to measure its impact on larger
benchmarks.

> thanks
> vedant