[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Fri Mar 11 11:23:19 PST 2016

On Fri, Mar 11, 2016 at 11:17 AM, Justin Bogner <mail at justinbogner.com>
wrote:

> Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > Vedant, Are you concerned about instrumentation run performance for PGO or
> > for coverage testing? For coverage testing, is coverage information enough,
> > or is count information also needed? Depending on the answers to these
> > questions, the solution may be very different.
> >
> > If the answer is PGO, then a much better longer-term solution is to
> > migrate to IR based instrumentation. Not only does IR based
> > instrumentation place counter updates 'optimally'; the early optimization,
> > including pre-inlining of tiny funcs, is also very effective in reducing
> > instr overhead, with the added benefit of better profile quality due to
> > context sensitivity. For more details or performance numbers see Rong's
> > RFC about late instrumentation.
> >
> > If the answer is PGO but, for various reasons, the FE based instrumentation
> > has to be used, then I think there is another, more effective way previously
> > suggested by Sean. Basically you can skip single BB functions completely
> > during instrumentation. There are various reasons why profile data for
> > single BB functions is not useful:
> > 1) they are usually small and beneficial to be inlined regardless of
> > profile data
> > 2) the BB of the inline instance can get profile data from the caller
> > context
> > 3) the profile data for the out of line single BB func is also useless
> >
> > Rong has some data showing the effectiveness of this method -- not as good
> > as the pre-optimization approach, but IIRC also very good.
> >
> > If the answer is for coverage but the actual count value does not really
> > matter, then a more effective way of reducing overhead is to teach the
> > instrumentation lowering pass to lower the counter update into a simple
> > store:
> >
> >   counter_1 = 1;
>
> This won't work. The FE based instrumentation has implicit counters for
> various AST constructs that can only be worked out by doing math with
> the explicit counters. If they don't have actual counts, this doesn't
> work.
>

This depends on how the FE picks regions to instrument. The key part is that
the FE should not generate regions with 'minus' counter expressions; only
'add' is allowed.
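
To make the add-only restriction concrete, here is a rough sketch (not clang's
actual coverage mapping code; `addCounts`, `subCounts`, `toFlag`, `addFlags`
and the sample numbers are all made up for illustration). With full counts a
derived region can be computed as `parent - then`. Once each counter is
collapsed to a 0/1 flag, a sum can still be answered as a logical OR, but a
difference cannot be recovered:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of any LLVM/clang API.

// Full 64-bit counts: derived regions may use both '+' and '-'.
constexpr std::uint64_t addCounts(std::uint64_t a, std::uint64_t b) { return a + b; }
constexpr std::uint64_t subCounts(std::uint64_t a, std::uint64_t b) { return a - b; }

// Collapse a count to a 0/1 "was executed" flag, as `counter = 1` would record.
constexpr bool toFlag(std::uint64_t count) { return count != 0; }

// With flags, 'add' is still well defined: the sum is nonzero iff either
// operand is nonzero.
constexpr bool addFlags(bool a, bool b) { return a || b; }

// Example: an `if` executed 5 times whose 'then' branch ran 3 times.
constexpr std::uint64_t kParent = 5, kThen = 3;
constexpr std::uint64_t kElse = subCounts(kParent, kThen); // 2: 'else' was covered

// 'add' survives the collapse to flags ...
static_assert(toFlag(addCounts(kThen, kElse)) ==
              addFlags(toFlag(kThen), toFlag(kElse)));

// ... but the flags for parent and 'then' are both 1 here, exactly as they
// would be if 'then' had run all 5 times and 'else' had never run. No function
// of the two flags can tell those cases apart, so 'minus' is not expressible.
static_assert(toFlag(kParent) == toFlag(5) && toFlag(kThen) == toFlag(5));
```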

>
> > Such stores from the inline instances of the same func can be easily
> > eliminated. It will also help multithreaded programs a lot when there is
> > heavy counter contention.
> >
> > Another benefit is that the size of the counter can be effectively reduced
> > to 1 byte instead of 8 bytes.
> >
> > The tricky situation is when you want coverage data but count per line is
> > also important -- but I wonder whether having a special pass just to handle
> > this scenario is worth the effort.
>
> Isn't "coverage with count-per-line" the whole point of a coverage
> feature?
>

The point of coverage testing is to see if there are any lines of code that
are not 'executed' at runtime -- in that sense, the count does not really
matter. Whether a line executes 1000 times or only 1 time is irrelevant.
However, it makes a difference if a line has a '0' count.

Note that the asan based coverage tool does not do counting either.
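
The counting versus coverage-only distinction can be sketched roughly as
follows. This is a hand-written illustration, not what the instrumentation
lowering pass emits today; `CountSlot`, `CoveredSlot`, `hitCounting`,
`hitCoverage` and `runRegion` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for one profile counter; not LLVM's actual layout.
static std::uint64_t CountSlot = 0;   // counting mode: 8-byte counter
static std::uint8_t CoveredSlot = 0;  // coverage-only mode: 1-byte flag

// Counting mode: a load, an add and a store on every execution.
static void hitCounting() { CountSlot += 1; }

// Coverage-only mode: a plain store. Storing 1 twice has the same effect as
// storing it once, so repeated stores to the same slot are redundant.
static void hitCoverage() { CoveredSlot = 1; }

// Runs a region `n` times in both modes and checks what each mode records.
static void runRegion(unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    hitCounting();
    hitCoverage();
  }
  // The flag answers "was this executed?" but not "how many times?".
  assert((CoveredSlot != 0) == (CountSlot != 0));
}
```

After `runRegion(1000)`, the 8-byte counter holds 1000 while the 1-byte flag
just holds 1; for a "was this line ever executed" report, both give the same
answer.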

>
> It's also convenient to gather data for PGO and coverage in the same
> run, rather than having to build instrumented twice and run the same
> tests/training runs twice. Performance of the FE based instrumentation
> is pretty important for productivity reasons.
>
> > Also a side note, I think longer term we should unify the three
> > instrumentation mechanisms into one: FE based, IR based, and old gcda
> > instrumentation. IR based is the most efficient implementation -- when
> > combined with the gcda profiling runtime, it can be used to replace the
> > current gcda profiling, which is not efficient. It is also possible to use
> > IR based instr for coverage mapping (with the coverage map format mostly
> > unchanged), but the main challenge is passing source info etc.
>
> I don't disagree, but I think that getting the same quality of coverage
> data from the IR based profiles as we do from the FE ones is a fairly
> large undertaking.
>

Right -- I am thinking about this, and there might be good ways to do it
(e.g., let the FE prepare some of the map data with the source info filled in,
and let the backend fill in the counter expressions).

thanks,

David

>
> > thanks,
> >
> > David
> >
> >
> > On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >
> >> Hi,
> >>
> >> I'd like to add a new pass to LLVM which removes redundant profile counter
> >> updates. The goal is to speed up code coverage testing and profile
> >> generation for PGO.
> >>
> >> I'm sending this email out to describe my approach, share some early
> >> results, and gather feedback.
> >>
> >
> > From your example, it seems only one scenario is handled -- a local function
> > with a single callsite? This seems to be quite narrow in scope. Before
> > pursuing further, it is better to measure the impact of this on larger
> > benchmarks.
> >
> >
> >
> >> thanks
> >> vedant
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
>