[PATCH] Initial code coverage mapping data structures, and reader and writers + C interface for ProfileData library

Fri Jun 27 11:45:15 PDT 2014

----- Original Message -----
> From: "Bob Wilson" <bob.wilson at apple.com>
> To: "Eric Christopher" <echristo at gmail.com>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Friday, June 27, 2014 12:26:36 PM
> Subject: Re: [PATCH] Initial code coverage mapping data structures, and reader and writers + C interface for
> ProfileData library
> 
> To put this in context, it is not a brand-new proposal coming out of
> nowhere. This is a continuation of our work on instrumentation-based
> profiling for PGO. We’ve been very clear all along that a
> significant goal of that work has been to use the same
> instrumentation for both PGO and coverage testing. I spoke about
> this at the 2013 Dev Meeting and it has been explained in some
> detail when we discussed the instrumentation changes for PGO. Until
> now, we’ve been focused on the support for PGO, but Alex is now
> working on completing the support for code coverage.
> 
> Alex already listed several shortcomings of the gcov approach that
> are addressed by his work. Here’s my list, which includes some of
> the things that Alex already mentioned:
> 
> * Gcov basically requires that you compile at -O0. If you’re testing
> something that runs for a long time, that can be a real problem.
> This approach works fine with full optimization (although there is
> still some overhead from the instrumentation).
> 
> * The gcov approach spews files all over the place, making it really
> difficult to manage separate coverage data sets for multiple runs.
> This approach has one data file for each run.
> 
> * The mapping of execution counts to source ranges with gcov is often
> inaccurate. I don’t have specific examples offhand. It tends to get
> confused for statements spread across multiple source lines. For
> example, if you have a function call with the arguments on separate
> lines (and where the arguments are simple and do not have their own
> debug line entries), gcov will end up showing just one of the lines
> as having executed.

I'd like to add that the points above closely match my own experience with gcov, and I strongly support the development of an alternative.

My primary concern is collecting coverage data from cluster systems, possibly large ones. This comes down to the following requirements:

 1. Don't read from zillions of files; this is slow on shared filesystems.

 2. Don't write to zillions of files; this is painfully slow on shared filesystems. Moreover, when collecting coverage data from multiple simultaneously running processes, don't have them all race to write the same output file(s).

Regarding the second point, this means that either (a) multiple simultaneously running processes can write their output into separate files or, preferably, (b) multiple simultaneously running processes can write their output into separate regions of the same file.

 3. It must work on optimized executables (because users can't afford the CPU hours necessary to run real problems with -O0 builds).
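
To illustrate option (b) for the second requirement, here is a minimal sketch, not anything taken from the patch under review. It assumes each process has a unique rank (as it would under MPI, for example) and a fixed-size block of counters. The slot layout and the names write_counters and read_counters are hypothetical. Each process uses a positioned write into its own slot, so no two processes touch the same bytes and no shared file offset is involved.

```python
import os
import struct

# Hypothetical layout: every process owns one fixed-size slot
# holding four little-endian 64-bit execution counters.
SLOT_FORMAT = "<4Q"
SLOT_SIZE = struct.calcsize(SLOT_FORMAT)


def write_counters(path, rank, counters):
    """Write one process's counters into the slot reserved for its rank."""
    # No O_TRUNC: other processes may already have written their slots.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        payload = struct.pack(SLOT_FORMAT, *counters)
        # pwrite takes an explicit offset, so writers do not race on a
        # shared file position and never overlap each other's region.
        os.pwrite(fd, payload, rank * SLOT_SIZE)
    finally:
        os.close(fd)


def read_counters(path, rank):
    """Read back the counters stored in the slot for the given rank."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, SLOT_SIZE, rank * SLOT_SIZE)
    finally:
        os.close(fd)
    return struct.unpack(SLOT_FORMAT, data)
```

A real format would also need a header describing the slot size and count, and variable-length data would need a scheme for reserving regions up front; this sketch leaves both out.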

Thanks again,
Hal

> 
> * We’re already using the instrumentation for PGO and this gives us a
> way to see how the profile data maps to the source. Regardless of
> the code coverage tools, we really need this for debugging issues
> with instrumentation-based PGO.
> 
> * LLVM’s gcov-style instrumentation is done in a backend pass, which
> means that it can’t see anything except what’s in the IR. Because
> this new approach is based on instrumentation in the front-end, we
> can correctly report un-covered code that is never emitted as IR
> because the front-end sees that it is dead. We also have the
> capability of reporting code that is #if’ed out by the preprocessor,
> although it’s not yet clear whether that will be useful.
> 
> Alex has a lot of code already written for this. It doesn’t make
> sense to just drop all the patches on the list at once, since they
> will be hard to review like that.
> 
> > On Jun 26, 2014, at 12:27 PM, Eric Christopher <echristo at gmail.com>
> > wrote:
> > 
> > An analysis of what problems you're trying to solve with your new
> > work
> > would be good as well. I'm still not sure what problems you're
> > trying
> > to solve and how what you're doing solves anything. Use cases that
> > describe what we can't do now and we'll be able to do in the future
> > would also be enlightening.
> > 
> > -eric
> > 
> > On Thu, Jun 26, 2014 at 11:50 AM, Alex L <arphaman at gmail.com>
> > wrote:
> >> I understand that by itself this patch may not be that meaningful,
> >> I just
> >> wanted to get something out there asap. I will send the initial
> >> patches for
> >> clang and llvm-cov in the next couple of days.
> >> 
> >> 
> >> 2014-06-26 11:40 GMT-07:00 Philip Reames
> >> <listmail at philipreames.com>:
> >> 
> >>> 
> >>> On 06/26/2014 11:23 AM, Hal Finkel wrote:
> >>>> 
> >>>> ----- Original Message -----
> >>>>> 
> >>>>> From: "Philip Reames" <listmail at philipreames.com>
> >>>>> To: llvm-commits at cs.uiuc.edu
> >>>>> Sent: Thursday, June 26, 2014 12:51:43 PM
> >>>>> Subject: Re: [PATCH] Initial code coverage mapping data
> >>>>> structures, and
> >>>>> reader and writers + C interface for
> >>>>> ProfileData library
> >>>>> 
> >>>>> On 06/26/2014 09:43 AM, Alex L wrote:
> >>>>> 
> >>>>> 2014-06-25 17:57 GMT-07:00 Eric Christopher <
> >>>>> echristo at gmail.com > :
> >>>>> 
> >>>>> 
> >>>>> So, any reason why we want the new coverage system? You didn't
> >>>>> seem
> >>>>> to
> >>>>> highlight what you saw wrong with gcov or the way it works.
> >>>>> 
> >>>>> I saw your original mail and it didn't have a lot of motivation
> >>>>> here.
> >>>>> 
> >>>>> Thanks.
> >>>>> 
> >>>>> -eric
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I am not very familiar with GCOV, but the primary motivation
> >>>>> for the
> >>>>> new coverage system is to provide very accurate execution
> >>>>> counts for
> >>>>> a program. This would enable us to provide modern, high quality
> >>>>> tools for code coverage analysis.
> >>>>> 
> >>>>> By very accurate I mean that instead of reasoning about code
> >>>>> coverage
> >>>>> for basic blocks, branches and even lines, we would be able to
> >>>>> reason about code coverage for the regions of source code that
> >>>>> resemble the corresponding AST. This would enable the coverage
> >>>>> tool
> >>>>> to locate and mark the exact regions of code that weren't
> >>>>> executed,
> >>>>> even if the IR for those regions was transformed by the
> >>>>> optimizer. I
> >>>>> think that GCOV fails to provide coverage for certain
> >>>>> constructs
> >>>>> when optimizations are enabled. Also, I think that GCOV doesn't
> >>>>> really show good coverage for the lines that have multiple
> >>>>> regions
> >>>>> with different execution counts, and the new system will enable
> >>>>> us
> >>>>> to create a tool which will have a better way to deal with this
> >>>>> particular situation.
> >>>>> 
> >>>>> 
> >>>>> Also, the GCOV way produces separate mapping files and counter
> >>>>> files
> >>>>> for each source file/object file, which can be somewhat
> >>>>> inconvenient. In the new system we pack the mapping data into
> >>>>> the
> >>>>> generated IR and allow it to be merged by the linker, and as a
> >>>>> result of that all our mapping information is embedded inside
> >>>>> the
> >>>>> program's executable.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> The new coverage tool will be able to provide a more
> >>>>> interactive
> >>>>> experience as well, by showing reports or code coverage only
> >>>>> for
> >>>>> selected items like certain functions, classes, etc.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> Also, this new coverage system will provide a library which
> >>>>> various
> >>>>> code coverage tools can use to make coverage reports without
> >>>>> the
> >>>>> need to parse the output of GCOV.
> >>>>> 
> >>>>> I find nothing in the above explanation to justify including
> >>>>> this
> >>>>> code *in LLVM*. Such a tool may be useful, but this really
> >>>>> sounds
> >>>>> like an independent project (i.e. "replace gcov").
> >>>>> 
> >>>>> Glancing through the code, I also see no real interaction with
> >>>>> LLVM.
> >>>>> This really seems like an independent profiling library which
> >>>>> could
> >>>>> be used to provide profiling data to LLVM, but is otherwise
> >>>>> unrelated. Correct me if I'm wrong here - I did a *very* quick
> >>>>> scan.
> >>>>> 
> >>>>> Your post about this topic on llvm-dev has not generated any
> >>>>> consensus. If anything, there seems to be an active disinterest
> >>>>> in
> >>>>> your proposal.
> >>>> 
> >>>> I'm not sure what you mean by "has not generated any consensus",
> >>>> I saw no
> >>>> discussion at all. That having been said, I know that a number
> >>>> of community
> >>>> members care a lot about coverage quantification (including me),
> >>>> and the
> >>>> gcov zillions-of-files approach clearly does not scale. Really,
> >>>> getting
> >>>> feedback on long RFCs is not easy, and I'd draw no inference
> >>>> from the lack
> >>>> of response to date.
> >>> 
> >>> To be clear, I am not opposed to supporting profiling.  I'm in
> >>> fact quite
> >>> in support of the overall objective.  It's simply that *this
> >>> patch* *at the
> >>> current time* doesn't seem ready.
> >>> 
> >>> I also wonder if things like profile format readers need to be
> >>> part of
> >>> LLVM at all.  Why isn't this handled entirely by the frontend or
> >>> a separate
> >>> tool?  We can already represent profiling information in the IR.
> >>> (Admittedly, in limited ways.  But we should fix that!)  I could
> >>> even see
> >>> having a collection of profile format readers being its own
> >>> subproject.
> >>> 
> >>>> 
> >>>> That having been said, I'm not sure what to think about this
> >>>> patch. I
> >>>> think it will be easier to review once we see the code that uses
> >>>> it.
> >>> 
> >>> Agreed.
> >>> 
> >>> 
> >>>> 
> >>>>  -Hal
> >>>> 
> >>>>> Given the above, I would oppose the inclusion of this change
> >>>>> set.
> >>>>> 
> >>>>> 
> >>>>> On Wed, Jun 25, 2014 at 4:32 PM, Alex L < arphaman at gmail.com >
> >>>>> wrote:
> >>>>>> 
> >>>>>> Hi everyone,
> >>>>>> This is a first patch that implements the data structures,
> >>>>>> readers
> >>>>>> and
> >>>>>> writers used by the new code coverage system. I added the new
> >>>>>> code
> >>>>>> to the
> >>>>>> ProfileData library. I also added a very minimal C api for the
> >>>>>> ProfileData
> >>>>>> library.
> >>>>>> 
> >>>>>> http://reviews.llvm.org/D4301
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> _______________________________________________
> >>>>>> llvm-commits mailing list
> >>>>>> llvm-commits at cs.uiuc.edu
> >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>> 
> >>>>> 
> >>>>> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory