[PATCH] Initial code coverage mapping data structures, and reader and writers + C interface for ProfileData library

Fri Jun 27 10:26:36 PDT 2014

To put this in context, it is not a brand-new proposal coming out of nowhere. This is a continuation of our work on instrumentation-based profiling for PGO. We’ve been very clear all along that a significant goal of that work has been to use the same instrumentation for both PGO and coverage testing. I spoke about this at the 2013 Dev Meeting and it has been explained in some detail when we discussed the instrumentation changes for PGO. Until now, we’ve been focused on the support for PGO, but Alex is now working on completing the support for code coverage.

Alex already listed several shortcomings of the gcov approach that are addressed by his work. Here’s my list, which includes some of the things that Alex already mentioned:

* Gcov basically requires that you compile at -O0. If you’re testing something that runs for a long time, that can be a real problem. This approach works fine with full optimization (although there is still some overhead from the instrumentation).

* The gcov approach spews files all over the place, making it really difficult to manage separate coverage data sets for multiple runs. This approach has one data file for each run.

* The mapping of execution counts to source ranges with gcov is often inaccurate. I don’t have a list of specific examples offhand, but it tends to get confused by statements spread across multiple source lines. For example, if you have a function call with the arguments on separate lines (and where the arguments are simple and do not have their own debug line entries), gcov will end up showing just one of the lines as having executed.

* We’re already using the instrumentation for PGO and this gives us a way to see how the profile data maps to the source. Regardless of the code coverage tools, we really need this for debugging issues with instrumentation-based PGO.

* LLVM’s gcov-style instrumentation is done in a backend pass, which means that it can’t see anything except what’s in the IR. Because this new approach is based on instrumentation in the front-end, we can correctly report uncovered code that is never emitted as IR because the front-end sees that it is dead. We also have the capability of reporting code that is #if’ed out by the preprocessor, although it’s not yet clear whether that will be useful.
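To make the multi-line call and the dead-code points concrete, here is a minimal Python sketch of region-based counts. It assumes a hypothetical region record (start line/column, end line/column, execution count); the names CountedRegion, line_counts, and uncovered are illustrative only and are not the mapping format or API from Alex's patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountedRegion:
    # Hypothetical record: a source range plus the execution count of the
    # counter the front-end attached to it. Not the format from the patch.
    start_line: int
    start_col: int
    end_line: int
    end_col: int
    count: int


def line_counts(regions):
    """Map every source line to the counts of all regions that touch it."""
    lines = {}
    for r in regions:
        for line in range(r.start_line, r.end_line + 1):
            lines.setdefault(line, []).append(r.count)
    return lines


def uncovered(regions):
    """Return regions that never executed, including dead code with no IR."""
    return [r for r in regions if r.count == 0]


# foo(a,
#     b,
#     c);           <- one call spread over lines 10-12, executed 5 times
# if (0) { bar(); } <- line 14: the condition runs, the body is dead
regions = [
    CountedRegion(10, 3, 12, 8, 5),    # the whole multi-line call
    CountedRegion(14, 3, 14, 8, 5),    # `if (0)` condition
    CountedRegion(14, 10, 14, 19, 0),  # `{ bar(); }` body, never emitted as IR
]

for line, counts in sorted(line_counts(regions).items()):
    print(line, counts)
print("uncovered:", uncovered(regions))
```

Because the count belongs to the whole source range rather than to one line-table entry, lines 10 through 12 all report the call's count, and line 14 carries two regions with different counts (5 and 0). The dead body still has a region even though no IR was generated for it, which a pass that only sees the IR could not report.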

Alex has a lot of code already written for this. It doesn’t make sense to just drop all the patches on the list at once, since they will be hard to review like that.

> On Jun 26, 2014, at 12:27 PM, Eric Christopher <echristo at gmail.com> wrote:
> 
> An analysis of what problems you're trying to solve with your new work
> would be good as well. I'm still not sure what problems you're trying
> to solve and how what you're doing solves anything. Use cases that
> describe what we can't do now and we'll be able to do in the future
> would also be enlightening.
> 
> -eric
> 
> On Thu, Jun 26, 2014 at 11:50 AM, Alex L <arphaman at gmail.com> wrote:
>> I understand that by itself this patch may not be that meaningful, I just
>> wanted to get something out there asap. I will send the initial patches for
>> clang and llvm-cov in the next couple of days.
>> 
>> 
>> 2014-06-26 11:40 GMT-07:00 Philip Reames <listmail at philipreames.com>:
>> 
>>> 
>>> On 06/26/2014 11:23 AM, Hal Finkel wrote:
>>>> 
>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Philip Reames" <listmail at philipreames.com>
>>>>> To: llvm-commits at cs.uiuc.edu
>>>>> Sent: Thursday, June 26, 2014 12:51:43 PM
>>>>> Subject: Re: [PATCH] Initial code coverage mapping data structures, and
>>>>> reader and writers + C interface for ProfileData library
>>>>> 
>>>>> On 06/26/2014 09:43 AM, Alex L wrote:
>>>>> 
>>>>> 2014-06-25 17:57 GMT-07:00 Eric Christopher <echristo at gmail.com>:
>>>>> 
>>>>> 
>>>>> So, any reason why we want the new coverage system? You didn't seem to
>>>>> highlight what you saw wrong with gcov or the way it works.
>>>>> 
>>>>> I saw your original mail and it didn't have a lot of motivation here.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> -eric
>>>>> 
>>>>> I am not very familiar with GCOV, but the primary motivation for the
>>>>> new coverage system is to provide very accurate execution counts for
>>>>> a program. This would enable us to provide modern, high quality
>>>>> tools for code coverage analysis.
>>>>> 
>>>>> By "very accurate" I mean that instead of reasoning about code coverage
>>>>> for basic blocks, branches, or even lines, we would be able to
>>>>> reason about code coverage for regions of source code that mirror the
>>>>> structure of the AST. This would enable the coverage tool
>>>>> to locate and mark the exact regions of code that weren't executed,
>>>>> even if the IR for those regions was transformed by the optimizer. I
>>>>> think that GCOV fails to provide coverage for certain constructs
>>>>> when optimizations are enabled. Also, I think that GCOV doesn't
>>>>> show coverage well for lines that contain multiple regions with
>>>>> different execution counts, and the new system will let us build a
>>>>> tool that handles this situation better.
>>>>> 
>>>>> 
>>>>> Also, the GCOV approach produces separate mapping files and counter
>>>>> files for each source file/object file, which can be somewhat
>>>>> inconvenient. In the new system the mapping data is packed into the
>>>>> generated IR and merged by the linker, so all of the mapping
>>>>> information ends up embedded in the program's executable.
>>>>> 
>>>>> The new coverage tool will be able to provide a more interactive
>>>>> experience as well, by showing reports or code coverage for selected
>>>>> items only, such as particular functions or classes.
>>>>> 
>>>>> Also, this new coverage system will provide a library which various
>>>>> code coverage tools can use to make coverage reports without the
>>>>> need to parse the output of GCOV.
>>>>>
>>>>> I find nothing in the above explanation to justify including this
>>>>> code *in LLVM*. Such a tool may be useful, but this really sounds
>>>>> like an independent project (i.e. "replace gcov").
>>>>> 
>>>>> Glancing through the code, I also see no real interaction with LLVM.
>>>>> This really seems like an independent profiling library which could
>>>>> be used to provide profiling data to LLVM, but is otherwise
>>>>> unrelated. Correct me if I'm wrong here - I did a *very* quick scan.
>>>>> 
>>>>> Your post about this topic on llvm-dev has not generated any
>>>>> consensus. If anything, there seems to be an active disinterest in
>>>>> your proposal.
>>>> 
>>>> I'm not sure what you mean by "has not generated any consensus"; I saw no
>>>> discussion at all. That having been said, I know that a number of community
>>>> members care a lot about coverage quantification (including me), and the
>>>> gcov zillions-of-files approach clearly does not scale. Really, getting
>>>> feedback on long RFCs is not easy, and I'd draw no inference from the lack
>>>> of response to date.
>>> 
>>> To be clear, I am not opposed to supporting profiling.  I'm in fact quite
>>> in support of the overall objective.  It's simply that *this patch* *at the
>>> current time* doesn't seem ready.
>>> 
>>> I also wonder if things like profile format readers need to be part of
>>> LLVM at all.  Why isn't this handled entirely by the frontend or a separate
>>> tool?  We can already represent profiling information in the IR.
>>> (Admittedly, in limited ways.  But we should fix that!)  I could even see
>>> having a collection of profile format readers be its own subproject.
>>> 
>>>> 
>>>> That having been said, I'm not sure what to think about this patch. I
>>>> think it will be easier to review once we see the code that uses it.
>>> 
>>> Agreed.
>>> 
>>> 
>>>> 
>>>>  -Hal
>>>> 
>>>>> Given the above, I would oppose the inclusion of this change set.
>>>>> 
>>>>> On Wed, Jun 25, 2014 at 4:32 PM, Alex L <arphaman at gmail.com> wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> This is a first patch that implements the data structures, readers and
>>>>>> writers used by the new code coverage system. I added the new code to the
>>>>>> ProfileData library. I also added a very minimal C API for the ProfileData
>>>>>> library.
>>>>>> 
>>>>>> http://reviews.llvm.org/D4301
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits