[PATCH] Initial code coverage mapping data, reader, writer + C interface for ProfileData
Alex L
arphaman at gmail.com
Thu Jul 17 11:59:02 PDT 2014
2014-07-17 10:17 GMT-07:00 Alex L <arphaman at gmail.com>:
> Thanks!
>
>
> 2014-07-16 22:38 GMT-07:00 Bob Wilson <bob.wilson at apple.com>:
>
> The documentation is going to need more work. Let’s split that out into a
>> separate patch. At least it helps me understand how the reader and writer
>> work.
>>
>> > On Jul 9, 2014, at 4:39 PM, Alex Lorenz <arphaman at gmail.com> wrote:
>> >
>> > The updated version of the patch that fixes various typos, adds format
>> version to the coverage mapping format, changes the storage of filenames to
>> once per tu, and adds document that describes the actual coverage mapping
>> format.
>> >
>> > http://reviews.llvm.org/D4301
>>
>> > Index: include/llvm/ProfileData/CoverageMapping.h
>> > ===================================================================
>> > --- /dev/null
>> > +++ include/llvm/ProfileData/CoverageMapping.h
>> > @@ -0,0 +1,196 @@
>> …
>> > +/// \brief A Counter expression builder is used to construct the
>> > +/// counter expressions. It avoids unecessary duplication
>> > +/// and simplifies algebraic expression.
>> > +class CounterExpressionBuilder {
>>
>> I had commented previously that “expression” should be plural in the
>> comment here. I had not noticed that there were two of them. You changed
>> the first but not the second.
>>
>> …
>>
>> > +/// \brief A Counter mapping region associates a source range with
>> > +/// a specific counter.
>> > +struct CounterMappingRegion {
>> > + enum RegionKind {
>> > + /// \brief A CodeRegion associates some code with a counter
>> > + CodeRegion,
>> > +
>> > + /// \brief An ExpansionRegion represents
>> > + /// a file expansion region that associates
>> > + /// a source range with the expansion of a
>> > + /// virtual source file, such as for a macro instantiation
>> > + /// or #include file.
>> > + ExpansionRegion,
>>
>> My previous review suggested that you reflow comments to 80 columns. For
>> many patches that is about fixing 80 column violations, but in this patch
>> there are a number of comments like this one with very short lines. I would
>> prefer that you take better advantage of the space and fit those on fewer
>> lines.
>>
>> …
>>
>> > Index: include/llvm/ProfileData/CoverageMappingReader.h
>> > ===================================================================
>> > --- /dev/null
>> > +++ include/llvm/ProfileData/CoverageMappingReader.h
>>
>> …
>>
>> > +/// \brief Base class for the raw coverage mapping and filenames data
>> readers.
>> > +class RawCoverageReader {
>> > +protected:
>> > + StringRef Data;
>> > +
>> > + /// \brief Return the error code.
>> > + std::error_code error(std::error_code EC) { return EC; }
>> > +
>> > + /// \brief Clear the current error code and return a successful one.
>> > + std::error_code success() { return error(instrprof_error::success); }
>>
>> Unlike the error() and success() functions in InstrProfReader and
>> ObjectFileCoverageMappingReader, these don’t keep track of the most recent
>> error. They don’t really add much, do they? I don’t have a strong opinion
>> about this, if you want to keep them.
>>
>
> I'd prefer to keep them threre.
>
>
>>
>> > Index: lib/ProfileData/CoverageMappingReader.cpp
>> > ===================================================================
>> > --- /dev/null
>> > +++ lib/ProfileData/CoverageMappingReader.cpp
>> …
>> > +void CoverageMappingIterator::Increment() {
>> > + if (Reader->readNextRecord(Record))
>> > + *this = CoverageMappingIterator();
>> > +}
>>
>> Please add a comment to explain what this conditional is checking,
>> presumably the end of the records.
>>
>> …
>>
>> > +std::error_code RawCoverageReader::readIntMax(uint64_t &Result,
>> > + uint64_t MaxPlus1) {
>> > + auto Err = readULEB128(Result);
>> > + if (Err)
>> > + return Err;
>>
>> Better to limit the scope of the error code to the conditional by doing:
>> “if (auto Err = …) return Err;”
>>
>> > + if (Result >= MaxPlus1)
>> > + return error(instrprof_error::malformed);
>> > + return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageReader::readSize(uint64_t &Result) {
>> > + auto Err = readULEB128(Result);
>> > + if (Err)
>> > + return Err;
>>
>> Ditto.
>>
>> > + // Sanity check the number
>> > + if (Result > Data.size())
>> > + return error(instrprof_error::malformed);
>> > + return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageReader::readString(StringRef &Result) {
>> > + uint64_t Length;
>> > + auto Err = readSize(Length);
>> > + if (Err)
>> > + return Err;
>>
>> …and again here.
>>
>> > + Result = Data.substr(0, Length);
>> > + Data = Data.substr(Length);
>> > + return success();
>> > +}
>> > +
>> >
>>
>> > +std::error_code RawCoverageFilenamesReader::read() {
>> > + uint64_t IntValue;
>>
>> “IntValue” is a terrible name in this context. How about “NumFilenames”?
>>
>> > + auto Err = readSize(IntValue);
>> > + if (Err)
>> > + return Err;
>> > + for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> “S”?? Why not just use “I < NumFilenames” (or whatever you call it)?
>>
>> > + StringRef Filename;
>> > + auto Err = readString(Filename);
>> > + if (Err)
>> > + return Err;
>> > + Filenames.push_back(Filename);
>> > + }
>> > + return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageMappingReader::readCounter(Counter &C) {
>> > + uint64_t IntValue;
>> > + auto Err = readIntMax(IntValue,
>> std::numeric_limits<unsigned>::max());
>> > + if (Err)
>> > + return Err;
>> > + Err = decodeCounter(IntValue, C);
>> > + if (Err)
>> > + return Err;
>>
>> I would define two separate instances of “Err” inside the conditionals
>> here (as above). There are a bunch more cases of this, so I’ll leave it to
>> you to go through and find the rest of them.
>>
>> …
>>
>> > + return success();
>> > +}
>> > +
>> > +static unsigned const EncodingCounterTagAndExpansionRegionTagBits =
>> > + Counter::EncodingTagBits + 1;
>> > +
>>
>> > +std::error_code RawCoverageMappingReader::readMappingRegionsSubArray(
>> > + std::vector<CounterMappingRegion> &MappingRegions, unsigned
>> InferredFileID,
>> > + size_t VFMsize) {
>>
>> Please add a comment before this function to explain its arguments. It
>> isn’t at all obvious what “VFMsize” is (for example).
>>
>> > + uint64_t IntValue;
>>
>> See my comment above about the variable name. In this case, it should be
>> something like “NumRegions”.
>>
>> > + auto Err = readSize(IntValue);
>> > + if (Err)
>> > + return Err;
>> > + unsigned PrevLineStart = 0;
>> > + for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> As above: I don’t see any point in introducing a new “S” here.
>>
>> > + Counter C;
>> > + CounterMappingRegion::RegionKind Kind =
>> CounterMappingRegion::CodeRegion;
>> > +
>> > + // Read the combined counter + region kind
>> > + uint64_t IntValue;
>>
>> This shadows the “IntValue” in the outer scope, which is bad. You should
>> definitely rename the outer one, and then maybe it is OK to keep this one
>> with the generic name. If you can come up with a more descriptive name for
>> this instance, that would be even better.
>>
>> > + auto Err = readIntMax(IntValue,
>> std::numeric_limits<unsigned>::max());
>> > + if (Err)
>> > + return Err;
>>
>> > + unsigned Tag = IntValue & Counter::EncodingTagMask;
>> > + uint64_t ExpandedFileID = 0;
>> > + if (Tag != Counter::Zero) {
>> > + if (auto Err = decodeCounter(IntValue, C))
>> > + return Err;
>> > + } else {
>> > + // Is it an expansion region?
>> > + if ((IntValue >> Counter::EncodingTagBits) & 1) {
>>
>> Would it make sense to define a constant instead of using that literal
>> “1”? If so, please do.
>>
>> > + Kind = CounterMappingRegion::ExpansionRegion;
>> > + ExpandedFileID =
>> > + IntValue >> EncodingCounterTagAndExpansionRegionTagBits;
>> > + if (ExpandedFileID >= VFMsize)
>> > + return error(instrprof_error::malformed);
>> > + } else {
>> > + switch (IntValue >>
>> EncodingCounterTagAndExpansionRegionTagBits) {
>> > + case CounterMappingRegion::CodeRegion:
>> > + // Don't do anything when we have a code region with a zero
>> counter.
>> > + break;
>> > + case CounterMappingRegion::EmptyRegion:
>> > + Kind = CounterMappingRegion::EmptyRegion;
>> > + break;
>> > + case CounterMappingRegion::SkippedRegion:
>> > + Kind = CounterMappingRegion::SkippedRegion;
>> > + break;
>> > + default:
>> > + return error(instrprof_error::malformed);
>> > + }
>> > + }
>> > + }
>> > +
>> > + // Read the source range
>> > + uint64_t LocationInfo[4];
>>
>> I think it would be better to define each of these as a scalar variable
>> with a meaningful name, e.g., “DeltaLineStart”. Using an array makes it
>> easier to read the values, but with only 4 of them, it’s really not so bad
>> to just call readIntMax 4 separate times.
>>
>> > + for (unsigned J = 0; J < 4; ++J) {
>> > + Err = readIntMax(LocationInfo[J],
>> std::numeric_limits<unsigned>::max());
>> > + if (Err)
>> > + return Err;
>>
>> > + }
>> > + PrevLineStart += LocationInfo[0];
>> > + // Adjust the column locations for the empty regions that
>> > + // are supposed to cover whole lines.
>>
>> The comment is a good start, but I still don’t quite get it. Part of the
>> problem is that there’s nothing here to indicate what the values in
>> LocationInfo[1] and LocationInfo[3] represent. You shouldn’t have to go to
>> the format document to see that those are “columnStart” and “columnEnd”.
>> Using separate variables with meaningful names will help, but that’s still
>> not good enough. What are these empty regions from? (Maybe an example would
>> help.) Why do you set the column start/end to 1 and uint_max?
>>
>
>> > + if (LocationInfo[1] == 0 && LocationInfo[3] == 0) {
>> > + LocationInfo[1] = 1;
>>
>> Why 1 instead of 0 if this region is supposed to cover a whole line?
>>
>
> That's the usual convention - the column value for the first character in
> a line is 1 (Similar to how the first line is 1, not 0).
>
>
>> > + LocationInfo[3] = std::numeric_limits<unsigned>::max();
>> > + }
>> > + MappingRegions.push_back(CounterMappingRegion(
>> > + C, InferredFileID, PrevLineStart, LocationInfo[1],
>> > + PrevLineStart + LocationInfo[2], LocationInfo[3], Kind));
>> > + MappingRegions.back().ExpandedFileID = ExpandedFileID;
>> > + }
>> > + return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageMappingReader::read(CoverageMappingRecord
>> &Record) {
>> > + uint64_t IntValue;
>> > +
>> > + // Read the virtual file mapping
>> > + llvm::SmallVector<unsigned, 8> VirtualFileMapping;
>> > + auto Err = readSize(IntValue);
>> > + if (Err)
>> > + return Err;
>> > + for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> As above, please use a better name than “IntValue” and there’s no need
>> for a separate “S” variable.
>>
>> > + auto Err = readIntMax(IntValue, TranslationUnitFilenames.size());
>>
>> …and this instance of “IntValue” should be a separate “FileIndex” (or
>> similar) variable.
>>
>> > + if (Err)
>> > + return Err;
>> > + VirtualFileMapping.push_back(IntValue);
>> > + }
>> > +
>> > + // Construct the files using unique filenames
>> > + // and virtual file mapping
>> > + for (auto I : VirtualFileMapping) {
>> > + Filenames.push_back(TranslationUnitFilenames[I]);
>> > + }
>> > +
>> > + // Read the expressions
>> > + Err = readSize(IntValue);
>>
>> separate “NumExpressions” variable, please.
>>
>> > + if (Err)
>> > + return Err;
>> > + // Create an array of dummy expressions that get the proper counters
>> > + // when the expressions are read, and the proper kinds when the
>> counters
>> > + // are decoded.
>> > + Expressions.resize(IntValue,
>> CounterExpression(CounterExpression::Subtract,
>> > + Counter(),
>> Counter()));
>> > + for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> No need for “S” here.
>>
>> > + Err = readCounter(Expressions[I].LHS);
>> > + if (Err)
>> > + return Err;
>> > + Err = readCounter(Expressions[I].RHS);
>> > + if (Err)
>> > + return Err;
>> > + }
>> > +
>> > + // Read the mapping regions sub-arrays
>> > + for (unsigned InferredFileID = 0, S = VirtualFileMapping.size();
>> > + InferredFileID < S; ++InferredFileID) {
>> > + Err = readMappingRegionsSubArray(MappingRegions, InferredFileID,
>> > + VirtualFileMapping.size());
>> > + if (Err)
>> > + return Err;
>> > + }
>> > +
>> > + // Set the counters for the expansion regions
>>
>> nitpick: There should be a period at the end of that sentence.
>>
>> > + // i.e. Counter of expansion region = counter of the first region
>> > + // from the expanded file.
>> > + // Perform multiple passes to correctly propagate the counters
>> through
>> > + // all the nested expansion regions.
>> > + for (unsigned Pass = 1, S = VirtualFileMapping.size(); Pass < S;
>> ++Pass) {
>> > + for (auto &I : MappingRegions) {
>> > + if (I.Kind == CounterMappingRegion::ExpansionRegion) {
>> > + for (const auto &J : MappingRegions) {
>> > + if (J.FileID == I.ExpandedFileID) {
>> > + I.Count = J.Count;
>> > + break;
>> > + }
>> > + }
>> > + }
>> > + }
>> > + }
>> > +
>> > + Record.FunctionName = FunctionName;
>> > + Record.Filenames = Filenames;
>> > + Record.Expressions = Expressions;
>> > + Record.MappingRegions = MappingRegions;
>> > + return success();
>> > +}
>>
>> …
>>
>> > +template <typename T>
>> > +std::error_code readCoverageMappingData(
>> > + SectionRef &ProfileNames, SectionRef &CoverageMapping,
>> > + SectionRef &CoverageData, SectionRef &CoverageFilenames,
>> > + SectionRef &CoverageTranslationUnits,
>> > + std::vector<ObjectFileCoverageMappingReader::ProfileMappingRecord>
>> &Records,
>> > + std::vector<StringRef> &Filenames) {
>> > + llvm::DenseSet<T> UniqueFunctionMappingData;
>> > +
>> > + // Get the contents of the given sections
>> > + StringRef Data;
>> > + auto Err = CoverageTranslationUnits.getContents(Data);
>> > + if (Err)
>> > + return Err;
>> > + SectionData CoverageMappingData, ProfileNamesData,
>> CoverageFunctionsData,
>> > + CoverageFilenamesData;
>> > + Err = CoverageMappingData.load(CoverageMapping);
>> > + if (Err)
>> > + return Err;
>> > + Err = CoverageFunctionsData.load(CoverageData);
>> > + if (Err)
>> > + return Err;
>>
>> The “CoverageData” name in this code confused me. (It’s all coverage
>> data, right?) It seems like it would be more consistent and understandable
>> to call this “CoverageFunctions”.
>>
>> I’m going to stop there for now. My main question at this point is
>> whether we can make this work with fewer sections. It would be nice if we
>> could merge the function records from the __llvm_cvm_funcs section into the
>> per-function data in the __llvm_prf_cvmap section.
>>
>>
> Yes, I think that it's possible to merge the __llvm_prf_cvmap and the
> __llvm_cvm_funcs sections, if I create a structure for each function that
> contains both the pointer to the function's name and the actual mapping
> data. However, this would probably mean that every function will need a
> unique LLVM structure type - will IR handle that efficiently? The same
> principle could also be applied to the filenames and the translation unit
> sections.
>
It would be even possible to merge all the 4 coverage sections into one,
using the following format:
{ { Translation Unit Record }, [ { Function record } ... ], [ filenames +
coverage mapping for all functions ] }
where:
Translation Unit Record {
uint32 : LengthOfFilenamesAndCoverageMappings
uint32 : Version
}
FunctionRecord {
Pointer : Function name
uint32 : Name length
uint32 : Coverage mapping length
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140717/3a638095/attachment.html>
More information about the llvm-commits
mailing list