[PATCH] Initial code coverage mapping data, reader, writer + C interface for ProfileData

Thu Jul 17 11:59:02 PDT 2014

2014-07-17 10:17 GMT-07:00 Alex L <arphaman at gmail.com>:

> Thanks!
>
>
> 2014-07-16 22:38 GMT-07:00 Bob Wilson <bob.wilson at apple.com>:
>
> The documentation is going to need more work. Let’s split that out into a
>> separate patch. At least it helps me understand how the reader and writer
>> work.
>>
>> > On Jul 9, 2014, at 4:39 PM, Alex Lorenz <arphaman at gmail.com> wrote:
>> >
>> > The updated version of the patch that fixes various typos, adds format
>> version to the coverage mapping format, changes the storage of filenames to
>> once per tu, and adds document that describes the actual coverage mapping
>> format.
>> >
>> > http://reviews.llvm.org/D4301
>>
>> > Index: include/llvm/ProfileData/CoverageMapping.h
>> > ===================================================================
>> > --- /dev/null
>> > +++ include/llvm/ProfileData/CoverageMapping.h
>> > @@ -0,0 +1,196 @@
>> …
>> > +/// \brief A Counter expression builder is used to construct the
>> > +/// counter expressions. It avoids unecessary duplication
>> > +/// and simplifies algebraic expression.
>> > +class CounterExpressionBuilder {
>>
>> I had commented previously that “expression” should be plural in the
>> comment here. I had not noticed that there were two of them. You changed
>> the first but not the second.
>>
>> …
>>
>> > +/// \brief A Counter mapping region associates a source range with
>> > +/// a specific counter.
>> > +struct CounterMappingRegion {
>> > +  enum RegionKind {
>> > +    /// \brief A CodeRegion associates some code with a counter
>> > +    CodeRegion,
>> > +
>> > +    /// \brief An ExpansionRegion represents
>> > +    /// a file expansion region that associates
>> > +    /// a source range with the expansion of a
>> > +    /// virtual source file, such as for a macro instantiation
>> > +    /// or #include file.
>> > +    ExpansionRegion,
>>
>> My previous review suggested that you reflow comments to 80 columns. For
>> many patches that is about fixing 80 column violations, but in this patch
>> there are a number of comments like this one with very short lines. I would
>> prefer that you take better advantage of the space and fit those on fewer
>> lines.
>>
>> …
>>
>> > Index: include/llvm/ProfileData/CoverageMappingReader.h
>> > ===================================================================
>> > --- /dev/null
>> > +++ include/llvm/ProfileData/CoverageMappingReader.h
>>
>> …
>>
>> > +/// \brief Base class for the raw coverage mapping and filenames data
>> readers.
>> > +class RawCoverageReader {
>> > +protected:
>> > +  StringRef Data;
>> > +
>> > +  /// \brief Return the error code.
>> > +  std::error_code error(std::error_code EC) { return EC; }
>> > +
>> > +  /// \brief Clear the current error code and return a successful one.
>> > +  std::error_code success() { return error(instrprof_error::success); }
>>
>> Unlike the error() and success() functions in InstrProfReader and
>> ObjectFileCoverageMappingReader, these don’t keep track of the most recent
>> error. They don’t really add much, do they? I don’t have a strong opinion
>> about this, if you want to keep them.
>>
>
> I'd prefer to keep them threre.
>
>
>>
>> > Index: lib/ProfileData/CoverageMappingReader.cpp
>> > ===================================================================
>> > --- /dev/null
>> > +++ lib/ProfileData/CoverageMappingReader.cpp
>> …
>> > +void CoverageMappingIterator::Increment() {
>> > +  if (Reader->readNextRecord(Record))
>> > +    *this = CoverageMappingIterator();
>> > +}
>>
>> Please add a comment to explain what this conditional is checking,
>> presumably the end of the records.
>>
>> …
>>
>> > +std::error_code RawCoverageReader::readIntMax(uint64_t &Result,
>> > +                                              uint64_t MaxPlus1) {
>> > +  auto Err = readULEB128(Result);
>> > +  if (Err)
>> > +    return Err;
>>
>> Better to limit the scope of the error code to the conditional by doing:
>> “if (auto Err = …) return Err;”
>>
>> > +  if (Result >= MaxPlus1)
>> > +    return error(instrprof_error::malformed);
>> > +  return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageReader::readSize(uint64_t &Result) {
>> > +  auto Err = readULEB128(Result);
>> > +  if (Err)
>> > +    return Err;
>>
>> Ditto.
>>
>> > +  // Sanity check the number
>> > +  if (Result > Data.size())
>> > +    return error(instrprof_error::malformed);
>> > +  return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageReader::readString(StringRef &Result) {
>> > +  uint64_t Length;
>> > +  auto Err = readSize(Length);
>> > +  if (Err)
>> > +    return Err;
>>
>> …and again here.
>>
>> > +  Result = Data.substr(0, Length);
>> > +  Data = Data.substr(Length);
>> > +  return success();
>> > +}
>> > +
>> >
>>
>> > +std::error_code RawCoverageFilenamesReader::read() {
>> > +  uint64_t IntValue;
>>
>> “IntValue” is a terrible name in this context. How about “NumFilenames”?
>>
>> > +  auto Err = readSize(IntValue);
>> > +  if (Err)
>> > +    return Err;
>> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> “S”?? Why not just use “I < NumFilenames” (or whatever you call it)?
>>
>> > +    StringRef Filename;
>> > +    auto Err = readString(Filename);
>> > +    if (Err)
>> > +      return Err;
>> > +    Filenames.push_back(Filename);
>> > +  }
>> > +  return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageMappingReader::readCounter(Counter &C) {
>> > +  uint64_t IntValue;
>> > +  auto Err = readIntMax(IntValue,
>> std::numeric_limits<unsigned>::max());
>> > +  if (Err)
>> > +    return Err;
>> > +  Err = decodeCounter(IntValue, C);
>> > +  if (Err)
>> > +    return Err;
>>
>> I would define two separate instances of “Err” inside the conditionals
>> here (as above). There are a bunch more cases of this, so I’ll leave it to
>> you to go through and find the rest of them.
>>
>> …
>>
>> > +  return success();
>> > +}
>> > +
>> > +static unsigned const EncodingCounterTagAndExpansionRegionTagBits =
>> > +    Counter::EncodingTagBits + 1;
>> > +
>>
>> > +std::error_code RawCoverageMappingReader::readMappingRegionsSubArray(
>> > +    std::vector<CounterMappingRegion> &MappingRegions, unsigned
>> InferredFileID,
>> > +    size_t VFMsize) {
>>
>> Please add a comment before this function to explain its arguments. It
>> isn’t at all obvious what “VFMsize” is (for example).
>>
>> > +  uint64_t IntValue;
>>
>> See my comment above about the variable name. In this case, it should be
>> something like “NumRegions”.
>>
>> > +  auto Err = readSize(IntValue);
>> > +  if (Err)
>> > +    return Err;
>> > +  unsigned PrevLineStart = 0;
>> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> As above: I don’t see any point in introducing a new “S” here.
>>
>> > +    Counter C;
>> > +    CounterMappingRegion::RegionKind Kind =
>> CounterMappingRegion::CodeRegion;
>> > +
>> > +    // Read the combined counter + region kind
>> > +    uint64_t IntValue;
>>
>> This shadows the “IntValue” in the outer scope, which is bad. You should
>> definitely rename the outer one, and then maybe it is OK to keep this one
>> with the generic name. If you can come up with a more descriptive name for
>> this instance, that would be even better.
>>
>> > +    auto Err = readIntMax(IntValue,
>> std::numeric_limits<unsigned>::max());
>> > +    if (Err)
>> > +      return Err;
>>
>> > +    unsigned Tag = IntValue & Counter::EncodingTagMask;
>> > +    uint64_t ExpandedFileID = 0;
>> > +    if (Tag != Counter::Zero) {
>> > +      if (auto Err = decodeCounter(IntValue, C))
>> > +        return Err;
>> > +    } else {
>> > +      // Is it an expansion region?
>> > +      if ((IntValue >> Counter::EncodingTagBits) & 1) {
>>
>> Would it make sense to define a constant instead of using that literal
>> “1”? If so, please do.
>>
>> > +        Kind = CounterMappingRegion::ExpansionRegion;
>> > +        ExpandedFileID =
>> > +            IntValue >> EncodingCounterTagAndExpansionRegionTagBits;
>> > +        if (ExpandedFileID >= VFMsize)
>> > +          return error(instrprof_error::malformed);
>> > +      } else {
>> > +        switch (IntValue >>
>> EncodingCounterTagAndExpansionRegionTagBits) {
>> > +        case CounterMappingRegion::CodeRegion:
>> > +          // Don't do anything when we have a code region with a zero
>> counter.
>> > +          break;
>> > +        case CounterMappingRegion::EmptyRegion:
>> > +          Kind = CounterMappingRegion::EmptyRegion;
>> > +          break;
>> > +        case CounterMappingRegion::SkippedRegion:
>> > +          Kind = CounterMappingRegion::SkippedRegion;
>> > +          break;
>> > +        default:
>> > +          return error(instrprof_error::malformed);
>> > +        }
>> > +      }
>> > +    }
>> > +
>> > +    // Read the source range
>> > +    uint64_t LocationInfo[4];
>>
>> I think it would be better to define each of these as a scalar variable
>> with a meaningful name, e.g., “DeltaLineStart”. Using an array makes it
>> easier to read the values, but with only 4 of them, it’s really not so bad
>> to just call readIntMax 4 separate times.
>>
>> > +    for (unsigned J = 0; J < 4; ++J) {
>> > +      Err = readIntMax(LocationInfo[J],
>> std::numeric_limits<unsigned>::max());
>> > +      if (Err)
>> > +        return Err;
>>
>> > +    }
>> > +    PrevLineStart += LocationInfo[0];
>> > +    // Adjust the column locations for the empty regions that
>> > +    // are supposed to cover whole lines.
>>
>> The comment is a good start, but I still don’t quite get it. Part of the
>> problem is that there’s nothing here to indicate what the values in
>> LocationInfo[1] and LocationInfo[3] represent. You shouldn’t have to go to
>> the format document to see that those are “columnStart” and “columnEnd”.
>> Using separate variables with meaningful names will help, but that’s still
>> not good enough. What are these empty regions from? (Maybe an example would
>> help.) Why do you set the column start/end to 1 and uint_max?
>>
>
>> > +    if (LocationInfo[1] == 0 && LocationInfo[3] == 0) {
>> > +      LocationInfo[1] = 1;
>>
>> Why 1 instead of 0 if this region is supposed to cover a whole line?
>>
>
> That's the usual convention - the column value for the first character in
> a line is 1 (Similar to how the first line is 1, not 0).
>
>
>> > +      LocationInfo[3] = std::numeric_limits<unsigned>::max();
>> > +    }
>> > +    MappingRegions.push_back(CounterMappingRegion(
>> > +        C, InferredFileID, PrevLineStart, LocationInfo[1],
>> > +        PrevLineStart + LocationInfo[2], LocationInfo[3], Kind));
>> > +    MappingRegions.back().ExpandedFileID = ExpandedFileID;
>> > +  }
>> > +  return success();
>> > +}
>> > +
>> > +std::error_code RawCoverageMappingReader::read(CoverageMappingRecord
>> &Record) {
>> > +  uint64_t IntValue;
>> > +
>> > +  // Read the virtual file mapping
>> > +  llvm::SmallVector<unsigned, 8> VirtualFileMapping;
>> > +  auto Err = readSize(IntValue);
>> > +  if (Err)
>> > +    return Err;
>> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> As above, please use a better name than “IntValue” and there’s no need
>> for a separate “S” variable.
>>
>> > +    auto Err = readIntMax(IntValue, TranslationUnitFilenames.size());
>>
>> …and this instance of “IntValue” should be a separate “FileIndex” (or
>> similar) variable.
>>
>> > +    if (Err)
>> > +      return Err;
>> > +    VirtualFileMapping.push_back(IntValue);
>> > +  }
>> > +
>> > +  // Construct the files using unique filenames
>> > +  // and virtual file mapping
>> > +  for (auto I : VirtualFileMapping) {
>> > +    Filenames.push_back(TranslationUnitFilenames[I]);
>> > +  }
>> > +
>> > +  // Read the expressions
>> > +  Err = readSize(IntValue);
>>
>> separate “NumExpressions” variable, please.
>>
>> > +  if (Err)
>> > +    return Err;
>> > +  // Create an array of dummy expressions that get the proper counters
>> > +  // when the expressions are read, and the proper kinds when the
>> counters
>> > +  // are decoded.
>> > +  Expressions.resize(IntValue,
>> CounterExpression(CounterExpression::Subtract,
>> > +                                                 Counter(),
>> Counter()));
>> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>>
>> No need for “S” here.
>>
>> > +    Err = readCounter(Expressions[I].LHS);
>> > +    if (Err)
>> > +      return Err;
>> > +    Err = readCounter(Expressions[I].RHS);
>> > +    if (Err)
>> > +      return Err;
>> > +  }
>> > +
>> > +  // Read the mapping regions sub-arrays
>> > +  for (unsigned InferredFileID = 0, S = VirtualFileMapping.size();
>> > +       InferredFileID < S; ++InferredFileID) {
>> > +    Err = readMappingRegionsSubArray(MappingRegions, InferredFileID,
>> > +                                     VirtualFileMapping.size());
>> > +    if (Err)
>> > +      return Err;
>> > +  }
>> > +
>> > +  // Set the counters for the expansion regions
>>
>> nitpick: There should be a period at the end of that sentence.
>>
>> > +  // i.e. Counter of expansion region = counter of the first region
>> > +  // from the expanded file.
>> > +  // Perform multiple passes to correctly propagate the counters
>> through
>> > +  // all the nested expansion regions.
>> > +  for (unsigned Pass = 1, S = VirtualFileMapping.size(); Pass < S;
>> ++Pass) {
>> > +    for (auto &I : MappingRegions) {
>> > +      if (I.Kind == CounterMappingRegion::ExpansionRegion) {
>> > +        for (const auto &J : MappingRegions) {
>> > +          if (J.FileID == I.ExpandedFileID) {
>> > +            I.Count = J.Count;
>> > +            break;
>> > +          }
>> > +        }
>> > +      }
>> > +    }
>> > +  }
>> > +
>> > +  Record.FunctionName = FunctionName;
>> > +  Record.Filenames = Filenames;
>> > +  Record.Expressions = Expressions;
>> > +  Record.MappingRegions = MappingRegions;
>> > +  return success();
>> > +}
>>
>> …
>>
>> > +template <typename T>
>> > +std::error_code readCoverageMappingData(
>> > +    SectionRef &ProfileNames, SectionRef &CoverageMapping,
>> > +    SectionRef &CoverageData, SectionRef &CoverageFilenames,
>> > +    SectionRef &CoverageTranslationUnits,
>> > +    std::vector<ObjectFileCoverageMappingReader::ProfileMappingRecord>
>> &Records,
>> > +    std::vector<StringRef> &Filenames) {
>> > +  llvm::DenseSet<T> UniqueFunctionMappingData;
>> > +
>> > +  // Get the contents of the given sections
>> > +  StringRef Data;
>> > +  auto Err = CoverageTranslationUnits.getContents(Data);
>> > +  if (Err)
>> > +    return Err;
>> > +  SectionData CoverageMappingData, ProfileNamesData,
>> CoverageFunctionsData,
>> > +      CoverageFilenamesData;
>> > +  Err = CoverageMappingData.load(CoverageMapping);
>> > +  if (Err)
>> > +    return Err;
>> > +  Err = CoverageFunctionsData.load(CoverageData);
>> > +  if (Err)
>> > +    return Err;
>>
>> The “CoverageData” name in this code confused me. (It’s all coverage
>> data, right?) It seems like it would be more consistent and understandable
>> to call this “CoverageFunctions”.
>>
>> I’m going to stop there for now. My main question at this point is
>> whether we can make this work with fewer sections. It would be nice if we
>> could merge the function records from the __llvm_cvm_funcs section into the
>> per-function data in the __llvm_prf_cvmap section.
>>
>>
> Yes, I think that it's possible to merge the __llvm_prf_cvmap and the
> __llvm_cvm_funcs sections, if I create a structure for each function that
> contains both the pointer to the function's name and the actual mapping
> data. However, this would probably mean that every function will need a
> unique LLVM structure type - will IR handle that efficiently? The same
> principle could also be applied to the filenames and the translation unit
> sections.
>

It would be even possible to merge all the 4 coverage sections into one,
using the following format:

 { { Translation Unit Record }, [ { Function record } ... ], [ filenames +
coverage mapping for all functions ] }

where:
   Translation Unit Record {
      uint32 : LengthOfFilenamesAndCoverageMappings
      uint32 : Version
   }

  FunctionRecord {
    Pointer : Function name
    uint32 : Name length
    uint32 : Coverage mapping length
  }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140717/3a638095/attachment.html>