[PATCH] Initial code coverage mapping data, reader, writer + C interface for ProfileData

Thu Jul 17 10:17:54 PDT 2014

Thanks!

2014-07-16 22:38 GMT-07:00 Bob Wilson <bob.wilson at apple.com>:

> The documentation is going to need more work. Let’s split that out into a
> separate patch. At least it helps me understand how the reader and writer
> work.
>
> > On Jul 9, 2014, at 4:39 PM, Alex Lorenz <arphaman at gmail.com> wrote:
> >
> > The updated version of the patch that fixes various typos, adds format
> version to the coverage mapping format, changes the storage of filenames to
> once per tu, and adds document that describes the actual coverage mapping
> format.
> >
> > http://reviews.llvm.org/D4301
>
> > Index: include/llvm/ProfileData/CoverageMapping.h
> > ===================================================================
> > --- /dev/null
> > +++ include/llvm/ProfileData/CoverageMapping.h
> > @@ -0,0 +1,196 @@
> …
> > +/// \brief A Counter expression builder is used to construct the
> > +/// counter expressions. It avoids unecessary duplication
> > +/// and simplifies algebraic expression.
> > +class CounterExpressionBuilder {
>
> I had commented previously that “expression” should be plural in the
> comment here. I had not noticed that there were two of them. You changed
> the first but not the second.
>
> …
>
> > +/// \brief A Counter mapping region associates a source range with
> > +/// a specific counter.
> > +struct CounterMappingRegion {
> > +  enum RegionKind {
> > +    /// \brief A CodeRegion associates some code with a counter
> > +    CodeRegion,
> > +
> > +    /// \brief An ExpansionRegion represents
> > +    /// a file expansion region that associates
> > +    /// a source range with the expansion of a
> > +    /// virtual source file, such as for a macro instantiation
> > +    /// or #include file.
> > +    ExpansionRegion,
>
> My previous review suggested that you reflow comments to 80 columns. For
> many patches that is about fixing 80 column violations, but in this patch
> there are a number of comments like this one with very short lines. I would
> prefer that you take better advantage of the space and fit those on fewer
> lines.
>
> …
>
> > Index: include/llvm/ProfileData/CoverageMappingReader.h
> > ===================================================================
> > --- /dev/null
> > +++ include/llvm/ProfileData/CoverageMappingReader.h
>
> …
>
> > +/// \brief Base class for the raw coverage mapping and filenames data
> readers.
> > +class RawCoverageReader {
> > +protected:
> > +  StringRef Data;
> > +
> > +  /// \brief Return the error code.
> > +  std::error_code error(std::error_code EC) { return EC; }
> > +
> > +  /// \brief Clear the current error code and return a successful one.
> > +  std::error_code success() { return error(instrprof_error::success); }
>
> Unlike the error() and success() functions in InstrProfReader and
> ObjectFileCoverageMappingReader, these don’t keep track of the most recent
> error. They don’t really add much, do they? I don’t have a strong opinion
> about this, if you want to keep them.
>

I'd prefer to keep them threre.

>
> > Index: lib/ProfileData/CoverageMappingReader.cpp
> > ===================================================================
> > --- /dev/null
> > +++ lib/ProfileData/CoverageMappingReader.cpp
> …
> > +void CoverageMappingIterator::Increment() {
> > +  if (Reader->readNextRecord(Record))
> > +    *this = CoverageMappingIterator();
> > +}
>
> Please add a comment to explain what this conditional is checking,
> presumably the end of the records.
>
> …
>
> > +std::error_code RawCoverageReader::readIntMax(uint64_t &Result,
> > +                                              uint64_t MaxPlus1) {
> > +  auto Err = readULEB128(Result);
> > +  if (Err)
> > +    return Err;
>
> Better to limit the scope of the error code to the conditional by doing:
> “if (auto Err = …) return Err;”
>
> > +  if (Result >= MaxPlus1)
> > +    return error(instrprof_error::malformed);
> > +  return success();
> > +}
> > +
> > +std::error_code RawCoverageReader::readSize(uint64_t &Result) {
> > +  auto Err = readULEB128(Result);
> > +  if (Err)
> > +    return Err;
>
> Ditto.
>
> > +  // Sanity check the number
> > +  if (Result > Data.size())
> > +    return error(instrprof_error::malformed);
> > +  return success();
> > +}
> > +
> > +std::error_code RawCoverageReader::readString(StringRef &Result) {
> > +  uint64_t Length;
> > +  auto Err = readSize(Length);
> > +  if (Err)
> > +    return Err;
>
> …and again here.
>
> > +  Result = Data.substr(0, Length);
> > +  Data = Data.substr(Length);
> > +  return success();
> > +}
> > +
> >
>
> > +std::error_code RawCoverageFilenamesReader::read() {
> > +  uint64_t IntValue;
>
> “IntValue” is a terrible name in this context. How about “NumFilenames”?
>
> > +  auto Err = readSize(IntValue);
> > +  if (Err)
> > +    return Err;
> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>
> “S”?? Why not just use “I < NumFilenames” (or whatever you call it)?
>
> > +    StringRef Filename;
> > +    auto Err = readString(Filename);
> > +    if (Err)
> > +      return Err;
> > +    Filenames.push_back(Filename);
> > +  }
> > +  return success();
> > +}
> > +
> > +std::error_code RawCoverageMappingReader::readCounter(Counter &C) {
> > +  uint64_t IntValue;
> > +  auto Err = readIntMax(IntValue, std::numeric_limits<unsigned>::max());
> > +  if (Err)
> > +    return Err;
> > +  Err = decodeCounter(IntValue, C);
> > +  if (Err)
> > +    return Err;
>
> I would define two separate instances of “Err” inside the conditionals
> here (as above). There are a bunch more cases of this, so I’ll leave it to
> you to go through and find the rest of them.
>
> …
>
> > +  return success();
> > +}
> > +
> > +static unsigned const EncodingCounterTagAndExpansionRegionTagBits =
> > +    Counter::EncodingTagBits + 1;
> > +
>
> > +std::error_code RawCoverageMappingReader::readMappingRegionsSubArray(
> > +    std::vector<CounterMappingRegion> &MappingRegions, unsigned
> InferredFileID,
> > +    size_t VFMsize) {
>
> Please add a comment before this function to explain its arguments. It
> isn’t at all obvious what “VFMsize” is (for example).
>
> > +  uint64_t IntValue;
>
> See my comment above about the variable name. In this case, it should be
> something like “NumRegions”.
>
> > +  auto Err = readSize(IntValue);
> > +  if (Err)
> > +    return Err;
> > +  unsigned PrevLineStart = 0;
> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>
> As above: I don’t see any point in introducing a new “S” here.
>
> > +    Counter C;
> > +    CounterMappingRegion::RegionKind Kind =
> CounterMappingRegion::CodeRegion;
> > +
> > +    // Read the combined counter + region kind
> > +    uint64_t IntValue;
>
> This shadows the “IntValue” in the outer scope, which is bad. You should
> definitely rename the outer one, and then maybe it is OK to keep this one
> with the generic name. If you can come up with a more descriptive name for
> this instance, that would be even better.
>
> > +    auto Err = readIntMax(IntValue,
> std::numeric_limits<unsigned>::max());
> > +    if (Err)
> > +      return Err;
>
> > +    unsigned Tag = IntValue & Counter::EncodingTagMask;
> > +    uint64_t ExpandedFileID = 0;
> > +    if (Tag != Counter::Zero) {
> > +      if (auto Err = decodeCounter(IntValue, C))
> > +        return Err;
> > +    } else {
> > +      // Is it an expansion region?
> > +      if ((IntValue >> Counter::EncodingTagBits) & 1) {
>
> Would it make sense to define a constant instead of using that literal
> “1”? If so, please do.
>
> > +        Kind = CounterMappingRegion::ExpansionRegion;
> > +        ExpandedFileID =
> > +            IntValue >> EncodingCounterTagAndExpansionRegionTagBits;
> > +        if (ExpandedFileID >= VFMsize)
> > +          return error(instrprof_error::malformed);
> > +      } else {
> > +        switch (IntValue >>
> EncodingCounterTagAndExpansionRegionTagBits) {
> > +        case CounterMappingRegion::CodeRegion:
> > +          // Don't do anything when we have a code region with a zero
> counter.
> > +          break;
> > +        case CounterMappingRegion::EmptyRegion:
> > +          Kind = CounterMappingRegion::EmptyRegion;
> > +          break;
> > +        case CounterMappingRegion::SkippedRegion:
> > +          Kind = CounterMappingRegion::SkippedRegion;
> > +          break;
> > +        default:
> > +          return error(instrprof_error::malformed);
> > +        }
> > +      }
> > +    }
> > +
> > +    // Read the source range
> > +    uint64_t LocationInfo[4];
>
> I think it would be better to define each of these as a scalar variable
> with a meaningful name, e.g., “DeltaLineStart”. Using an array makes it
> easier to read the values, but with only 4 of them, it’s really not so bad
> to just call readIntMax 4 separate times.
>
> > +    for (unsigned J = 0; J < 4; ++J) {
> > +      Err = readIntMax(LocationInfo[J],
> std::numeric_limits<unsigned>::max());
> > +      if (Err)
> > +        return Err;
>
> > +    }
> > +    PrevLineStart += LocationInfo[0];
> > +    // Adjust the column locations for the empty regions that
> > +    // are supposed to cover whole lines.
>
> The comment is a good start, but I still don’t quite get it. Part of the
> problem is that there’s nothing here to indicate what the values in
> LocationInfo[1] and LocationInfo[3] represent. You shouldn’t have to go to
> the format document to see that those are “columnStart” and “columnEnd”.
> Using separate variables with meaningful names will help, but that’s still
> not good enough. What are these empty regions from? (Maybe an example would
> help.) Why do you set the column start/end to 1 and uint_max?
>

> > +    if (LocationInfo[1] == 0 && LocationInfo[3] == 0) {
> > +      LocationInfo[1] = 1;
>
> Why 1 instead of 0 if this region is supposed to cover a whole line?
>

That's the usual convention - the column value for the first character in a
line is 1 (Similar to how the first line is 1, not 0).

> > +      LocationInfo[3] = std::numeric_limits<unsigned>::max();
> > +    }
> > +    MappingRegions.push_back(CounterMappingRegion(
> > +        C, InferredFileID, PrevLineStart, LocationInfo[1],
> > +        PrevLineStart + LocationInfo[2], LocationInfo[3], Kind));
> > +    MappingRegions.back().ExpandedFileID = ExpandedFileID;
> > +  }
> > +  return success();
> > +}
> > +
> > +std::error_code RawCoverageMappingReader::read(CoverageMappingRecord
> &Record) {
> > +  uint64_t IntValue;
> > +
> > +  // Read the virtual file mapping
> > +  llvm::SmallVector<unsigned, 8> VirtualFileMapping;
> > +  auto Err = readSize(IntValue);
> > +  if (Err)
> > +    return Err;
> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>
> As above, please use a better name than “IntValue” and there’s no need for
> a separate “S” variable.
>
> > +    auto Err = readIntMax(IntValue, TranslationUnitFilenames.size());
>
> …and this instance of “IntValue” should be a separate “FileIndex” (or
> similar) variable.
>
> > +    if (Err)
> > +      return Err;
> > +    VirtualFileMapping.push_back(IntValue);
> > +  }
> > +
> > +  // Construct the files using unique filenames
> > +  // and virtual file mapping
> > +  for (auto I : VirtualFileMapping) {
> > +    Filenames.push_back(TranslationUnitFilenames[I]);
> > +  }
> > +
> > +  // Read the expressions
> > +  Err = readSize(IntValue);
>
> separate “NumExpressions” variable, please.
>
> > +  if (Err)
> > +    return Err;
> > +  // Create an array of dummy expressions that get the proper counters
> > +  // when the expressions are read, and the proper kinds when the
> counters
> > +  // are decoded.
> > +  Expressions.resize(IntValue,
> CounterExpression(CounterExpression::Subtract,
> > +                                                 Counter(), Counter()));
> > +  for (size_t I = 0, S = IntValue; I < S; ++I) {
>
> No need for “S” here.
>
> > +    Err = readCounter(Expressions[I].LHS);
> > +    if (Err)
> > +      return Err;
> > +    Err = readCounter(Expressions[I].RHS);
> > +    if (Err)
> > +      return Err;
> > +  }
> > +
> > +  // Read the mapping regions sub-arrays
> > +  for (unsigned InferredFileID = 0, S = VirtualFileMapping.size();
> > +       InferredFileID < S; ++InferredFileID) {
> > +    Err = readMappingRegionsSubArray(MappingRegions, InferredFileID,
> > +                                     VirtualFileMapping.size());
> > +    if (Err)
> > +      return Err;
> > +  }
> > +
> > +  // Set the counters for the expansion regions
>
> nitpick: There should be a period at the end of that sentence.
>
> > +  // i.e. Counter of expansion region = counter of the first region
> > +  // from the expanded file.
> > +  // Perform multiple passes to correctly propagate the counters through
> > +  // all the nested expansion regions.
> > +  for (unsigned Pass = 1, S = VirtualFileMapping.size(); Pass < S;
> ++Pass) {
> > +    for (auto &I : MappingRegions) {
> > +      if (I.Kind == CounterMappingRegion::ExpansionRegion) {
> > +        for (const auto &J : MappingRegions) {
> > +          if (J.FileID == I.ExpandedFileID) {
> > +            I.Count = J.Count;
> > +            break;
> > +          }
> > +        }
> > +      }
> > +    }
> > +  }
> > +
> > +  Record.FunctionName = FunctionName;
> > +  Record.Filenames = Filenames;
> > +  Record.Expressions = Expressions;
> > +  Record.MappingRegions = MappingRegions;
> > +  return success();
> > +}
>
> …
>
> > +template <typename T>
> > +std::error_code readCoverageMappingData(
> > +    SectionRef &ProfileNames, SectionRef &CoverageMapping,
> > +    SectionRef &CoverageData, SectionRef &CoverageFilenames,
> > +    SectionRef &CoverageTranslationUnits,
> > +    std::vector<ObjectFileCoverageMappingReader::ProfileMappingRecord>
> &Records,
> > +    std::vector<StringRef> &Filenames) {
> > +  llvm::DenseSet<T> UniqueFunctionMappingData;
> > +
> > +  // Get the contents of the given sections
> > +  StringRef Data;
> > +  auto Err = CoverageTranslationUnits.getContents(Data);
> > +  if (Err)
> > +    return Err;
> > +  SectionData CoverageMappingData, ProfileNamesData,
> CoverageFunctionsData,
> > +      CoverageFilenamesData;
> > +  Err = CoverageMappingData.load(CoverageMapping);
> > +  if (Err)
> > +    return Err;
> > +  Err = CoverageFunctionsData.load(CoverageData);
> > +  if (Err)
> > +    return Err;
>
> The “CoverageData” name in this code confused me. (It’s all coverage data,
> right?) It seems like it would be more consistent and understandable to
> call this “CoverageFunctions”.
>
> I’m going to stop there for now. My main question at this point is whether
> we can make this work with fewer sections. It would be nice if we could
> merge the function records from the __llvm_cvm_funcs section into the
> per-function data in the __llvm_prf_cvmap section.
>
>
Yes, I think that it's possible to merge the __llvm_prf_cvmap and the
__llvm_cvm_funcs sections, if I create a structure for each function that
contains both the pointer to the function's name and the actual mapping
data. However, this would probably mean that every function will need a
unique LLVM structure type - will IR handle that efficiently? The same
principle could also be applied to the filenames and the translation unit
sections.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140717/4546022c/attachment.html>