[PATCH] D10674: Value profiling - patchset 3

Betul Buyukkurt via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 27 10:59:41 PDT 2015


>>>>      DataBuffer.push_back(InstrProfRecord(K, Hash, std::move(CounterBuffer)));
>>>> +
>>>> +    // Read value profiling data
>>>> +    for (uint32_t Kind = instrprof_value_kind::first;
>>>> +         Kind < instrprof_value_kind::size; ++Kind) {
>>
>>> I still think we should store the kind in the data here and then read
>>> the following data based on that. There's no guarantee that all kinds
>>> that aren't indirect_call_target will behave as they do in this loop as
>>> is, so I'd rather just reject outright kinds we don't understand. If the
>>> type is in the data we can do that, and it also saves a few bytes for
>>> functions that don't have any value profiling data (and more bytes when
>>> we support more kinds of value profiling).
>>
>> I do not think this is possible. The format allows multiple hash values
>> and data to be stored per name entry, and there is no indicator in the
>> format that tells whether the next bytes after the counter values belong
>> to a value kind or to the hash value of the next data for the same key.
>>
>
> Sorry I missed this one. This seems to be the last main thing to be
> addressed/resolved/agreed upon for the indexed format.
>
> I think Justin's suggestion is possible. We just need one mandatory
> field to be emitted before the value profile data.
> value_profile_section:
>           .quad  NUM_VALUE_KIND
>
> This field indicates how many value profile kinds have profile data.
> When value profile data are missing, this format has a storage
> advantage. For instance, when the function has no value profile data,
> the value profile section is just
>
> value_profile_section:
>           .quad 0
>
> With Betul's patch, it will be (assuming 2 kinds):
>
> value_profile_section:
>          .quad 0           // num of sites for kind 1 is 0
>          .quad 0           // num of sites for kind 2 is 0
>
> However, if value profile data exists for all value kinds, Betul's
> suggested format is slightly better -- it does not need to store the
> mandatory NUM_VALUE_KIND field, nor the value kind field itself.
>
> Since I believe that value profile data will be sparse (especially
> when we add more kinds in the future), my vote is for Justin's format,
> which is more flexible.

I'll make sure to address this item as well. I should respond in a day or so.
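
To keep the trade-off concrete while I rework the reader, here is a rough
sketch of how the reader side of the layout David describes could look. It is
a sketch only: readU32/readU64 are placeholders for endian::readNext<...,
little, unaligned>, and none of the field names are meant to be final.

  #include <cstdint>
  #include <cstring>

  // Placeholder readers standing in for endian::readNext<..., little, unaligned>.
  static uint32_t readU32(const unsigned char *&D) {
    uint32_t V;
    std::memcpy(&V, D, sizeof(V));
    D += sizeof(V);
    return V;
  }
  static uint64_t readU64(const unsigned char *&D) {
    uint64_t V;
    std::memcpy(&V, D, sizeof(V));
    D += sizeof(V);
    return V;
  }

  // A mandatory NumValueKinds field, then one (Kind, NumValueSites, sites...)
  // group per kind that actually has data; unknown kinds are rejected outright.
  static bool readValueProfSection(const unsigned char *D,
                                   const unsigned char *End,
                                   uint32_t NumKnownKinds) {
    if (D + sizeof(uint64_t) > End)
      return false;
    uint64_t NumValueKinds = readU64(D);
    for (uint64_t K = 0; K < NumValueKinds; ++K) {
      if (D + 2 * sizeof(uint32_t) > End)
        return false;
      uint32_t Kind = readU32(D);
      if (Kind >= NumKnownKinds)
        return false;                    // reject kinds we don't understand
      uint32_t NumValueSites = readU32(D);
      for (uint32_t S = 0; S < NumValueSites; ++S) {
        if (D + sizeof(uint32_t) > End)
          return false;
        uint32_t NumValueData = readU32(D);
        for (uint32_t V = 0; V < NumValueData; ++V) {
          if (D + 2 * sizeof(uint64_t) > End)
            return false;
          uint64_t Value = readU64(D);
          uint64_t NumTaken = readU64(D);
          (void)Value;                   // would populate an
          (void)NumTaken;                // InstrProfValueSiteRecord here
        }
      }
    }
    return true;
  }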

Thanks,
-Betul

>
> David
>
>
>
>>> This also has the benefit that the V2 data will be valid for V3, so we
>>> might even be able to get away without adding branches to check the
>>> version difference here, which would be a convenient simplification. I
>>> haven't thought through if that actually works though.
>>
>> ReadData can still read V2, so I do not understand why it should not
>> be valid.
>>
>>>> +      // Read number of value sites.
>>>> +      if (D + sizeof(uint32_t) > End)
>>>> +        return data_type();
>>>> +      uint32_t ValueSiteCount = endian::readNext<uint32_t, little, unaligned>(D);
>>>> +
>>>> +      for (uint32_t VSite = 0; VSite < ValueSiteCount; ++VSite) {
>>>> +        // Read number of value data pairs at value site.
>>>> +        if (D + sizeof(uint32_t) > End)
>>>> +          return data_type();
>>>> +        uint32_t ValueDataCount = endian::readNext<uint32_t, little, unaligned>(D);
>>>> +
>>>> +        InstrProfValueSiteRecord VSiteRecord;
>>>> +        for (uint32_t VCount = 0; VCount < ValueDataCount; ++VCount) {
>>>> +          if (D + 2 * sizeof(uint64_t) > End)
>>>> +            return data_type();
>>>> +          uint64_t Value = endian::readNext<uint64_t, little, unaligned>(D);
>>>> +          if (Kind == instrprof_value_kind::indirect_call_target) {
>>>> +            auto Result = HashKeyMap.find(Value);
>>>> +            if (Result != HashKeyMap.end())
>>>> +              Value = (uint64_t)Result->second;
>>
>>> What if it isn't found? As I mentioned earlier, I'm not very comfortable
>>> with this approach to storing the strings.
>>
>> I'm not storing additional strings in the file. I'm reading the existing
>> set of keys, adding them to the string table, and creating a map from
>> hash values to const char* pointers into the string table entries.
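
To expand on that with a self-contained illustration: standard containers
stand in for the LLVM types below, and hashKey() is only a placeholder for
ComputeHash(HashType, Key) (the indexed reader really uses MD5).

  #include <cstdint>
  #include <functional>
  #include <map>
  #include <set>
  #include <string>

  // Placeholder for ComputeHash(HashType, Key).
  static uint64_t hashKey(const std::string &Key) {
    return std::hash<std::string>()(Key);
  }

  struct HashToKeyIndex {
    // Node-based set, so the character storage behind each key stays stable,
    // mirroring what InstrProfStringTable provides.
    std::set<std::string> StringTable;
    std::map<uint64_t, const char *> HashKeyMap;

    void addKey(const std::string &Key) {
      const char *Interned = StringTable.insert(Key).first->c_str();
      HashKeyMap.insert(std::make_pair(hashKey(Key), Interned));
    }
    // Map an on-disk hash back to the interned name, if the key was seen.
    const char *lookup(uint64_t Hash) const {
      auto It = HashKeyMap.find(Hash);
      return It == HashKeyMap.end() ? nullptr : It->second;
    }
  };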
>>
>>>> +          }
>>>> +          uint64_t NumTaken = endian::readNext<uint64_t, little, unaligned>(D);
>>>> +          VSiteRecord.ValueData.push_back(std::make_pair(Value, NumTaken));
>>>> +        }
>>>> +
>>>> +        DataBuffer[DataBuffer.size()-1].ValueSites[Kind].push_back(VSiteRecord);
>>> This is better written as DataBuffer.back(). It also might make sense to
>>> sort these here rather than in mergeValueData - more on that later.
>>
>> Done.
>>
>>>> +      }
>>>> +    }
>>>>    }
>>>>    return DataBuffer;
>>>>  }
>>>> @@ -379,19 +444,24 @@
>>>>    if (HashType > IndexedInstrProf::HashT::Last)
>>>>      return error(instrprof_error::unsupported_hash_type);
>>>>    uint64_t HashOffset = endian::readNext<uint64_t, little, unaligned>(Cur);
>>>> -
>>>>    // The rest of the file is an on disk hash table.
>>>>    Index.reset(InstrProfReaderIndex::Create(
>>>>        Start + HashOffset, Cur, Start,
>>>> -      InstrProfLookupTrait(HashType, FormatVersion)));
>>>> +      InstrProfLookupTrait(HashType, FormatVersion, HashKeyMap)));
>>>> +  for (auto Key : Index->keys()) {
>>>> +    const char* KeyTableRef = StringTable.insertString(Key);
>>>> +    HashKeyMap.insert(std::make_pair(ComputeHash(HashType, Key), KeyTableRef));
>>>> +  }
>>>>    // Set up our iterator for readNextRecord.
>>>>    RecordIterator = Index->data_begin();
>>>>    return success();
>>>>  }
>>>>  std::error_code IndexedInstrProfReader::getFunctionCounts(
>>>> -    StringRef FuncName, uint64_t FuncHash, std::vector<uint64_t> &Counts) {
>>>> +  StringRef FuncName, uint64_t FuncHash,
>>>> +  std::vector<uint64_t> &Counts) {
>>>> +
>>>>    auto Iter = Index->find(FuncName);
>>>>    if (Iter == Index->end())
>>>>      return error(instrprof_error::unknown_function);
>>>> @@ -411,6 +481,29 @@
>>>>    return error(instrprof_error::hash_mismatch);
>>>>  }
>>>> +std::error_code IndexedInstrProfReader::getFunctionValuesForKind(
>>>> +  StringRef FuncName, uint64_t FuncHash, uint32_t ValueKind,
>>>> +  std::vector<InstrProfValueSiteRecord> &Values) {
>>
>>> Does this function work? Won't it return an index into a string table
>>> with no means to turn that into the actual string for indirect call
>>> profiling?
>>
>> ValueData for indirect call targets stores not indices but const char*
>> pointers to string table entries. The function works for that reason,
>> and I've verified it locally. However, I've removed this function since
>> it's not in use at this time.
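
To illustrate what a consumer sees once that mapping has happened (a sketch
only, not part of the patch): the first half of each pair can be treated as
the target's name.

  #include <cstdint>
  #include <cstdio>
  #include <list>
  #include <utility>

  // After the reader has replaced the on-disk hash with a stable pointer into
  // the string table, .first is the target function's name and .second is the
  // number of times the call was taken.
  static void dumpIndirectCallSite(
      const std::list<std::pair<uint64_t, uint64_t>> &ValueData) {
    for (const auto &VD : ValueData) {
      const char *TargetName = reinterpret_cast<const char *>(VD.first);
      std::printf("%s called %llu times\n", TargetName,
                  (unsigned long long)VD.second);
    }
  }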
>>
>>>> +
>>>> +  auto Iter = Index->find(FuncName);
>>>> +  if (Iter == Index->end())
>>>> +    return error(instrprof_error::unknown_function);
>>>> +
>>>> +  // Found it. Look for counters with the right hash.
>>>> +  ArrayRef<InstrProfRecord> Data = (*Iter);
>>>> +  if (Data.empty())
>>>> +    return error(instrprof_error::malformed);
>>>> +
>>>> +  for (unsigned I = 0, E = Data.size(); I < E; ++I) {
>>>> +    // Check for a match and fill the vector if there is one.
>>>> +    if (Data[I].Hash == FuncHash) {
>>>> +      Values = Data[I].ValueSites[ValueKind];
>>>> +      return success();
>>>> +    }
>>>> +  }
>>>> +  return error(instrprof_error::hash_mismatch);
>>>> +}
>>>> +
>>>>  std::error_code
>>>>  IndexedInstrProfReader::readNextRecord(InstrProfRecord &Record) {
>>>>    // Are we out of records?
>>>> Index: lib/ProfileData/InstrProfIndexed.h
>>>> ===================================================================
>>>> --- lib/ProfileData/InstrProfIndexed.h
>>>> +++ lib/ProfileData/InstrProfIndexed.h
>>>> @@ -47,7 +47,7 @@
>>>>  }
>>>>  const uint64_t Magic = 0x8169666f72706cff; // "\xfflprofi\x81"
>>>> -const uint64_t Version = 2;
>>>> +const uint64_t Version = 3;
>>>>  const HashT HashType = HashT::MD5;
>>>>  }
>>>> Index: lib/ProfileData/InstrProf.cpp
>>>> ===================================================================
>>>> --- lib/ProfileData/InstrProf.cpp
>>>> +++ lib/ProfileData/InstrProf.cpp
>>>> @@ -50,6 +50,8 @@
>>>>        return "Function count mismatch";
>>>>      case instrprof_error::counter_overflow:
>>>>        return "Counter overflow";
>>>> +    case instrprof_error::value_site_count_mismatch:
>>>> +      return "Function's value site counts mismatch";
>>>>      }
>>>>      llvm_unreachable("A value of instrprof_error has no message.");
>>>>    }
>>>> Index: include/llvm/ProfileData/InstrProfWriter.h
>>>> ===================================================================
>>>> --- include/llvm/ProfileData/InstrProfWriter.h
>>>> +++ include/llvm/ProfileData/InstrProfWriter.h
>>>> @@ -15,33 +15,30 @@
>>>>  #ifndef LLVM_PROFILEDATA_INSTRPROFWRITER_H
>>>>  #define LLVM_PROFILEDATA_INSTRPROFWRITER_H
>>>> -#include "llvm/ADT/ArrayRef.h"
>>>>  #include "llvm/ADT/DenseMap.h"
>>>>  #include "llvm/ADT/StringMap.h"
>>>>  #include "llvm/ProfileData/InstrProf.h"
>>>>  #include "llvm/Support/DataTypes.h"
>>>>  #include "llvm/Support/MemoryBuffer.h"
>>>>  #include "llvm/Support/raw_ostream.h"
>>>> -#include <vector>
>>>>  namespace llvm {
>>>>  /// Writer for instrumentation based profile data.
>>>>  class InstrProfWriter {
>>>>  public:
>>>> -  typedef SmallDenseMap<uint64_t, std::vector<uint64_t>, 1> CounterData;
>>>> +  typedef SmallDenseMap<uint64_t, InstrProfRecord, 1> ProfilingData;
>>>>  private:
>>>> -  StringMap<CounterData> FunctionData;
>>>> +  InstrProfStringTable StringTable;
>>>> +  StringMap<ProfilingData> FunctionData;
>>>>    uint64_t MaxFunctionCount;
>>>>  public:
>>>>    InstrProfWriter() : MaxFunctionCount(0) {}
>>>>    /// Add function counts for the given function. If there are already counts
>>>>    /// for this function and the hash and number of counts match, each counter is
>>>>    /// summed.
>>>> -  std::error_code addFunctionCounts(StringRef FunctionName,
>>>> -                                    uint64_t FunctionHash,
>>>> -                                    ArrayRef<uint64_t> Counters);
>>>> +  std::error_code addRecord(InstrProfRecord &I);
>>
>>> I'm not a fan of dropping const here, but I think it can come back if we
>>> move the updateStringTable method anyway.
>>
>> Same here. But I couldn't find a way to get it to work otherwise.
>>
>>>>    /// Write the profile to \c OS
>>>>    void write(raw_fd_ostream &OS);
>>>>    /// Write the profile, returning the raw data. For testing.
>>>> Index: include/llvm/ProfileData/InstrProfReader.h
>>>> ===================================================================
>>>> --- include/llvm/ProfileData/InstrProfReader.h
>>>> +++ include/llvm/ProfileData/InstrProfReader.h
>>>> @@ -24,6 +24,7 @@
>>>>  #include "llvm/Support/MemoryBuffer.h"
>>>>  #include "llvm/Support/OnDiskHashTable.h"
>>>>  #include <iterator>
>>>> +#include <map>
>>>>  namespace llvm {
>>>> @@ -145,7 +146,6 @@
>>>>      const uint64_t CountersDelta;
>>>>      const uint64_t NamesDelta;
>>>>    };
>>>> -
>>>>    bool ShouldSwapBytes;
>>>>    uint64_t CountersDelta;
>>>>    uint64_t NamesDelta;
>>>> @@ -195,10 +195,13 @@
>>>>    std::vector<InstrProfRecord> DataBuffer;
>>>>    IndexedInstrProf::HashT HashType;
>>>>    unsigned FormatVersion;
>>>> +  const std::map<uint64_t, const char*> &HashKeyMap;
>>>>  public:
>>>> -  InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned FormatVersion)
>>>> -      : HashType(HashType), FormatVersion(FormatVersion) {}
>>>> +  InstrProfLookupTrait(IndexedInstrProf::HashT HashType,
>>>> +      unsigned FormatVersion, std::map<uint64_t, const char*> &HashKeyMap)
>>>> +      : HashType(HashType), FormatVersion(FormatVersion),
>>>> +      HashKeyMap(HashKeyMap) {}
>>>>    typedef ArrayRef<InstrProfRecord> data_type;
>>>> @@ -209,6 +212,7 @@
>>>>    static bool EqualKey(StringRef A, StringRef B) { return A == B; }
>>>>    static StringRef GetInternalKey(StringRef K) { return K; }
>>>> +  static StringRef GetExternalKey(StringRef K) { return K; }
>>>>    hash_value_type ComputeHash(StringRef K);
>>>> @@ -224,7 +228,17 @@
>>>>      return StringRef((const char *)D, N);
>>>>    }
>>>> -  data_type ReadData(StringRef K, const unsigned char *D, offset_type N);
>>>> +  data_type ReadDataV1V2(StringRef K, const unsigned char *D, offset_type N);
>>>> +  data_type ReadDataV3(StringRef K, const unsigned char *D, offset_type N);
>>>> +
>>>> +  data_type ReadData(StringRef K, const unsigned char *D, offset_type N) {
>>>> +    switch (FormatVersion) {
>>>> +      case 1:
>>>> +      case 2: return ReadDataV1V2(K, D, N);
>>>> +      case 3: return ReadDataV3(K, D, N);
>>>> +    }
>>>> +    return data_type();
>>>> +  }
>>>>  };
>>>>  typedef OnDiskIterableChainedHashTable<InstrProfLookupTrait>
>>>> @@ -243,6 +257,10 @@
>>>>    uint64_t FormatVersion;
>>>>    /// The maximal execution count among all functions.
>>>>    uint64_t MaxFunctionCount;
>>>> +  /// String table.
>>>> +  InstrProfStringTable StringTable;
>>>> +  /// Map of hash values to const char* keys in profiling data.
>>>> +  std::map<uint64_t, const char*> HashKeyMap;
>>>>    IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;
>>>>    IndexedInstrProfReader &operator=(const IndexedInstrProfReader &) = delete;
>>>> @@ -261,6 +279,13 @@
>>>>    /// Fill Counts with the profile data for the given function name.
>>>>    std::error_code getFunctionCounts(StringRef FuncName, uint64_t FuncHash,
>>>>                                      std::vector<uint64_t> &Counts);
>>>> +
>>>> +  /// Return value profile data for the given function name and hash and
>>>> +  /// value profiling kind
>>>> +  std::error_code getFunctionValuesForKind(StringRef FuncName,
>>>> +      uint64_t FuncHash, uint32_t ValueKind,
>>>> +      std::vector<InstrProfValueSiteRecord> &Values);
>>>> +
>>>>    /// Return the maximum of all known function counts.
>>>>    uint64_t getMaximumFunctionCount() { return MaxFunctionCount; }
>>>> Index: include/llvm/ProfileData/InstrProf.h
>>>> ===================================================================
>>>> --- include/llvm/ProfileData/InstrProf.h
>>>> +++ include/llvm/ProfileData/InstrProf.h
>>>> @@ -16,10 +16,11 @@
>>>>  #ifndef LLVM_PROFILEDATA_INSTRPROF_H_
>>>>  #define LLVM_PROFILEDATA_INSTRPROF_H_
>>>> -#include "llvm/ADT/StringRef.h"
>>>> +#include "llvm/ADT/StringSet.h"
>>>>  #include <cstdint>
>>>> -#include <system_error>
>>>> +#include <list>
>>>>  #include <vector>
>>>> +#include <system_error>
>>>>  namespace llvm {
>>>>  const std::error_category &instrprof_category();
>>>> @@ -37,21 +38,99 @@
>>>>      unknown_function,
>>>>      hash_mismatch,
>>>>      count_mismatch,
>>>> -    counter_overflow
>>>> +    counter_overflow,
>>>> +    value_site_count_mismatch
>>>>  };
>>>>  inline std::error_code make_error_code(instrprof_error E) {
>>>>    return std::error_code(static_cast<int>(E), instrprof_category());
>>>>  }
>>>> +enum instrprof_value_kind : uint32_t {
>>>> +    first = 0,
>>>> +    indirect_call_target = 0,
>>>> +    size = 1
>>>> +};
>>>> +
>>>> +struct InstrProfStringTable {
>>>> +  // Set of string values in profiling data.
>>>> +  StringSet<> StringValueSet;
>>>> +  InstrProfStringTable() { StringValueSet.clear(); }
>>>> +  // Get a pointer to internal storage of a string in set
>>>> +  const char* getStringData(StringRef Str) {
>>>> +    auto Result = StringValueSet.find(Str);
>>>> +    return (Result == StringValueSet.end()) ? nullptr : Result->first().data();
>>>> +  }
>>>> +  // Insert a string to StringTable
>>>> +  const char* insertString(StringRef Str) {
>>>> +    auto Result = StringValueSet.insert(Str);
>>>> +    return Result.first->first().data();
>>>> +  }
>>>> +};
>>> Is this type giving us any value over just using the StringSet<>
>>> directly?
>>
>> It returns const char* pointers, which is what's stored in the value data
>> entries for indirect call target data, so I found wrapping it like this
>> very useful.
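
For example, with the struct as written in this patch, repeated inserts and
lookups hand back the same stable pointer, which is exactly what the indirect
call value data stores (the helper name below is just for illustration):

  #include <cassert>

  void stringTableExample(InstrProfStringTable &Table) {
    const char *P = Table.insertString("foo");
    assert(Table.insertString("foo") == P);        // re-insert: same stable pointer
    assert(Table.getStringData("foo") == P);       // lookup: same stable pointer
    assert(Table.getStringData("bar") == nullptr); // never-inserted strings aren't found
  }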
>>
>>>> +
>>>> +struct InstrProfValueSiteRecord {
>>>> +  // Typedef for a single TargetValue-NumTaken pair.
>>>> +  typedef std::pair<uint64_t, uint64_t> ValueDataPair;
>>>> +  // Value profiling data pairs at a given value site.
>>>> +  std::list<ValueDataPair> ValueData;
>>>> +
>>>> +  InstrProfValueSiteRecord() { ValueData.clear(); }
>>>> +  InstrProfValueSiteRecord(const InstrProfValueSiteRecord &Rec)
>>>> +      : ValueData(std::move(Rec.ValueData)) {}
>>>> +
>>>> +  // Sort ValueData ascending by TargetValue
>>>> +  void sortByTargetValues() {
>>>> +    ValueData.sort([](const ValueDataPair &left, const ValueDataPair &right)
>>>> +                { return left.first < right.first; });
>>> clang-format and follow LLVM convention for variable names please.
>>
>> Done.
>>
>>>> +  }
>>>> +  // Sort ValueData descending by NumTaken
>>>> +  void sortByNumTaken() {
>>>> +    ValueData.sort([](const ValueDataPair &left, const ValueDataPair &right)
>>>> +                { return left.second > right.second; });
>>>> +  }
>>> This seems to be unused.
>>
>> Removed and will be re-introduced w/ the rest of the patches.
>>
>>>> +  // Merge data from another InstrProfValueSiteRecord
>>>> +  void mergeValueData(InstrProfValueSiteRecord &Input) {
>>>> +    this->sortByTargetValues();
>>>> +    Input.sortByTargetValues();
>>> It seems wasteful to re-sort these all the time. Would it make sense to
>>> sort at read time, or possibly even canonicalize the on-disk order to
>>> already be sorted?
>>
>> I left it as is for now. Once the API is exposed, we can't control how
>> future callers will use it, so sorting here guarantees that future uses
>> of the API will not break this merge implementation.
>>
>>> I guess it might make sense for the on-disk order to be sorted by
>>> num-taken if that's what the frontend will want, and since merging is
>>> less common the sort cost is okay. In any case, we should probably set
>>> it up so that the on disk format is always sorted in a particular way -
>>> that will remove some non-determinism and we can use it to speed up
>>> whichever operation we think is more important.
>>
>> I'll make sure the on disk format is always sorted.
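
Concretely, what I have in mind for the writer is just to canonicalize each
site before emitting it, roughly like this (a sketch against the types in
this patch, not final code):

  // Sort every value site ascending by target value before serialization, so
  // the on-disk order is deterministic and readers/mergers can rely on it.
  void canonicalizeValueData(InstrProfRecord &Record) {
    for (uint32_t Kind = instrprof_value_kind::first;
         Kind < instrprof_value_kind::size; ++Kind)
      for (InstrProfValueSiteRecord &VSR : Record.ValueSites[Kind])
        VSR.sortByTargetValues();
  }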
>>
>>>> +    auto I = ValueData.begin();
>>>> +    auto J = Input.ValueData.begin();
>>>> +    while (J != Input.ValueData.end()) {
>>>> +      while (I != ValueData.end() && I->first < J->first)
>>>> +        ++I;
>>>> +      if (I != ValueData.end() && I->first == J->first) {
>>>> +        I->second += J->second;
>>>> +        ++I;
>>>> +        ++J;
>>>> +        continue;
>>>> +      }
>>>> +      ValueData.insert(I, *J);
>>>> +      ++J;
>>>> +    }
>>> I suspect a for-loop would read better here. Also please store the
>>> ".end()" values in variables rather than re-evaluating.
>>
>> Done.
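
For reference, the reworked loop looks roughly like the following; I've
written it here as a free function over the underlying list type, so the
member function in the next patchset may differ slightly.

  #include <cstdint>
  #include <list>
  #include <utility>

  typedef std::pair<uint64_t, uint64_t> ValueDataPair;

  // Both lists are assumed sorted ascending by target value (.first); matching
  // targets get their counts summed, everything else is inserted in order.
  static void mergeSortedValueData(std::list<ValueDataPair> &Dest,
                                   const std::list<ValueDataPair> &Input) {
    auto I = Dest.begin();
    auto IE = Dest.end();               // list end() stays valid across inserts
    for (auto J = Input.begin(), JE = Input.end(); J != JE; ++J) {
      while (I != IE && I->first < J->first)
        ++I;
      if (I != IE && I->first == J->first) {
        I->second += J->second;
        ++I;
      } else {
        Dest.insert(I, *J);
      }
    }
  }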
>>
>>>> +  }
>>>> +};
>>>> +
>>>>  /// Profiling information for a single function.
>>>>  struct InstrProfRecord {
>>>>    InstrProfRecord() {}
>>>>    InstrProfRecord(StringRef Name, uint64_t Hash, std::vector<uint64_t> Counts)
>>>>        : Name(Name), Hash(Hash), Counts(std::move(Counts)) {}
>>>>    StringRef Name;
>>>>    uint64_t Hash;
>>>>    std::vector<uint64_t> Counts;
>>>> +  // Size of vector indicates the number of value sites for a value kind
>>>> +  std::vector<InstrProfValueSiteRecord> ValueSites[instrprof_value_kind::size];
>>> I don't think we're gaining much by having this be an array - I was
>>> thinking it would be more like
>>>   std::vector<InstrProfValueSiteRecord> IndirectCalls;
>>> Then when we add more value types, they can have their own variables and
>>> be accessed directly. Most of the code that works with these will have a
>>> particular kind in mind, and since the value data is dependent on kind
>>> looping over these isn't generally that useful. That is, the looping we
>>> have now is only in the reader and writer, and I can't see the users of
>>> the data ever doing that.
>>
>> I think I'm leaning towards keeping an array of kinds here.
>>
>>> For the reader and writer, a switch statement over the kinds will allow
>>> us to warn if someone doesn't update somewhere when they add a new kind.
>>> For the users of profile data, Data->IndirectCalls reads a lot better than
>>> Data->ValueSites[instrprof_value_kind::indirect_call_target].
>>
>> I'll revisit my latest patch to allow for that.
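
If I do go that way, the shape Justin is describing is roughly the following
(a sketch using the types from this patch; the struct and accessor names are
placeholders, not a final API):

  #include <vector>

  // A named member per kind, plus one accessor that switches over the kind,
  // so adding a new kind forces an update in exactly one place that the
  // reader and writer can share.
  struct InstrProfRecordSketch {
    std::vector<InstrProfValueSiteRecord> IndirectCalls;

    std::vector<InstrProfValueSiteRecord> &getValueSitesForKind(uint32_t Kind) {
      switch (Kind) {
      case instrprof_value_kind::indirect_call_target:
        return IndirectCalls;
      default:
        llvm_unreachable("unknown value profiling kind");
      }
    }
  };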
>>
>>> It really seems like the right trade off to me.
>>>> +  // Clear value data entries
>>>> +  void clearValueData() {
>>>> +    for (uint32_t Kind = instrprof_value_kind::first;
>>>> +         Kind < instrprof_value_kind::size; ++Kind)
>>>> +      ValueSites[Kind].clear();
>>>> +  }
>>
>>> This is never called, it can be removed.
>>
>> Done.
>>
>>>> +  void updateStringTableReferences(InstrProfStringTable &StrTable) {
>>>> +    Name = StrTable.insertString(Name);
>>>> +    for (auto VSite : ValueSites[instrprof_value_kind::indirect_call_target])
>>>> +      for (auto VData : VSite.ValueData)
>>>> +        VData.first = (uint64_t)StrTable.insertString((const char *)VData.first);
>>>> +  }
>>>>  };
>>>>  } // end namespace llvm
>>
>>
>>
>>
>>
>>
>



