[PATCH] D10674: Value profiling - patchset 3

Wed Aug 19 11:40:28 PDT 2015

I tried one of our largest programs with over 3 million function
symbols -- the result is the same -- no collisions are found using md5
hash.

David

On Mon, Aug 17, 2015 at 10:39 PM, Xinliang David Li <davidxl at google.com> wrote:
> Hi Justin, I have some data regarding the hash collision. Two binaries
> are checked in the experiment: one is 'clang' and the other is
> xalanbmk. For clang, there are about 63K unique function symbols and
> xalan has 16K unique function symbols.  Using MD5 (the lower 64bit)
> produces 'perfect' hash for the two binaries -- there are no
> collisions detected at all for either of them.  Neither are collisions
>  found for the combined set of symbols of the two binaries. In short,
> for value profiling purposes, hash collision is probably not something
> to be worried about in practice. In other words, the string table does
> not need to be emitted. Even if it ever becomes a problem, there are
> probably cheaper ways to deal with it.  Related, to greatly shrink
> profile data size, function name's md5sum should be used as the
> profile data's key instead of using the raw name. A function can be
> uniquely identified with that key plus the CFG hash.
>
> thanks,
>
> David
>
>
> On Mon, Aug 17, 2015 at 4:44 PM, Justin Bogner <mail at justinbogner.com> wrote:
>> Betul Buyukkurt <betulb at codeaurora.org> writes:
>>> betulb updated this revision to Diff 32113.
>>> betulb added a comment.
>>>
>>> - Reserved fields are removed from the instrprof_value_kind enum.
>>> - All strings are stored in the string table (InstrProfStringTable).
>>> - Review comments on file format are addressed the suggested indexed
>>> file format is implemented. The only difference is from the suggested
>>> implementation is that value kind is not written out explicitly in the
>>> profile data file.
>>> - Data structures are arranged such that the old value data is
>>> arranged into site records, and the InstrProfRecord is keeping a
>>> vector of site records.
>>
>> Thanks for working on this. Comments inline.
>>
>>>
>>> http://reviews.llvm.org/D10674
>>>
>>> Files:
>>>   include/llvm/ProfileData/InstrProf.h
>>>   include/llvm/ProfileData/InstrProfReader.h
>>>   include/llvm/ProfileData/InstrProfWriter.h
>>>   lib/ProfileData/InstrProf.cpp
>>>   lib/ProfileData/InstrProfIndexed.h
>>>   lib/ProfileData/InstrProfReader.cpp
>>>   lib/ProfileData/InstrProfWriter.cpp
>>>   tools/llvm-profdata/llvm-profdata.cpp
>>>   unittests/ProfileData/CoverageMappingTest.cpp
>>>   unittests/ProfileData/InstrProfTest.cpp
>>>
>>> Index: unittests/ProfileData/InstrProfTest.cpp
>>> ===================================================================
>>> --- unittests/ProfileData/InstrProfTest.cpp
>>> +++ unittests/ProfileData/InstrProfTest.cpp
>>> @@ -50,7 +50,8 @@
>>>  }
>>>
>>>  TEST_F(InstrProfTest, write_and_read_one_function) {
>>> -  Writer.addFunctionCounts("foo", 0x1234, {1, 2, 3, 4});
>>> +  InstrProfRecord Record("foo", 0x1234, {1, 2, 3, 4});
>>> +  Writer.addRecord(Record);
>>>    auto Profile = Writer.writeBuffer();
>>>    readProfile(std::move(Profile));
>>>
>>> @@ -67,13 +68,16 @@
>>>  }
>>>
>>>  TEST_F(InstrProfTest, get_function_counts) {
>>> -  Writer.addFunctionCounts("foo", 0x1234, {1, 2});
>>> -  Writer.addFunctionCounts("foo", 0x1235, {3, 4});
>>> +  InstrProfRecord Record1("foo", 0x1234, {1, 2});
>>> +  InstrProfRecord Record2("foo", 0x1235, {3, 4});
>>> +  Writer.addRecord(Record1);
>>> +  Writer.addRecord(Record2);
>>>    auto Profile = Writer.writeBuffer();
>>>    readProfile(std::move(Profile));
>>>
>>>    std::vector<uint64_t> Counts;
>>> -  ASSERT_TRUE(NoError(Reader->getFunctionCounts("foo", 0x1234, Counts)));
>>> +  ASSERT_TRUE(NoError(
>>> +    Reader->getFunctionCounts("foo", 0x1234, Counts)));
>>>    ASSERT_EQ(2U, Counts.size());
>>>    ASSERT_EQ(1U, Counts[0]);
>>>    ASSERT_EQ(2U, Counts[1]);
>>> @@ -92,9 +96,12 @@
>>>  }
>>>
>>>  TEST_F(InstrProfTest, get_max_function_count) {
>>> -  Writer.addFunctionCounts("foo", 0x1234, {1ULL << 31, 2});
>>> -  Writer.addFunctionCounts("bar", 0, {1ULL << 63});
>>> -  Writer.addFunctionCounts("baz", 0x5678, {0, 0, 0, 0});
>>> +  InstrProfRecord Record1("foo", 0x1234, {1ULL << 31, 2});
>>> +  InstrProfRecord Record2("bar", 0, {1ULL << 63});
>>> +  InstrProfRecord Record3("baz", 0x5678, {0, 0, 0, 0});
>>> +  Writer.addRecord(Record1);
>>> +  Writer.addRecord(Record2);
>>> +  Writer.addRecord(Record3);
>>>    auto Profile = Writer.writeBuffer();
>>>    readProfile(std::move(Profile));
>>>
>>> Index: unittests/ProfileData/CoverageMappingTest.cpp
>>> ===================================================================
>>> --- unittests/ProfileData/CoverageMappingTest.cpp
>>> +++ unittests/ProfileData/CoverageMappingTest.cpp
>>> @@ -188,7 +188,8 @@
>>>  }
>>>
>>>  TEST_F(CoverageMappingTest, basic_coverage_iteration) {
>>> -  ProfileWriter.addFunctionCounts("func", 0x1234, {30, 20, 10, 0});
>>> +  InstrProfRecord Record("func", 0x1234, {30, 20, 10, 0});
>>> +  ProfileWriter.addRecord(Record);
>>>    readProfCounts();
>>>
>>>    addCMR(Counter::getCounter(0), "file1", 1, 1, 9, 9);
>>> @@ -238,7 +239,8 @@
>>>  }
>>>
>>>  TEST_F(CoverageMappingTest, combine_regions) {
>>> -  ProfileWriter.addFunctionCounts("func", 0x1234, {10, 20, 30});
>>> +  InstrProfRecord Record("func", 0x1234, {10, 20, 30});
>>> +  ProfileWriter.addRecord(Record);
>>>    readProfCounts();
>>>
>>>    addCMR(Counter::getCounter(0), "file1", 1, 1, 9, 9);
>>> @@ -256,7 +258,8 @@
>>>  }
>>>
>>>  TEST_F(CoverageMappingTest, dont_combine_expansions) {
>>> -  ProfileWriter.addFunctionCounts("func", 0x1234, {10, 20});
>>> +  InstrProfRecord Record("func", 0x1234, {10, 20});
>>> +  ProfileWriter.addRecord(Record);
>>>    readProfCounts();
>>>
>>>    addCMR(Counter::getCounter(0), "file1", 1, 1, 9, 9);
>>> @@ -275,7 +278,8 @@
>>>  }
>>>
>>>  TEST_F(CoverageMappingTest, strip_filename_prefix) {
>>> -  ProfileWriter.addFunctionCounts("file1:func", 0x1234, {10});
>>> +  InstrProfRecord Record("file1:func", 0x1234, {10});
>>> +  ProfileWriter.addRecord(Record);
>>>    readProfCounts();
>>>
>>>    addCMR(Counter::getCounter(0), "file1", 1, 1, 9, 9);
>>> Index: tools/llvm-profdata/llvm-profdata.cpp
>>> ===================================================================
>>> --- tools/llvm-profdata/llvm-profdata.cpp
>>> +++ tools/llvm-profdata/llvm-profdata.cpp
>>> @@ -58,9 +58,8 @@
>>>        exitWithError(ec.message(), Filename);
>>>
>>>      auto Reader = std::move(ReaderOrErr.get());
>>> -    for (const auto &I : *Reader)
>>> -      if (std::error_code EC =
>>> -              Writer.addFunctionCounts(I.Name, I.Hash, I.Counts))
>>> +    for (auto &I : *Reader)
>>> +      if (std::error_code EC = Writer.addRecord(I))
>>>          errs() << Filename << ": " << I.Name << ": " << EC.message() << "\n";
>>>      if (Reader->hasError())
>>>        exitWithError(Reader->getError().message(), Filename);
>>> @@ -134,8 +133,8 @@
>>>  }
>>>
>>>  static int showInstrProfile(std::string Filename, bool ShowCounts,
>>> -                            bool ShowAllFunctions, std::string ShowFunction,
>>> -                            raw_fd_ostream &OS) {
>>> +                            bool ShowIndirectCallTargets, bool ShowAllFunctions,
>>> +                            std::string ShowFunction, raw_fd_ostream &OS) {
>>>    auto ReaderOrErr = InstrProfReader::create(Filename);
>>>    if (std::error_code EC = ReaderOrErr.getError())
>>>      exitWithError(EC.message(), Filename);
>>> @@ -146,7 +145,7 @@
>>>    for (const auto &Func : *Reader) {
>>>      bool Show =
>>>          ShowAllFunctions || (!ShowFunction.empty() &&
>>> -                             Func.Name.find(ShowFunction) != Func.Name.npos);
>>> +        StringRef(Func.Name).find(ShowFunction) != StringRef::npos);
>>
>> What's this change for? Func.Name is already a StringRef.
>>
>>>
>>>      ++TotalFunctions;
>>>      assert(Func.Counts.size() > 0 && "function missing entry counter");
>>> @@ -162,6 +161,10 @@
>>>           << "    Hash: " << format("0x%016" PRIx64, Func.Hash) << "\n"
>>>           << "    Counters: " << Func.Counts.size() << "\n"
>>>           << "    Function count: " << Func.Counts[0] << "\n";
>>> +      if (ShowIndirectCallTargets)
>>> +        OS << "    Indirect Call Site Count: "
>>> +           << Func.ValueSites[instrprof_value_kind::indirect_call_target].size()
>>> +           << "\n";
>>>      }
>>>
>>>      if (Show && ShowCounts)
>>> @@ -174,6 +177,16 @@
>>>      }
>>>      if (Show && ShowCounts)
>>>        OS << "]\n";
>>> +
>>> +    if (Show && ShowIndirectCallTargets) {
>>> +      OS << "    Indirect Target Results: \n";
>>> +      for (size_t I = 0, E = Func.ValueSites[0].size(); I < E; ++I) {
>>
>> Please don't use magic numbers like "Func.ValueSites[0]" - this should
>> say instrprof_value_kind::indirect_call_target if we're doing things
>> this way. That said, I don't think this is the best way to expose
>> this. More on that later.
>>
>>> +        for (auto V : Func.ValueSites[0][I].ValueData) {
>>> +          OS << "\t[ " << I << ", ";
>>> +          OS << (const char*) V.first << ", " << V.second << " ]\n";
>>> +        }
>>> +      }
>>> +    }
>>>    }
>>>    if (Reader->hasError())
>>>      exitWithError(Reader->getError().message(), Filename);
>>> @@ -210,6 +223,8 @@
>>>
>>>    cl::opt<bool> ShowCounts("counts", cl::init(false),
>>>                             cl::desc("Show counter values for shown functions"));
>>> +  cl::opt<bool> ShowIndirectCallTargets("ic-targets", cl::init(false),
>>> +      cl::desc("Show indirect call site target values for shown functions"));
>>
>> This formatting doesn't look right. Please clang-format your patch.
>>
>>>    cl::opt<bool> ShowAllFunctions("all-functions", cl::init(false),
>>>                                   cl::desc("Details for every function"));
>>>    cl::opt<std::string> ShowFunction("function",
>>> @@ -238,8 +253,8 @@
>>>      errs() << "warning: -function argument ignored: showing all functions\n";
>>>
>>>    if (ProfileKind == instr)
>>> -    return showInstrProfile(Filename, ShowCounts, ShowAllFunctions,
>>> -                            ShowFunction, OS);
>>> +    return showInstrProfile(Filename, ShowCounts, ShowIndirectCallTargets,
>>> +                            ShowAllFunctions, ShowFunction, OS);
>>>    else
>>>      return showSampleProfile(Filename, ShowCounts, ShowAllFunctions,
>>>                               ShowFunction, OS);
>>> Index: lib/ProfileData/InstrProfWriter.cpp
>>> ===================================================================
>>> --- lib/ProfileData/InstrProfWriter.cpp
>>> +++ lib/ProfileData/InstrProfWriter.cpp
>>> @@ -26,8 +26,8 @@
>>>    typedef StringRef key_type;
>>>    typedef StringRef key_type_ref;
>>>
>>> -  typedef const InstrProfWriter::CounterData *const data_type;
>>> -  typedef const InstrProfWriter::CounterData *const data_type_ref;
>>> +  typedef const InstrProfWriter::ProfilingData *const data_type;
>>> +  typedef const InstrProfWriter::ProfilingData *const data_type_ref;
>>>
>>>    typedef uint64_t hash_value_type;
>>>    typedef uint64_t offset_type;
>>> @@ -43,10 +43,27 @@
>>>
>>>      offset_type N = K.size();
>>>      LE.write<offset_type>(N);
>>> +    offset_type M = sizeof(uint64_t); // size of ProfileData
>>
>> This comment is confusing - I guess you meant "ProfilingData" here, and
>> you mean this space is for the number of elements in it, but that isn't
>> clear at all.
>>
>>> +    for (const auto &ProfileData : *V) {
>>> +      M += sizeof(uint64_t); // size of ProfileData.first
>>
>> Comments like "size of ProfileData.first" aren't helpful - please say
>> what the space we're reserving is actually for (ie, the function hash
>> here).
>>
>>> +      M += sizeof(uint32_t); // size of ProfileData.second.Counts.size()
>>
>> Why are you chaning this count to a uint32 from a uint64? While it saves
>> some space, it means that the counts will always be stored unaligned
>> which can be quite bad for reading them.
>>
>>> +      M += ProfileData.second.Counts.size() * sizeof(uint64_t);
>>> +      // Value data
>>> +      for (uint32_t Kind = instrprof_value_kind::first;
>>> +           Kind < instrprof_value_kind::size; ++Kind) {
>>> +        // Number of value sites per value kind
>>> +        M += sizeof(uint32_t); // ProfileData.second.ValuesSites[Kind].size()
>>> +        for (InstrProfValueSiteRecord I : ProfileData.second.ValueSites[Kind]) {
>>> +          // Number of value data pairs at a value site
>>> +          M += sizeof(uint32_t); // I.ValueData.size()
>>
>> Similarly for these uint32s.
>>
>>> +          for (auto V : I.ValueData) {
>>> +            M += sizeof(uint64_t);  // size of TargetValue
>>> +            M += sizeof(uint64_t);  // size of NumTaken
>>> +          }
>>> +        }
>>> +      }
>>> +    }
>>>
>>> -    offset_type M = 0;
>>> -    for (const auto &Counts : *V)
>>> -      M += (2 + Counts.second.size()) * sizeof(uint64_t);
>>>      LE.write<offset_type>(M);
>>>
>>>      return std::make_pair(N, M);
>>> @@ -61,49 +78,80 @@
>>>      using namespace llvm::support;
>>>      endian::Writer<little> LE(Out);
>>>
>>> -    for (const auto &Counts : *V) {
>>> -      LE.write<uint64_t>(Counts.first);
>>> -      LE.write<uint64_t>(Counts.second.size());
>>> -      for (uint64_t I : Counts.second)
>>> +    LE.write<uint64_t>(V->size());
>>
>> Not sure we need this size - can't we just derive this from the data
>> size in the reader?
>>
>>> +    for (const auto &ProfileData : *V) {
>>> +      LE.write<uint64_t>(ProfileData.first);
>>> +      LE.write<uint32_t>(ProfileData.second.Counts.size());
>>> +      for (uint64_t I : ProfileData.second.Counts)
>>>          LE.write<uint64_t>(I);
>>> +      // Write value data
>>> +      for (uint32_t Kind = instrprof_value_kind::first;
>>> +           Kind < instrprof_value_kind::size; ++Kind) {
>>> +        // Write number of value sites per value kind
>>> +        LE.write<uint32_t>(ProfileData.second.ValueSites[Kind].size());
>>> +        for (InstrProfValueSiteRecord I : ProfileData.second.ValueSites[Kind]) {
>>> +          // Write number of value data pairs at this value site
>>> +          LE.write<uint32_t>(I.ValueData.size());
>>> +          for (auto V : I.ValueData) {
>>> +            if (Kind == instrprof_value_kind::indirect_call_target)
>>> +              LE.write<uint64_t>(ComputeHash((const char *)V.first));
>>> +            else LE.write<uint64_t>(V.first);
>>
>> This looks wrong. Are you just relying on the fact that all of the
>> strings we use happen to already be in the hash table alongside our
>> function data? This will behave badly when there are hash collisions,
>> and will make it quite a bit harder if we want to explore shrinking the
>> format by avoiding storing function names if/when we don't need them.
>>
>> I really do think we should store these as an actual string table.
>>
>>> +            LE.write<uint64_t>(V.second);
>>> +          }
>>> +        }
>>> +      }
>>>      }
>>>    }
>>>  };
>>>  }
>>>
>>> +static std::error_code combineInstrProfRecords(InstrProfRecord &Dest,
>>> +                                               InstrProfRecord &Source,
>>> +                                               uint64_t &MaxFunctionCount) {
>>
>> I'm not sure we're gaining much by splitting this out from addRecord.
>>
>>> +  // If the number of counters doesn't match we either have bad data
>>> +  // or a hash collision.
>>> +  if (Dest.Counts.size() != Source.Counts.size())
>>> +    return instrprof_error::count_mismatch;
>>> +
>>> +  for (size_t I = 0, E = Source.Counts.size(); I < E; ++I) {
>>> +    if (Dest.Counts[I] + Source.Counts[I] < Dest.Counts[I])
>>> +      return instrprof_error::counter_overflow;
>>> +    Dest.Counts[I] += Source.Counts[I];
>>> +  }
>>> +
>>> +  for (uint32_t Kind = instrprof_value_kind::first;
>>> +       Kind < instrprof_value_kind::size; ++Kind) {
>>> +    if (Dest.ValueSites[Kind].size() != Source.ValueSites[Kind].size())
>>> +      return instrprof_error::value_site_count_mismatch;
>>> +    for (size_t I = 0,  E = Source.ValueSites[Kind].size(); I < E; ++I)
>>> +      Dest.ValueSites[Kind][I].mergeValueData(Source.ValueSites[Kind][I]);
>>> +  }
>>> +
>>> +  // We keep track of the max function count as we go for simplicity.
>>> +  if (Dest.Counts[0] > MaxFunctionCount)
>>> +    MaxFunctionCount = Dest.Counts[0];
>>> +
>>> +  return instrprof_error::success;
>>> +}
>>> +
>>>  std::error_code
>>> -InstrProfWriter::addFunctionCounts(StringRef FunctionName,
>>> -                                   uint64_t FunctionHash,
>>> -                                   ArrayRef<uint64_t> Counters) {
>>> -  auto &CounterData = FunctionData[FunctionName];
>>> +InstrProfWriter::addRecord(InstrProfRecord &I) {
>>> +  I.updateStringTableReferences(StringTable);
>>
>> It's kind of weird that this is a method on the InstrProfRecord - I
>> think the function that updates the string table fits better here in the
>> InstrProfWriter implementation.
>>
>>> +  auto &ProfileDataMap = FunctionData[I.Name];
>>>
>>> -  auto Where = CounterData.find(FunctionHash);
>>> -  if (Where == CounterData.end()) {
>>> +  auto Where = ProfileDataMap.find(I.Hash);
>>> +  if (Where == ProfileDataMap.end()) {
>>>      // We've never seen a function with this name and hash, add it.
>>> -    CounterData[FunctionHash] = Counters;
>>> +    ProfileDataMap[I.Hash] = I;
>>> +
>>>      // We keep track of the max function count as we go for simplicity.
>>> -    if (Counters[0] > MaxFunctionCount)
>>> -      MaxFunctionCount = Counters[0];
>>> +    if (I.Counts[0] > MaxFunctionCount)
>>> +      MaxFunctionCount = I.Counts[0];
>>>      return instrprof_error::success;
>>>    }
>>>
>>>    // We're updating a function we've seen before.
>>> -  auto &FoundCounters = Where->second;
>>> -  // If the number of counters doesn't match we either have bad data or a hash
>>> -  // collision.
>>> -  if (FoundCounters.size() != Counters.size())
>>> -    return instrprof_error::count_mismatch;
>>> -
>>> -  for (size_t I = 0, E = Counters.size(); I < E; ++I) {
>>> -    if (FoundCounters[I] + Counters[I] < FoundCounters[I])
>>> -      return instrprof_error::counter_overflow;
>>> -    FoundCounters[I] += Counters[I];
>>> -  }
>>> -  // We keep track of the max function count as we go for simplicity.
>>> -  if (FoundCounters[0] > MaxFunctionCount)
>>> -    MaxFunctionCount = FoundCounters[0];
>>> -
>>> -  return instrprof_error::success;
>>> +  return combineInstrProfRecords(Where->second, I, MaxFunctionCount);
>>>  }
>>>
>>>  std::pair<uint64_t, uint64_t> InstrProfWriter::writeImpl(raw_ostream &OS) {
>>> Index: lib/ProfileData/InstrProfReader.cpp
>>> ===================================================================
>>> --- lib/ProfileData/InstrProfReader.cpp
>>> +++ lib/ProfileData/InstrProfReader.cpp
>>> @@ -302,8 +302,9 @@
>>>  typedef InstrProfLookupTrait::data_type data_type;
>>>  typedef InstrProfLookupTrait::offset_type offset_type;
>>>
>>> -data_type InstrProfLookupTrait::ReadData(StringRef K, const unsigned char *D,
>>> -                                         offset_type N) {
>>> +data_type InstrProfLookupTrait::ReadDataV1V2(StringRef K,
>>> +                                             const unsigned char *D,
>>> +                                             offset_type N) {
>>>
>>>    // Check if the data is corrupt. If so, don't try to read it.
>>>    if (N % sizeof(uint64_t))
>>> @@ -337,7 +338,71 @@
>>>      for (unsigned J = 0; J < NumCounts; ++J)
>>>        CounterBuffer.push_back(endian::readNext<uint64_t, little, unaligned>(D));
>>>
>>> +    DataBuffer.push_back(
>>> +        InstrProfRecord(K.data(), Hash, std::move(CounterBuffer)));
>>> +  }
>>> +  return DataBuffer;
>>> +}
>>> +
>>> +data_type InstrProfLookupTrait::ReadDataV3(StringRef K,
>>> +                                           const unsigned char *D,
>>> +                                           offset_type N) {
>>
>> Other than the fact that we've changed NumCounts (or CountsSize as you
>> call it below) to a uint32 for whatever reason, I don't really see what
>> we're gaining by splitting this into ReadDataV1V2 and ReadDataV3 - Most
>> of the logic is identical, and we can just do "if (ver < V3) continue"
>> after the DataBuffer.push_back() line to share all of it, no?
>>
>>> +  using namespace support;
>>> +  const unsigned char *End = D + N;
>>> +  // Read number of data entries.
>>> +  if (D + sizeof(uint64_t) > End)
>>> +    return data_type();
>>> +  uint32_t DataBufferSize = endian::readNext<uint64_t, little, unaligned>(D);
>>> +  DataBuffer.clear();
>>> +  DataBuffer.reserve(DataBufferSize);
>>> +  std::vector<uint64_t> CounterBuffer;
>>> +  for (uint32_t I = 0; I < DataBufferSize; ++I) {
>>> +    // Read hash and number of counters
>>> +    if (D + sizeof (uint64_t) + sizeof(uint32_t) >= End)
>>> +      return data_type();
>>> +    uint64_t Hash = endian::readNext<uint64_t, little, unaligned>(D);
>>> +    uint32_t CountsSize = endian::readNext<uint32_t, little, unaligned>(D);
>>> +
>>> +    // Read counter values
>>> +    if (D + CountsSize * sizeof(uint64_t) >= End)
>>> +      return data_type();
>>> +
>>> +    CounterBuffer.clear();
>>> +    for (uint32_t C = 0; C < CountsSize; ++C)
>>> +      CounterBuffer.push_back(endian::readNext<uint64_t, little, unaligned>(D));
>>> +
>>>      DataBuffer.push_back(InstrProfRecord(K, Hash, std::move(CounterBuffer)));
>>> +
>>> +    // Read value profiling data
>>> +    for (uint32_t Kind = instrprof_value_kind::first;
>>> +         Kind < instrprof_value_kind::size; ++Kind) {
>>
>> I still think we should store the kind in the data here and then read
>> the following data based on that. There's no guarantee that all kinds
>> that aren't indirect_call_target will behave as they do in this loop as
>> is, so I'd rather just reject outright kinds we don't understand.
>>
>> If the type is in the data we can do that, and it also saves a few bytes
>> for functions that don't have any value profiling data (and more bytes
>> when we support more kinds of value profiling).
>>
>> This also has the benefit that the V2 data will be valid for V3, so we
>> might even be able to get away without adding branches to check the
>> version difference here, which would be a convenient simplification. I
>> haven't thought through if that actually works though.
>>
>>> +      // Read number of value sites.
>>> +      if (D + sizeof(uint32_t) > End)
>>> +        return data_type();
>>> +      uint32_t ValueSiteCount = endian::readNext<uint32_t, little, unaligned>(D);
>>> +
>>> +      for (uint32_t VSite = 0; VSite < ValueSiteCount; ++VSite) {
>>> +        // Read number of value data pairs at value site.
>>> +        if (D + sizeof(uint32_t) > End)
>>> +          return data_type();
>>> +        uint32_t ValueDataCount = endian::readNext<uint32_t, little, unaligned>(D);
>>> +
>>> +        InstrProfValueSiteRecord VSiteRecord;
>>> +        for (uint32_t VCount = 0; VCount < ValueDataCount; ++VCount) {
>>> +          if (D + 2 * sizeof(uint64_t) > End)
>>> +            return data_type();
>>> +          uint64_t Value = endian::readNext<uint64_t, little, unaligned>(D);
>>> +          if (Kind == instrprof_value_kind::indirect_call_target) {
>>> +            auto Result = HashKeyMap.find(Value);
>>> +            if (Result != HashKeyMap.end())
>>> +              Value = (uint64_t)Result->second;
>>
>> What if it isn't found? As I mentioned earlier, I'm not very comfortable
>> with this approach to storing the strings.
>>
>>> +          }
>>> +          uint64_t NumTaken = endian::readNext<uint64_t, little, unaligned>(D);
>>> +          VSiteRecord.ValueData.push_back(std::make_pair(Value, NumTaken));
>>> +        }
>>> +        DataBuffer[DataBuffer.size()-1].ValueSites[Kind].push_back(VSiteRecord);
>>
>> This is better written as DataBuffer.back(). It also might make sense to
>> sort these here rather than in mergeValueData - more on that later.
>>
>>> +      }
>>> +    }
>>>    }
>>>    return DataBuffer;
>>>  }
>>> @@ -379,19 +444,24 @@
>>>    if (HashType > IndexedInstrProf::HashT::Last)
>>>      return error(instrprof_error::unsupported_hash_type);
>>>    uint64_t HashOffset = endian::readNext<uint64_t, little, unaligned>(Cur);
>>> -
>>>    // The rest of the file is an on disk hash table.
>>>    Index.reset(InstrProfReaderIndex::Create(
>>>        Start + HashOffset, Cur, Start,
>>> -      InstrProfLookupTrait(HashType, FormatVersion)));
>>> +      InstrProfLookupTrait(HashType, FormatVersion, HashKeyMap)));
>>> +  for (auto Key : Index->keys()) {
>>> +    const char* KeyTableRef = StringTable.insertString(Key);
>>> +    HashKeyMap.insert(std::make_pair(ComputeHash(HashType, Key), KeyTableRef));
>>> +  }
>>>    // Set up our iterator for readNextRecord.
>>>    RecordIterator = Index->data_begin();
>>>
>>>    return success();
>>>  }
>>>
>>>  std::error_code IndexedInstrProfReader::getFunctionCounts(
>>> -    StringRef FuncName, uint64_t FuncHash, std::vector<uint64_t> &Counts) {
>>> +  StringRef FuncName, uint64_t FuncHash,
>>> +  std::vector<uint64_t> &Counts) {
>>> +
>>>    auto Iter = Index->find(FuncName);
>>>    if (Iter == Index->end())
>>>      return error(instrprof_error::unknown_function);
>>> @@ -411,6 +481,29 @@
>>>    return error(instrprof_error::hash_mismatch);
>>>  }
>>>
>>> +std::error_code IndexedInstrProfReader::getFunctionValuesForKind(
>>> +  StringRef FuncName, uint64_t FuncHash, uint32_t ValueKind,
>>> +  std::vector<InstrProfValueSiteRecord> &Values) {
>>
>> Does this function work? Won't it return an index into a string table
>> with no means to turn that into the actual string for indirect call
>> profiling?
>>
>>> +
>>> +  auto Iter = Index->find(FuncName);
>>> +  if (Iter == Index->end())
>>> +    return error(instrprof_error::unknown_function);
>>> +
>>> +  // Found it. Look for counters with the right hash.
>>> +  ArrayRef<InstrProfRecord> Data = (*Iter);
>>> +  if (Data.empty())
>>> +    return error(instrprof_error::malformed);
>>> +
>>> +  for (unsigned I = 0, E = Data.size(); I < E; ++I) {
>>> +    // Check for a match and fill the vector if there is one.
>>> +    if (Data[I].Hash == FuncHash) {
>>> +      Values = Data[I].ValueSites[ValueKind];
>>> +      return success();
>>> +    }
>>> +  }
>>> +  return error(instrprof_error::hash_mismatch);
>>> +}
>>> +
>>>  std::error_code
>>>  IndexedInstrProfReader::readNextRecord(InstrProfRecord &Record) {
>>>    // Are we out of records?
>>> Index: lib/ProfileData/InstrProfIndexed.h
>>> ===================================================================
>>> --- lib/ProfileData/InstrProfIndexed.h
>>> +++ lib/ProfileData/InstrProfIndexed.h
>>> @@ -47,7 +47,7 @@
>>>  }
>>>
>>>  const uint64_t Magic = 0x8169666f72706cff; // "\xfflprofi\x81"
>>> -const uint64_t Version = 2;
>>> +const uint64_t Version = 3;
>>>  const HashT HashType = HashT::MD5;
>>>  }
>>>
>>> Index: lib/ProfileData/InstrProf.cpp
>>> ===================================================================
>>> --- lib/ProfileData/InstrProf.cpp
>>> +++ lib/ProfileData/InstrProf.cpp
>>> @@ -50,6 +50,8 @@
>>>        return "Function count mismatch";
>>>      case instrprof_error::counter_overflow:
>>>        return "Counter overflow";
>>> +    case instrprof_error::value_site_count_mismatch:
>>> +      return "Function's value site counts mismatch";
>>>      }
>>>      llvm_unreachable("A value of instrprof_error has no message.");
>>>    }
>>> Index: include/llvm/ProfileData/InstrProfWriter.h
>>> ===================================================================
>>> --- include/llvm/ProfileData/InstrProfWriter.h
>>> +++ include/llvm/ProfileData/InstrProfWriter.h
>>> @@ -15,33 +15,30 @@
>>>  #ifndef LLVM_PROFILEDATA_INSTRPROFWRITER_H
>>>  #define LLVM_PROFILEDATA_INSTRPROFWRITER_H
>>>
>>> -#include "llvm/ADT/ArrayRef.h"
>>>  #include "llvm/ADT/DenseMap.h"
>>>  #include "llvm/ADT/StringMap.h"
>>>  #include "llvm/ProfileData/InstrProf.h"
>>>  #include "llvm/Support/DataTypes.h"
>>>  #include "llvm/Support/MemoryBuffer.h"
>>>  #include "llvm/Support/raw_ostream.h"
>>> -#include <vector>
>>>
>>>  namespace llvm {
>>>
>>>  /// Writer for instrumentation based profile data.
>>>  class InstrProfWriter {
>>>  public:
>>> -  typedef SmallDenseMap<uint64_t, std::vector<uint64_t>, 1> CounterData;
>>> +  typedef SmallDenseMap<uint64_t, InstrProfRecord, 1> ProfilingData;
>>>  private:
>>> -  StringMap<CounterData> FunctionData;
>>> +  InstrProfStringTable StringTable;
>>> +  StringMap<ProfilingData> FunctionData;
>>>    uint64_t MaxFunctionCount;
>>>  public:
>>>    InstrProfWriter() : MaxFunctionCount(0) {}
>>>
>>>    /// Add function counts for the given function. If there are already counts
>>>    /// for this function and the hash and number of counts match, each counter is
>>>    /// summed.
>>> -  std::error_code addFunctionCounts(StringRef FunctionName,
>>> -                                    uint64_t FunctionHash,
>>> -                                    ArrayRef<uint64_t> Counters);
>>> +  std::error_code addRecord(InstrProfRecord &I);
>>
>> I'm not a fan of dropping const here, but I think it can come back if we
>> move the updateStringTable method anyway.
>>
>>>    /// Write the profile to \c OS
>>>    void write(raw_fd_ostream &OS);
>>>    /// Write the profile, returning the raw data. For testing.
>>> Index: include/llvm/ProfileData/InstrProfReader.h
>>> ===================================================================
>>> --- include/llvm/ProfileData/InstrProfReader.h
>>> +++ include/llvm/ProfileData/InstrProfReader.h
>>> @@ -24,6 +24,7 @@
>>>  #include "llvm/Support/MemoryBuffer.h"
>>>  #include "llvm/Support/OnDiskHashTable.h"
>>>  #include <iterator>
>>> +#include <map>
>>>
>>>  namespace llvm {
>>>
>>> @@ -145,7 +146,6 @@
>>>      const uint64_t CountersDelta;
>>>      const uint64_t NamesDelta;
>>>    };
>>> -
>>>    bool ShouldSwapBytes;
>>>    uint64_t CountersDelta;
>>>    uint64_t NamesDelta;
>>> @@ -195,10 +195,13 @@
>>>    std::vector<InstrProfRecord> DataBuffer;
>>>    IndexedInstrProf::HashT HashType;
>>>    unsigned FormatVersion;
>>> +  const std::map<uint64_t, const char*> &HashKeyMap;
>>>
>>>  public:
>>> -  InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned FormatVersion)
>>> -      : HashType(HashType), FormatVersion(FormatVersion) {}
>>> +  InstrProfLookupTrait(IndexedInstrProf::HashT HashType,
>>> +      unsigned FormatVersion, std::map<uint64_t, const char*> &HashKeyMap)
>>> +      : HashType(HashType), FormatVersion(FormatVersion),
>>> +      HashKeyMap(HashKeyMap) {}
>>>
>>>    typedef ArrayRef<InstrProfRecord> data_type;
>>>
>>> @@ -209,6 +212,7 @@
>>>
>>>    static bool EqualKey(StringRef A, StringRef B) { return A == B; }
>>>    static StringRef GetInternalKey(StringRef K) { return K; }
>>> +  static StringRef GetExternalKey(StringRef K) { return K; }
>>>
>>>    hash_value_type ComputeHash(StringRef K);
>>>
>>> @@ -224,7 +228,17 @@
>>>      return StringRef((const char *)D, N);
>>>    }
>>>
>>> -  data_type ReadData(StringRef K, const unsigned char *D, offset_type N);
>>> +  data_type ReadDataV1V2(StringRef K, const unsigned char *D, offset_type N);
>>> +  data_type ReadDataV3(StringRef K, const unsigned char *D, offset_type N);
>>> +
>>> +  data_type ReadData(StringRef K, const unsigned char *D, offset_type N) {
>>> +    switch (FormatVersion) {
>>> +      case 1:
>>> +      case 2: return ReadDataV1V2(K, D, N);
>>> +      case 3: return ReadDataV3(K, D, N);
>>> +    }
>>> +    return data_type();
>>> +  }
>>>  };
>>>
>>>  typedef OnDiskIterableChainedHashTable<InstrProfLookupTrait>
>>> @@ -243,6 +257,10 @@
>>>    uint64_t FormatVersion;
>>>    /// The maximal execution count among all functions.
>>>    uint64_t MaxFunctionCount;
>>> +  /// String table.
>>> +  InstrProfStringTable StringTable;
>>> +  /// Map of hash values to const char* keys in profiling data.
>>> +  std::map<uint64_t, const char*> HashKeyMap;
>>>
>>>    IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;
>>>    IndexedInstrProfReader &operator=(const IndexedInstrProfReader &) = delete;
>>> @@ -261,6 +279,13 @@
>>>    /// Fill Counts with the profile data for the given function name.
>>>    std::error_code getFunctionCounts(StringRef FuncName, uint64_t FuncHash,
>>>                                      std::vector<uint64_t> &Counts);
>>> +
>>> +  /// Return value profile data for the given function name and hash and
>>> +  /// value profiling kind
>>> +  std::error_code getFunctionValuesForKind(StringRef FuncName,
>>> +      uint64_t FuncHash, uint32_t ValueKind,
>>> +      std::vector<InstrProfValueSiteRecord> &Values);
>>> +
>>>    /// Return the maximum of all known function counts.
>>>    uint64_t getMaximumFunctionCount() { return MaxFunctionCount; }
>>>
>>> Index: include/llvm/ProfileData/InstrProf.h
>>> ===================================================================
>>> --- include/llvm/ProfileData/InstrProf.h
>>> +++ include/llvm/ProfileData/InstrProf.h
>>> @@ -16,10 +16,11 @@
>>>  #ifndef LLVM_PROFILEDATA_INSTRPROF_H_
>>>  #define LLVM_PROFILEDATA_INSTRPROF_H_
>>>
>>> -#include "llvm/ADT/StringRef.h"
>>> +#include "llvm/ADT/StringSet.h"
>>>  #include <cstdint>
>>> -#include <system_error>
>>> +#include <list>
>>>  #include <vector>
>>> +#include <system_error>
>>>
>>>  namespace llvm {
>>>  const std::error_category &instrprof_category();
>>> @@ -37,21 +38,99 @@
>>>      unknown_function,
>>>      hash_mismatch,
>>>      count_mismatch,
>>> -    counter_overflow
>>> +    counter_overflow,
>>> +    value_site_count_mismatch
>>>  };
>>>
>>>  inline std::error_code make_error_code(instrprof_error E) {
>>>    return std::error_code(static_cast<int>(E), instrprof_category());
>>>  }
>>>
>>> +enum instrprof_value_kind : uint32_t {
>>> +    first = 0,
>>> +    indirect_call_target = 0,
>>> +    size = 1
>>> +};
>>> +
>>> +struct InstrProfStringTable {
>>> +  // Set of string values in profiling data.
>>> +  StringSet<> StringValueSet;
>>> +  InstrProfStringTable() { StringValueSet.clear(); }
>>> +  // Get a pointer to internal storage of a string in set
>>> +  const char* getStringData(StringRef Str) {
>>> +    auto Result = StringValueSet.find(Str);
>>> +    return (Result == StringValueSet.end()) ? nullptr : Result->first().data();
>>> +  }
>>> +  // Insert a string to StringTable
>>> +  const char* insertString(StringRef Str) {
>>> +    auto Result = StringValueSet.insert(Str);
>>> +    return Result.first->first().data();
>>> +  }
>>> +};
>>
>> Is this type giving us any value over just using the StringSet<>
>> directly?
>>
>>> +
>>> +struct InstrProfValueSiteRecord {
>>> +  // Typedef for a single TargetValue-NumTaken pair.
>>> +  typedef std::pair<uint64_t, uint64_t> ValueDataPair;
>>> +  // Value profiling data pairs at a given value site.
>>> +  std::list<ValueDataPair> ValueData;
>>> +
>>> +  InstrProfValueSiteRecord() { ValueData.clear(); }
>>> +  InstrProfValueSiteRecord(const InstrProfValueSiteRecord &Rec)
>>> +      : ValueData(std::move(Rec.ValueData)) {}
>>> +
>>> +  // Sort ValueData ascending by TargetValue
>>> +  void sortByTargetValues() {
>>> +    ValueData.sort([](const ValueDataPair &left, const ValueDataPair &right)
>>> +                { return left.first < right.first; });
>>
>> clang-format and follow LLVM convention for variable names please.
>>
>>> +  }
>>> +  // Sort ValueData descending by NumTaken
>>> +  void sortByNumTaken() {
>>> +    ValueData.sort([](const ValueDataPair &left, const ValueDataPair &right)
>>> +                { return left.second > right.second; });
>>> +  }
>>
>> This seems to be unused.
>>
>>> +  // Merge data from another InstrProfValueSiteRecord
>>> +  void mergeValueData(InstrProfValueSiteRecord &Input) {
>>> +    this->sortByTargetValues();
>>> +    Input.sortByTargetValues();
>>
>> It seems wasteful to re-sort these all the time. Would it make sense to
>> sort at read time, or possibly even canonicalize the on-disk order to
>> already be sorted?
>>
>> I guess it might make sense for the on-disk order to be sorted by
>> num-taken if that's what the frontend will want, and since merging is
>> less common the sort cost is okay. In any case, we should probably set
>> it up so that the on disk format is always sorted in a particular way -
>> that will remove some non-determinism and we can use it to speed up
>> whichever operation we think is more important.
>>
>>> +    auto I = ValueData.begin();
>>> +    auto J = Input.ValueData.begin();
>>> +    while (J != Input.ValueData.end()) {
>>> +      while (I != ValueData.end() && I->first < J->first)
>>> +        ++I;
>>> +      if (I != ValueData.end() && I->first == J->first) {
>>> +        I->second += J->second;
>>> +        ++I;
>>> +        ++J;
>>> +        continue;
>>> +      }
>>> +      ValueData.insert(I, *J);
>>> +      ++J;
>>> +    }
>>
>> I suspect a for-loop would read better here. Also please store the
>> ".end()" values in variables rather than re-evaluating.
>>
>>> +  }
>>> +};
>>> +
>>>  /// Profiling information for a single function.
>>>  struct InstrProfRecord {
>>>    InstrProfRecord() {}
>>>    InstrProfRecord(StringRef Name, uint64_t Hash, std::vector<uint64_t> Counts)
>>>        : Name(Name), Hash(Hash), Counts(std::move(Counts)) {}
>>>    StringRef Name;
>>>    uint64_t Hash;
>>>    std::vector<uint64_t> Counts;
>>> +  // Size of vector indicates the number of value sites for a value kind
>>> +  std::vector<InstrProfValueSiteRecord> ValueSites[instrprof_value_kind::size];
>>
>> I don't think we're gaining much by having this be an array - I was
>> thinking it would be more like
>>
>>   std::vector<InstrProfValueSiteRecord> IndirectCalls;
>>
>> Then when we add more value types, they can have their own variables and
>> be accessed directly. Most of the code that works with these will have a
>> particular kind in mind, and since the value data is dependent on kind
>> looping over these isn't generally that useful. That is, the looping we
>> have now is only in the reader and writer, and I can't see the users of
>> the data ever doing that.
>>
>> For the reader and writer, a switch statement over the kinds will allow
>> us to warn if someone doesn't update somewhere when they add a new
>> kind. For the users of profile data, Data->IndirectCalls reads a lot
>> better than Data->ValueSites[instrprof_value_kind::indirect_call_target].
>> It really seems like the right trade off to me.
>>
>>> +  // Clear value data entries
>>> +  void clearValueData() {
>>> +    for (uint32_t Kind = instrprof_value_kind::first;
>>> +         Kind < instrprof_value_kind::size; ++Kind)
>>> +      ValueSites[Kind].clear();
>>> +  }
>>
>> This is never called, it can be removed.
>>
>>> +  void updateStringTableReferences(InstrProfStringTable &StrTable) {
>>> +    Name = StrTable.insertString(Name);
>>> +    for (auto VSite : ValueSites[instrprof_value_kind::indirect_call_target])
>>> +      for (auto VData : VSite.ValueData)
>>> +        VData.first = (uint64_t)StrTable.insertString((const char *)VData.first);
>>> +  }
>>>  };
>>>
>>>  } // end namespace llvm