<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 11, 2015 at 9:25 AM, Betul Buyukkurt via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi David,<br>

<br>

Thanks for the comments.<br>

<span class=""><br>

> Regarding the StringTable usage.<br>

> 1) Reader: I am not sure why you need the stringTable/StringSet for the<br>

reader in the first place. For HashKeyMap, can you directly map the hash<br>

val to the key string without doing the converting (by<br>

> inserting the key to the stringtab)? Is it used for string<br>

> merging/uniquing? Or the client of the ProfileReader needs to traverse<br>

the set of strings some how?<br>

<br>

</span>Value data for indirect call targets are char*'s to the string table<br>

entries. This is very helpful because the StringRef is not expected to be<br>

terminated with a null character always. Internally it stores a char* and<br>

a length. So the null ending is not always maintained. Thus, the strings<br>

containing the names in the raw profile file or the indexed format are<br>

also _not_ expected to be null terminated. Therefore, as far as the raw<br>

reader is concerned (because I tested it) creating an entry in the string<br>

table first and making sure the value data points to it helped value data<br>

maintain proper C strings with accurate contents. Having char*'s in value<br>

profile data helps when writing (in llvm-profdata) or forming other<br>

strings (in clang) by just casting the value data as a char pointer. I've<br>

not tested explicitly the case of the keys in OnDiskHashTable, but as far<br>

as I can tell from the code below:<br>

<br>

// L.first is the length of the Key.<br>

InfoObj->ReadKey(LocalPtr, L.first);<br>

<br>

I can see that the internal representation does not depend on null<br>

termination of strings.<br>

<span class=""><br>

> 2) Writer: the string reference update is needed because the string<br>

memory in the reader will be reclaimed. However there might be a better<br>

way to do so:<br>

> With this:<br>

>  auto &ProfileDataMap = FunctionData[I.Name];<br>

> I.Name string gets duplicated in the StringMap as the new key. In other<br>

words, it is possible to do the update the string refs in<br>

> InstrProfRecords (to point to the keys in stringmap) after all records<br>

for one reader are added.<br>

<br>

</span>Then you have to make sure to inform the writer explicitly that one reader<br>

has reached its end and this introduces another writer API which is user<br>

level complication.<br>

<span class=""><br>

> However I think we should save the trouble -- see below.<br>

>>> The way the HashKeyMap works here seems a little off. Why does the<br>

Reader own it? Only the trait uses it. Also, since it follows a strict<br>

"fill then query" pattern, it's probably better to just use a vector<br>

that we sort after filling and then binary search later.<br>

>> OnDiskIterableChainedHashTable is a member of the reader class. So the<br>

reader may conveniently iterate over its keys to collect the strings<br>

and<br>

>> form the hash table accordingly. On the other hand the LookUpTrait<br>

class<br>

>> seemed to me like a class helping after the appropriate key is located.<br>

Also, the Info helper class is described as "reading values from the<br>

hash<br>

>> table's payload and computes the hash for a given key".<br>

> Please refer to the thread about reducing profile data size. With that<br>

implemented, the strings in the profile data will be gone. In other<br>

words, the value profiler (indirect call) is responsible to<br>

> re-establish the hash to string mapping, not the Indexed profiler reader<br>

itself.  This means all the translation code in this patch (from hash<br>

--> string in the reader, and string-->hash in the writer) should be<br>

removed -- there will be no special handling of icall value profile kind<br>

either.<br>

<br>

</span>I'm aware of the changes coming and have been thinking how the optimizer<br>

should adapt itself to having md5 hashes as incoming data. It needs to<br>

collect all the functions in the module and construct the md5 hashes from<br>

their names to make sure the proper match is detected before any further<br>

analysis or transformation. In that regard, the function name that clang<br>

will generate the hash from is at most important to make your approach<br>

viable. The hash should be re-construct-able in LLVM and possibly in LTO<br>

mode. This should also happen w/o needing to maintain debug info or<br>

passing -g on the command line.<br>

<br></blockquote><div><br></div><div>yes. </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Clang uses filename to prefix the mangled function names for internal and<br>

some other linkage types.</blockquote><div><br></div><div><br></div><div>Only local linkage funcs.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I'm not sure the filename prefix would be easily<br></blockquote><div><br></div><div>The easy way to do that is to directly change the internal function name itself with the path augmented name under LTO mode.</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

re-construct-able in LLVM under LTO mode. By the way, haven't had a chance<br>

to look at your patches yet. Just commenting based my reads on the<br>

llvm-dev thread.<br>

<span class="HOEnZb"><font color="#888888"><br>

-Betul<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

> David<br>

>>>> +    // Emit warning if a hash collision is detected.<br>

>>>> +    if (Result.second == false)<br>

>>>> +      DEBUG(dbgs() << "IndexedInstrProfReader: hash collision detected:<br>

>>> \n"<br>

>>>> +                   << "\t Map Entry(Hash, Key): " <<<br>

>>> Result.first->first<br>

>>>> +                   << ", " << Result.first->second << "\n"<br>

>>>> +                   << "\t New Entry(Hash, Key): " <<<br>

>>> ComputeHash(HashType, Key)<br>

>>>> +                   << ", " << Key << "\n");<br>

>>> This is not "emitting a warning". This will only be printed if the<br>

host<br>

>>> compiler is built in debug mode, so it seems pretty pointless.<br>

Actually<br>

>>> emitting a proper warning from this point in the compiler might be<br>

kind<br>

>>> of tricky though.<br>

>>>> +  }<br>

>>>>    // Set up our iterator for readNextRecord.<br>

>>>>    RecordIterator = Index->data_begin();<br>

>>>> Index: lib/ProfileData/InstrProfIndexed.h<br>

>>>> ===================================================================<br>

--- lib/ProfileData/InstrProfIndexed.h<br>

>>>> +++ lib/ProfileData/InstrProfIndexed.h<br>

>>>> @@ -47,7 +47,7 @@<br>

>>>>  }<br>

>>>>  const uint64_t Magic = 0x8169666f72706cff; // "\xfflprofi\x81"<br>

>>>> -const uint64_t Version = 2;<br>

>>>> +const uint64_t Version = 3;<br>

>>>>  const HashT HashType = HashT::MD5;<br>

>>>>  }<br>

>>>> Index: lib/ProfileData/InstrProf.cpp<br>

>>>> ===================================================================<br>

--- lib/ProfileData/InstrProf.cpp<br>

>>>> +++ lib/ProfileData/InstrProf.cpp<br>

>>>> @@ -50,6 +50,8 @@<br>

>>>>        return "Function count mismatch";<br>

>>>>      case instrprof_error::counter_overflow:<br>

>>>>        return "Counter overflow";<br>

>>>> +    case instrprof_error::value_site_count_mismatch:<br>

>>>> +      return "Function's value site counts mismatch";<br>

>>>>      }<br>

>>>>      llvm_unreachable("A value of instrprof_error has no message.");<br>

>>>>    }<br>

>>>> Index: include/llvm/ProfileData/InstrProfWriter.h<br>

>>>> ===================================================================<br>

--- include/llvm/ProfileData/InstrProfWriter.h<br>

>>>> +++ include/llvm/ProfileData/InstrProfWriter.h<br>

>>>> @@ -15,33 +15,32 @@<br>

>>>>  #ifndef LLVM_PROFILEDATA_INSTRPROFWRITER_H<br>

>>>>  #define LLVM_PROFILEDATA_INSTRPROFWRITER_H<br>

>>>> -#include "llvm/ADT/ArrayRef.h"<br>

>>>>  #include "llvm/ADT/DenseMap.h"<br>

>>>> -#include "llvm/ADT/StringMap.h"<br>

>>>>  #include "llvm/ProfileData/InstrProf.h"<br>

>>>>  #include "llvm/Support/DataTypes.h"<br>

>>>>  #include "llvm/Support/MemoryBuffer.h"<br>

>>>>  #include "llvm/Support/raw_ostream.h"<br>

>>>> -#include <vector><br>

>>>>  namespace llvm {<br>

>>>>  /// Writer for instrumentation based profile data.<br>

>>>>  class InstrProfWriter {<br>

>>>>  public:<br>

>>>> -  typedef SmallDenseMap<uint64_t, std::vector<uint64_t>, 1><br>

>>> CounterData;<br>

>>>> +  typedef SmallDenseMap<uint64_t, InstrProfRecord, 1> ProfilingData; +<br>

>>>>  private:<br>

>>>> -  StringMap<CounterData> FunctionData;<br>

>>>> +  InstrProfStringTable StringTable;<br>

>>>> +  StringMap<ProfilingData> FunctionData;<br>

>>>>    uint64_t MaxFunctionCount;<br>

>>>>  public:<br>

>>>>    InstrProfWriter() : MaxFunctionCount(0) {}<br>

>>>> +  /// Update string entries in profile data with references to<br>

>>> StringTable.<br>

>>>> +  void updateStringTableReferences(InstrProfRecord &I);<br>

>>>>    /// Add function counts for the given function. If there are<br>

>>>> already<br>

>>> counts<br>

>>>>    /// for this function and the hash and number of counts match,<br>

each<br>

>>> counter is<br>

>>>>    /// summed.<br>

>>>> -  std::error_code addFunctionCounts(StringRef FunctionName,<br>

>>>> -                                    uint64_t FunctionHash,<br>

>>>> -                                    ArrayRef<uint64_t> Counters); +<br>

std::error_code addRecord(InstrProfRecord &&I);<br>

>>>>    /// Write the profile to \c OS<br>

>>>>    void write(raw_fd_ostream &OS);<br>

>>>>    /// Write the profile, returning the raw data. For testing.<br>

>>>> Index: include/llvm/ProfileData/InstrProfReader.h<br>

>>>> ===================================================================<br>

--- include/llvm/ProfileData/InstrProfReader.h<br>

>>>> +++ include/llvm/ProfileData/InstrProfReader.h<br>

>>>> @@ -24,6 +24,7 @@<br>

>>>>  #include "llvm/Support/MemoryBuffer.h"<br>

>>>>  #include "llvm/Support/OnDiskHashTable.h"<br>

>>>>  #include <iterator><br>

>>>> +#include <map><br>

>>>>  namespace llvm {<br>

>>>> @@ -65,6 +66,9 @@<br>

>>>>    InstrProfIterator end() { return InstrProfIterator(); }<br>

>>>>  protected:<br>

>>>> +  /// String table for holding a unique copy of all the strings in the<br>

>>> profile.<br>

>>>> +  InstrProfStringTable StringTable;<br>

>>>> +<br>

>>>>    /// Set the current std::error_code and return same.<br>

>>>>    std::error_code error(std::error_code EC) {<br>

>>>>      LastError = EC;<br>

>>>> @@ -195,10 +199,13 @@<br>

>>>>    std::vector<InstrProfRecord> DataBuffer;<br>

>>>>    IndexedInstrProf::HashT HashType;<br>

>>>>    unsigned FormatVersion;<br>

>>>> +  const std::map<uint64_t, const char *> &HashKeyMap;<br>

>>>>  public:<br>

>>>> -  InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned<br>

>>> FormatVersion)<br>

>>>> -      : HashType(HashType), FormatVersion(FormatVersion) {}<br>

>>>> +  InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned<br>

>>> FormatVersion,<br>

>>>> +                       std::map<uint64_t, const char *> &HashKeyMap)<br>

+      : HashType(HashType), FormatVersion(FormatVersion),<br>

>>>> +        HashKeyMap(HashKeyMap) {}<br>

>>>>    typedef ArrayRef<InstrProfRecord> data_type;<br>

>>>> @@ -209,6 +216,7 @@<br>

>>>>    static bool EqualKey(StringRef A, StringRef B) { return A == B; }<br>

static StringRef GetInternalKey(StringRef K) { return K; }<br>

>>>> +  static StringRef GetExternalKey(StringRef K) { return K; }<br>

>>> What do you need GetExternalKey for?<br>

>>>>    hash_value_type ComputeHash(StringRef K);<br>

>>>> @@ -224,6 +232,8 @@<br>

>>>>      return StringRef((const char *)D, N);<br>

>>>>    }<br>

>>>> +  bool ReadValueProfilingData(const unsigned char *&D,<br>

>>>> +                              const unsigned char *const End);<br>

>>>>    data_type ReadData(StringRef K, const unsigned char *D,<br>

offset_type<br>

>>> N);<br>

>>>>  };<br>

>>>> @@ -243,6 +253,8 @@<br>

>>>>    uint64_t FormatVersion;<br>

>>>>    /// The maximal execution count among all functions.<br>

>>>>    uint64_t MaxFunctionCount;<br>

>>>> +  /// Map of hash values to const char* keys in profiling data. +<br>

std::map<uint64_t, const char *> HashKeyMap;<br>

>>>>    IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;<br>

IndexedInstrProfReader &operator=(const IndexedInstrProfReader &)<br>

=<br>

>>> delete;<br>

>>>> Index: include/llvm/ProfileData/InstrProf.h<br>

>>>> ===================================================================<br>

--- include/llvm/ProfileData/InstrProf.h<br>

>>>> +++ include/llvm/ProfileData/InstrProf.h<br>

>>>> @@ -16,42 +16,102 @@<br>

>>>>  #ifndef LLVM_PROFILEDATA_INSTRPROF_H_<br>

>>>>  #define LLVM_PROFILEDATA_INSTRPROF_H_<br>

>>>> -#include "llvm/ADT/StringRef.h"<br>

>>>> +#include "llvm/ADT/StringSet.h"<br>

>>>>  #include <cstdint><br>

>>>> +#include <list><br>

>>>>  #include <system_error><br>

>>>>  #include <vector><br>

>>>>  namespace llvm {<br>

>>>>  const std::error_category &instrprof_category();<br>

>>>>  enum class instrprof_error {<br>

>>>> -    success = 0,<br>

>>>> -    eof,<br>

>>>> -    bad_magic,<br>

>>>> -    bad_header,<br>

>>>> -    unsupported_version,<br>

>>>> -    unsupported_hash_type,<br>

>>>> -    too_large,<br>

>>>> -    truncated,<br>

>>>> -    malformed,<br>

>>>> -    unknown_function,<br>

>>>> -    hash_mismatch,<br>

>>>> -    count_mismatch,<br>

>>>> -    counter_overflow<br>

>>>> +  success = 0,<br>

>>>> +  eof,<br>

>>>> +  bad_magic,<br>

>>>> +  bad_header,<br>

>>>> +  unsupported_version,<br>

>>>> +  unsupported_hash_type,<br>

>>>> +  too_large,<br>

>>>> +  truncated,<br>

>>>> +  malformed,<br>

>>>> +  unknown_function,<br>

>>>> +  hash_mismatch,<br>

>>>> +  count_mismatch,<br>

>>>> +  counter_overflow,<br>

>>>> +  value_site_count_mismatch<br>

>>>>  };<br>

>>>>  inline std::error_code make_error_code(instrprof_error E) {<br>

>>>>    return std::error_code(static_cast<int>(E), instrprof_category());<br>

>>>>  }<br>

>>>> +enum instrprof_value_kind : uint32_t {<br>

>>>> +  first = 0,<br>

>>>> +  indirect_call_target = 0,<br>

>>>> +  size = 1<br>

>>>> +};<br>

>>>> +<br>

>>>> +struct InstrProfStringTable {<br>

>>>> +  // Set of string values in profiling data.<br>

>>>> +  StringSet<> StringValueSet;<br>

>>>> +  InstrProfStringTable() { StringValueSet.clear(); }<br>

>>>> +  // Get a pointer to internal storage of a string in set<br>

>>>> +  const char *getStringData(StringRef Str) {<br>

>>>> +    auto Result = StringValueSet.find(Str);<br>

>>>> +    return (Result == StringValueSet.end()) ? nullptr :<br>

>>> Result->first().data();<br>

>>>> +  }<br>

>>>> +  // Insert a string to StringTable<br>

>>>> +  const char *insertString(StringRef Str) {<br>

>>>> +    auto Result = StringValueSet.insert(Str);<br>

>>>> +    return Result.first->first().data();<br>

>>>> +  }<br>

>>>> +};<br>

>>>> +<br>

>>>> +struct InstrProfValueSiteRecord {<br>

>>>> +  // Typedef for a single TargetValue-NumTaken pair.<br>

>>>> +  typedef std::pair<uint64_t, uint64_t> ValueDataPair;<br>

>>>> +  // Value profiling data pairs at a given value site.<br>

>>>> +  std::list<ValueDataPair> ValueData;<br>

>>>> +<br>

>>>> +  InstrProfValueSiteRecord() { ValueData.clear(); }<br>

>>>> +<br>

>>>> +  // Sort ValueData ascending by TargetValue<br>

>>>> +  void sortByTargetValues() {<br>

>>>> +    ValueData.sort([](const ValueDataPair &left, const ValueDataPair<br>

>>> &right) {<br>

>>>> +      return left.first < right.first;<br>

>>>> +    });<br>

>>>> +  }<br>

>>>> +  // Merge data from another InstrProfValueSiteRecord<br>

>>>> +  void mergeValueData(InstrProfValueSiteRecord &Input) {<br>

>>>> +    this->sortByTargetValues();<br>

>>>> +    Input.sortByTargetValues();<br>

>>>> +    auto I = ValueData.begin();<br>

>>>> +    auto IE = ValueData.end();<br>

>>>> +    for (auto J = Input.ValueData.begin(), JE =<br>

>>>> Input.ValueData.end();<br>

>>> J != JE;<br>

>>>> +         ++J) {<br>

>>>> +      while (I != IE && I->first < J->first)<br>

>>>> +        ++I;<br>

>>>> +      if (I != IE && I->first == J->first) {<br>

>>>> +        I->second += J->second;<br>

>>>> +        ++I;<br>

>>>> +        continue;<br>

>>>> +      }<br>

>>>> +      ValueData.insert(I, *J);<br>

>>>> +    }<br>

>>>> +  }<br>

>>>> +};<br>

>>>> +<br>

>>>>  /// Profiling information for a single function.<br>

>>>>  struct InstrProfRecord {<br>

>>>>    InstrProfRecord() {}<br>

>>>>    InstrProfRecord(StringRef Name, uint64_t Hash,<br>

>>>> std::vector<uint64_t><br>

>>> Counts)<br>

>>>>        : Name(Name), Hash(Hash), Counts(std::move(Counts)) {}<br>

>>>>    StringRef Name;<br>

>>>>    uint64_t Hash;<br>

>>>>    std::vector<uint64_t> Counts;<br>

>>>> +  // Size of vector indicates the number of value sites for a value<br>

>>> kind<br>

>>>> +  std::vector<InstrProfValueSiteRecord><br>

>>> ValueSites[instrprof_value_kind::size];<br>

>>>>  };<br>

>>>>  } // end namespace llvm<br>

<br>

<br>

<br>

<br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

</div></div></blockquote></div><br></div></div>