[PATCH] D10674: Value profiling - patchset 3
Xinliang David Li via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 11 09:58:12 PDT 2015
On Fri, Sep 11, 2015 at 9:25 AM, Betul Buyukkurt via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
> Hi David,
>
> Thanks for the comments.
>
> > Regarding the StringTable usage.
> > 1) Reader: I am not sure why you need the stringTable/StringSet for the
> reader in the first place. For HashKeyMap, can you directly map the hash
> val to the key string without doing the converting (by
> > inserting the key to the stringtab)? Is it used for string
> > merging/uniquing? Or the client of the ProfileReader needs to traverse
> the set of strings some how?
>
> Value data for indirect call targets are char*'s to the string table
> entries. This is very helpful because the StringRef is not expected to be
> terminated with a null character always. Internally it stores a char* and
> a length. So the null ending is not always maintained. Thus, the strings
> containing the names in the raw profile file or the indexed format are
> also _not_ expected to be null terminated. Therefore, as far as the raw
> reader is concerned (because I tested it) creating an entry in the string
> table first and making sure the value data points to it helped value data
> maintain proper C strings with accurate contents. Having char*'s in value
> profile data helps when writing (in llvm-profdata) or forming other
> strings (in clang) by just casting the value data as a char pointer. I've
> not tested explicitly the case of the keys in OnDiskHashTable, but as far
> as I can tell from the code below:
>
> // L.first is the length of the Key.
> InfoObj->ReadKey(LocalPtr, L.first);
>
> I can see that the internal representation does not depend on null
> termination of strings.
>
> > 2) Writer: the string reference update is needed because the string
> memory in the reader will be reclaimed. However there might be a better
> way to do so:
> > With this:
> > auto &ProfileDataMap = FunctionData[I.Name];
> > I.Name string gets duplicated in the StringMap as the new key. In other
> words, it is possible to do the update the string refs in
> > InstrProfRecords (to point to the keys in stringmap) after all records
> for one reader are added.
>
> Then you have to make sure to inform the writer explicitly that one reader
> has reached its end and this introduces another writer API which is user
> level complication.
>
> > However I think we should save the trouble -- see below.
> >>> The way the HashKeyMap works here seems a little off. Why does the
> Reader own it? Only the trait uses it. Also, since it follows a strict
> "fill then query" pattern, it's probably better to just use a vector
> that we sort after filling and then binary search later.
> >> OnDiskIterableChainedHashTable is a member of the reader class. So the
> reader may conveniently iterate over its keys to collect the strings
> and
> >> form the hash table accordingly. On the other hand the LookUpTrait
> class
> >> seemed to me like a class helping after the appropriate key is located.
> Also, the Info helper class is described as "reading values from the
> hash
> >> table's payload and computes the hash for a given key".
> > Please refer to the thread about reducing profile data size. With that
> implemented, the strings in the profile data will be gone. In other
> words, the value profiler (indirect call) is responsible to
> > re-establish the hash to string mapping, not the Indexed profiler reader
> itself. This means all the translation code in this patch (from hash
> --> string in the reader, and string-->hash in the writer) should be
> removed -- there will be no special handling of icall value profile kind
> either.
>
> I'm aware of the changes coming and have been thinking how the optimizer
> should adapt itself to having md5 hashes as incoming data. It needs to
> collect all the functions in the module and construct the md5 hashes from
> their names to make sure the proper match is detected before any further
> analysis or transformation. In that regard, the function name that clang
> will generate the hash from is at most important to make your approach
> viable. The hash should be re-construct-able in LLVM and possibly in LTO
> mode. This should also happen w/o needing to maintain debug info or
> passing -g on the command line.
>
>
yes.
> Clang uses filename to prefix the mangled function names for internal and
> some other linkage types.
Only local linkage funcs.
> I'm not sure the filename prefix would be easily
>
The easy way to do that is to directly change the internal function name
itself with the path augmented name under LTO mode.
David
> re-construct-able in LLVM under LTO mode. By the way, haven't had a chance
> to look at your patches yet. Just commenting based my reads on the
> llvm-dev thread.
>
> -Betul
>
> > David
> >>>> + // Emit warning if a hash collision is detected.
> >>>> + if (Result.second == false)
> >>>> + DEBUG(dbgs() << "IndexedInstrProfReader: hash collision
> detected:
> >>> \n"
> >>>> + << "\t Map Entry(Hash, Key): " <<
> >>> Result.first->first
> >>>> + << ", " << Result.first->second << "\n"
> >>>> + << "\t New Entry(Hash, Key): " <<
> >>> ComputeHash(HashType, Key)
> >>>> + << ", " << Key << "\n");
> >>> This is not "emitting a warning". This will only be printed if the
> host
> >>> compiler is built in debug mode, so it seems pretty pointless.
> Actually
> >>> emitting a proper warning from this point in the compiler might be
> kind
> >>> of tricky though.
> >>>> + }
> >>>> // Set up our iterator for readNextRecord.
> >>>> RecordIterator = Index->data_begin();
> >>>> Index: lib/ProfileData/InstrProfIndexed.h
> >>>> ===================================================================
> --- lib/ProfileData/InstrProfIndexed.h
> >>>> +++ lib/ProfileData/InstrProfIndexed.h
> >>>> @@ -47,7 +47,7 @@
> >>>> }
> >>>> const uint64_t Magic = 0x8169666f72706cff; // "\xfflprofi\x81"
> >>>> -const uint64_t Version = 2;
> >>>> +const uint64_t Version = 3;
> >>>> const HashT HashType = HashT::MD5;
> >>>> }
> >>>> Index: lib/ProfileData/InstrProf.cpp
> >>>> ===================================================================
> --- lib/ProfileData/InstrProf.cpp
> >>>> +++ lib/ProfileData/InstrProf.cpp
> >>>> @@ -50,6 +50,8 @@
> >>>> return "Function count mismatch";
> >>>> case instrprof_error::counter_overflow:
> >>>> return "Counter overflow";
> >>>> + case instrprof_error::value_site_count_mismatch:
> >>>> + return "Function's value site counts mismatch";
> >>>> }
> >>>> llvm_unreachable("A value of instrprof_error has no message.");
> >>>> }
> >>>> Index: include/llvm/ProfileData/InstrProfWriter.h
> >>>> ===================================================================
> --- include/llvm/ProfileData/InstrProfWriter.h
> >>>> +++ include/llvm/ProfileData/InstrProfWriter.h
> >>>> @@ -15,33 +15,32 @@
> >>>> #ifndef LLVM_PROFILEDATA_INSTRPROFWRITER_H
> >>>> #define LLVM_PROFILEDATA_INSTRPROFWRITER_H
> >>>> -#include "llvm/ADT/ArrayRef.h"
> >>>> #include "llvm/ADT/DenseMap.h"
> >>>> -#include "llvm/ADT/StringMap.h"
> >>>> #include "llvm/ProfileData/InstrProf.h"
> >>>> #include "llvm/Support/DataTypes.h"
> >>>> #include "llvm/Support/MemoryBuffer.h"
> >>>> #include "llvm/Support/raw_ostream.h"
> >>>> -#include <vector>
> >>>> namespace llvm {
> >>>> /// Writer for instrumentation based profile data.
> >>>> class InstrProfWriter {
> >>>> public:
> >>>> - typedef SmallDenseMap<uint64_t, std::vector<uint64_t>, 1>
> >>> CounterData;
> >>>> + typedef SmallDenseMap<uint64_t, InstrProfRecord, 1> ProfilingData;
> +
> >>>> private:
> >>>> - StringMap<CounterData> FunctionData;
> >>>> + InstrProfStringTable StringTable;
> >>>> + StringMap<ProfilingData> FunctionData;
> >>>> uint64_t MaxFunctionCount;
> >>>> public:
> >>>> InstrProfWriter() : MaxFunctionCount(0) {}
> >>>> + /// Update string entries in profile data with references to
> >>> StringTable.
> >>>> + void updateStringTableReferences(InstrProfRecord &I);
> >>>> /// Add function counts for the given function. If there are
> >>>> already
> >>> counts
> >>>> /// for this function and the hash and number of counts match,
> each
> >>> counter is
> >>>> /// summed.
> >>>> - std::error_code addFunctionCounts(StringRef FunctionName,
> >>>> - uint64_t FunctionHash,
> >>>> - ArrayRef<uint64_t> Counters); +
> std::error_code addRecord(InstrProfRecord &&I);
> >>>> /// Write the profile to \c OS
> >>>> void write(raw_fd_ostream &OS);
> >>>> /// Write the profile, returning the raw data. For testing.
> >>>> Index: include/llvm/ProfileData/InstrProfReader.h
> >>>> ===================================================================
> --- include/llvm/ProfileData/InstrProfReader.h
> >>>> +++ include/llvm/ProfileData/InstrProfReader.h
> >>>> @@ -24,6 +24,7 @@
> >>>> #include "llvm/Support/MemoryBuffer.h"
> >>>> #include "llvm/Support/OnDiskHashTable.h"
> >>>> #include <iterator>
> >>>> +#include <map>
> >>>> namespace llvm {
> >>>> @@ -65,6 +66,9 @@
> >>>> InstrProfIterator end() { return InstrProfIterator(); }
> >>>> protected:
> >>>> + /// String table for holding a unique copy of all the strings in
> the
> >>> profile.
> >>>> + InstrProfStringTable StringTable;
> >>>> +
> >>>> /// Set the current std::error_code and return same.
> >>>> std::error_code error(std::error_code EC) {
> >>>> LastError = EC;
> >>>> @@ -195,10 +199,13 @@
> >>>> std::vector<InstrProfRecord> DataBuffer;
> >>>> IndexedInstrProf::HashT HashType;
> >>>> unsigned FormatVersion;
> >>>> + const std::map<uint64_t, const char *> &HashKeyMap;
> >>>> public:
> >>>> - InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned
> >>> FormatVersion)
> >>>> - : HashType(HashType), FormatVersion(FormatVersion) {}
> >>>> + InstrProfLookupTrait(IndexedInstrProf::HashT HashType, unsigned
> >>> FormatVersion,
> >>>> + std::map<uint64_t, const char *> &HashKeyMap)
> + : HashType(HashType), FormatVersion(FormatVersion),
> >>>> + HashKeyMap(HashKeyMap) {}
> >>>> typedef ArrayRef<InstrProfRecord> data_type;
> >>>> @@ -209,6 +216,7 @@
> >>>> static bool EqualKey(StringRef A, StringRef B) { return A == B; }
> static StringRef GetInternalKey(StringRef K) { return K; }
> >>>> + static StringRef GetExternalKey(StringRef K) { return K; }
> >>> What do you need GetExternalKey for?
> >>>> hash_value_type ComputeHash(StringRef K);
> >>>> @@ -224,6 +232,8 @@
> >>>> return StringRef((const char *)D, N);
> >>>> }
> >>>> + bool ReadValueProfilingData(const unsigned char *&D,
> >>>> + const unsigned char *const End);
> >>>> data_type ReadData(StringRef K, const unsigned char *D,
> offset_type
> >>> N);
> >>>> };
> >>>> @@ -243,6 +253,8 @@
> >>>> uint64_t FormatVersion;
> >>>> /// The maximal execution count among all functions.
> >>>> uint64_t MaxFunctionCount;
> >>>> + /// Map of hash values to const char* keys in profiling data. +
> std::map<uint64_t, const char *> HashKeyMap;
> >>>> IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;
> IndexedInstrProfReader &operator=(const IndexedInstrProfReader &)
> =
> >>> delete;
> >>>> Index: include/llvm/ProfileData/InstrProf.h
> >>>> ===================================================================
> --- include/llvm/ProfileData/InstrProf.h
> >>>> +++ include/llvm/ProfileData/InstrProf.h
> >>>> @@ -16,42 +16,102 @@
> >>>> #ifndef LLVM_PROFILEDATA_INSTRPROF_H_
> >>>> #define LLVM_PROFILEDATA_INSTRPROF_H_
> >>>> -#include "llvm/ADT/StringRef.h"
> >>>> +#include "llvm/ADT/StringSet.h"
> >>>> #include <cstdint>
> >>>> +#include <list>
> >>>> #include <system_error>
> >>>> #include <vector>
> >>>> namespace llvm {
> >>>> const std::error_category &instrprof_category();
> >>>> enum class instrprof_error {
> >>>> - success = 0,
> >>>> - eof,
> >>>> - bad_magic,
> >>>> - bad_header,
> >>>> - unsupported_version,
> >>>> - unsupported_hash_type,
> >>>> - too_large,
> >>>> - truncated,
> >>>> - malformed,
> >>>> - unknown_function,
> >>>> - hash_mismatch,
> >>>> - count_mismatch,
> >>>> - counter_overflow
> >>>> + success = 0,
> >>>> + eof,
> >>>> + bad_magic,
> >>>> + bad_header,
> >>>> + unsupported_version,
> >>>> + unsupported_hash_type,
> >>>> + too_large,
> >>>> + truncated,
> >>>> + malformed,
> >>>> + unknown_function,
> >>>> + hash_mismatch,
> >>>> + count_mismatch,
> >>>> + counter_overflow,
> >>>> + value_site_count_mismatch
> >>>> };
> >>>> inline std::error_code make_error_code(instrprof_error E) {
> >>>> return std::error_code(static_cast<int>(E), instrprof_category());
> >>>> }
> >>>> +enum instrprof_value_kind : uint32_t {
> >>>> + first = 0,
> >>>> + indirect_call_target = 0,
> >>>> + size = 1
> >>>> +};
> >>>> +
> >>>> +struct InstrProfStringTable {
> >>>> + // Set of string values in profiling data.
> >>>> + StringSet<> StringValueSet;
> >>>> + InstrProfStringTable() { StringValueSet.clear(); }
> >>>> + // Get a pointer to internal storage of a string in set
> >>>> + const char *getStringData(StringRef Str) {
> >>>> + auto Result = StringValueSet.find(Str);
> >>>> + return (Result == StringValueSet.end()) ? nullptr :
> >>> Result->first().data();
> >>>> + }
> >>>> + // Insert a string to StringTable
> >>>> + const char *insertString(StringRef Str) {
> >>>> + auto Result = StringValueSet.insert(Str);
> >>>> + return Result.first->first().data();
> >>>> + }
> >>>> +};
> >>>> +
> >>>> +struct InstrProfValueSiteRecord {
> >>>> + // Typedef for a single TargetValue-NumTaken pair.
> >>>> + typedef std::pair<uint64_t, uint64_t> ValueDataPair;
> >>>> + // Value profiling data pairs at a given value site.
> >>>> + std::list<ValueDataPair> ValueData;
> >>>> +
> >>>> + InstrProfValueSiteRecord() { ValueData.clear(); }
> >>>> +
> >>>> + // Sort ValueData ascending by TargetValue
> >>>> + void sortByTargetValues() {
> >>>> + ValueData.sort([](const ValueDataPair &left, const ValueDataPair
> >>> &right) {
> >>>> + return left.first < right.first;
> >>>> + });
> >>>> + }
> >>>> + // Merge data from another InstrProfValueSiteRecord
> >>>> + void mergeValueData(InstrProfValueSiteRecord &Input) {
> >>>> + this->sortByTargetValues();
> >>>> + Input.sortByTargetValues();
> >>>> + auto I = ValueData.begin();
> >>>> + auto IE = ValueData.end();
> >>>> + for (auto J = Input.ValueData.begin(), JE =
> >>>> Input.ValueData.end();
> >>> J != JE;
> >>>> + ++J) {
> >>>> + while (I != IE && I->first < J->first)
> >>>> + ++I;
> >>>> + if (I != IE && I->first == J->first) {
> >>>> + I->second += J->second;
> >>>> + ++I;
> >>>> + continue;
> >>>> + }
> >>>> + ValueData.insert(I, *J);
> >>>> + }
> >>>> + }
> >>>> +};
> >>>> +
> >>>> /// Profiling information for a single function.
> >>>> struct InstrProfRecord {
> >>>> InstrProfRecord() {}
> >>>> InstrProfRecord(StringRef Name, uint64_t Hash,
> >>>> std::vector<uint64_t>
> >>> Counts)
> >>>> : Name(Name), Hash(Hash), Counts(std::move(Counts)) {}
> >>>> StringRef Name;
> >>>> uint64_t Hash;
> >>>> std::vector<uint64_t> Counts;
> >>>> + // Size of vector indicates the number of value sites for a value
> >>> kind
> >>>> + std::vector<InstrProfValueSiteRecord>
> >>> ValueSites[instrprof_value_kind::size];
> >>>> };
> >>>> } // end namespace llvm
>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150911/a9f7dffc/attachment.html>
More information about the llvm-commits
mailing list