[PATCH] D10674: Value profiling - patchset 3

Fri Sep 11 20:13:28 PDT 2015

On Fri, Sep 11, 2015 at 5:51 PM, Justin Bogner <mail at justinbogner.com> wrote:
> "Betul Buyukkurt" <betulb at codeaurora.org> writes:
>>>>>> I think, I'm leaning towards keeping an array of kinds here.
>>>>> Why?
>>>> The reason is that adding new value kinds will happen in close to no-time
>>>> once this merges in. David has also listed couple types of value profiling
>>>> he was interested in seeing as well as I also iterated that in the
>>>> previous emails. This is in the end an implementation detail. I do not
>>>> think this is a decision that affects the file format or reader/writer
>>>> design overall.
>>>
>>> Adding new value kinds will take next to no-time with vectors per-kind
>>> as well. Even better, this way we'll have switch statements in all of the
>>> places where these are manipulated, which means that if someone forgets to
>>> update one of the locations where this is used there will be compiler
>>> warnings for uncovered switch statements.
>>>
>>> As it is now, the use sites for this type will do something arbitrary
>>> which may or may not be reasonable for the new value type. You won't be
>>> able to notice that you missed updating that spot until it behaves
>>> incorrectly at runtime. This isn't really that big a deal, but designing
>>> it to be harder to make mistakes for future people who work on this is
>>> valuable.
>>
>> Do you take into account that once md5 changes are merged in all of the
>> string table and hashkey maps are going to be removed as part of re-basing
>> code to the new implementation? Or if that merges in first, I'll have to
>> rebase by removing them from my implementation.
>
> The MD5 changes aren't relevant to whether we keep the different value
> kinds as an array or multiple named values.
>
>> Also, in addition to the few syntactic issues you mentioned is this
>> the only stopping factor at this time?
>
> All of my remaining issues with the patch have been sent out, yes.
>
>>>>> I'm a little uncomfortable that we need to modify the Name in the
>>>>> iterators here. Can't the data structure backing the string table just
>>>>> work in StringRefs instead?
>>>>
>>>> StringTable implementation is based on LLVM's StringSet<>. Here we're
>>>> returning the char* to the internal representation of the string in the
>>>> table. This helps printing the values as a string in llvm-profdata and
>>>> helps in assigning the metadata entries as string in clang. Imagine it as
>>>> what's stored is in essence the pointer value returned by data() member of
>>>> StringRef class.
>>>
>>> Right, I see how it works, but why are we duplicating all of these
>>> strings into a StringSet? The strings already have storage with a
>>> sufficient lifetime, no? The StringTable could just be a
>>> DenseSet<StringRef> and then the pointer values of the strings don't
>>> need to be modified.
>>
>> I do not understand how that solves the problem. What do you suggest to
>> store in the value data field for indirect call targets? If I store the
>> char* of the memory pointed to by the StringRef, then printing the value
>> data as a C string would go beyond the length in the StringRef since these
>> strings are not null terminated. Could you please provide more clarity?
>
> Oh, I see. You're type-punning 'char *'s through uint64_t to pass this
> data around. Isn't that undefined behaviour?
>
> Personally, I'd probably do a string table as a std::vector<StringRef>,
> and use a DenseMap<StringRef, size_t> to check if strings to be inserted
> were already there. This seems simpler, safer, and less confusing.

StringSet is implemented using StringMap. It has a nice property that
the string key when inserted will be cloned into the map and the clone
is made null terminated. The null termination allows the raw string to
be used for efficiency.

I think  using StringSet should be fine for now. In the longer term,
as I mentioned, when name strings are removed from profile data, all
these will become moot as the string table will be eliminated.

David

>
>>>>> The way the HashKeyMap works here seems a little off. Why does the
>>>>> Reader own it? Only the trait uses it. Also, since it follows a strict
>>>>> "fill then query" pattern, it's probably better to just use a vector that
>>>>> we sort after filling and then binary search later.
>>>>
>>>> OnDiskIterableChainedHashTable is a member of the reader class. So the
>>>> reader may conveniently iterate over its keys to collect the strings and
>>>> form the hash table accordingly. On the other hand the LookUpTrait class
>>>> seemed to me like a class helping after the appropriate key is located.
>>>> Also, the Info helper class is described as "reading values from the hash
>>>> table's payload and computes the hash for a given key".
>>>>
>>> I don't understand this answer. The Reader has-a HashKeyMap, and the
>>> trait has a reference to it. The reader *never uses the HashKeyMap* after
>>> creating it. Just create it in the reader and store it in the trait.
>>
>> I may achieve this by having the HashKeyMap stored in the LookupTrait and
>> the Reader class declaring the LookupTrait as a friend class.
>
> Why does the LookupTrait need to be friended? Can't you fill a local
> std::vector in this function, and then hand it over to the LookupTrait,
> with something like a public "setHashKeyLookup" method?