[PATCH] D10674: Value profiling - patchset 3

Fri Sep 11 17:51:33 PDT 2015

"Betul Buyukkurt" <betulb at codeaurora.org> writes:
>>>>> I think, I'm leaning towards keeping an array of kinds here.
>>>> Why?
>>> The reason is that adding new value kinds will happen in close to no-time
>>> once this merges in. David has also listed couple types of value profiling
>>> he was interested in seeing as well as I also iterated that in the
>>> previous emails. This is in the end an implementation detail. I do not
>>> think this is a decision that affects the file format or reader/writer
>>> design overall.
>>
>> Adding new value kinds will take next to no-time with vectors per-kind
>> as well. Even better, this way we'll have switch statements in all of the
>> places where these are manipulated, which means that if someone forgets to
>> update one of the locations where this is used there will be compiler
>> warnings for uncovered switch statements.
>>
>> As it is now, the use sites for this type will do something arbitrary
>> which may or may not be reasonable for the new value type. You won't be
>> able to notice that you missed updating that spot until it behaves
>> incorrectly at runtime. This isn't really that big a deal, but designing
>> it to be harder to make mistakes for future people who work on this is
>> valuable.
>
> Do you take into account that once md5 changes are merged in all of the
> string table and hashkey maps are going to be removed as part of re-basing
> code to the new implementation? Or if that merges in first, I'll have to
> rebase by removing them from my implementation. 

The MD5 changes aren't relevant to whether we keep the different value
kinds as an array or multiple named values.

> Also, in addition to the few syntactic issues you mentioned is this
> the only stopping factor at this time?

All of my remaining issues with the patch have been sent out, yes.

>>>> I'm a little uncomfortable that we need to modify the Name in the
>>>> iterators here. Can't the data structure backing the string table just
>>>> work in StringRefs instead?
>>>
>>> StringTable implementation is based on LLVM's StringSet<>. Here we're
>>> returning the char* to the internal representation of the string in the
>>> table. This helps printing the values as a string in llvm-profdata and
>>> helps in assigning the metadata entries as string in clang. Imagine it as
>>> what's stored is in essence the pointer value returned by data() member of
>>> StringRef class.
>>
>> Right, I see how it works, but why are we duplicating all of these
>> strings into a StringSet? The strings already have storage with a
>> sufficient lifetime, no? The StringTable could just be a
>> DenseSet<StringRef> and then the pointer values of the strings don't
>> need to be modified.
>
> I do not understand how that solves the problem. What do you suggest to
> store in the value data field for indirect call targets? If I store the
> char* of the memory pointed to by the StringRef, then printing the value
> data as a C string would go beyond the length in the StringRef since these
> strings are not null terminated. Could you please provide more clarity?

Oh, I see. You're type-punning 'char *'s through uint64_t to pass this
data around. Isn't that undefined behaviour?

Personally, I'd probably do a string table as a std::vector<StringRef>,
and use a DenseMap<StringRef, size_t> to check if strings to be inserted
were already there. This seems simpler, safer, and less confusing.

>>>> The way the HashKeyMap works here seems a little off. Why does the
>>>> Reader own it? Only the trait uses it. Also, since it follows a strict
>>>> "fill then query" pattern, it's probably better to just use a vector that
>>>> we sort after filling and then binary search later.
>>>
>>> OnDiskIterableChainedHashTable is a member of the reader class. So the
>>> reader may conveniently iterate over its keys to collect the strings and
>>> form the hash table accordingly. On the other hand the LookUpTrait class
>>> seemed to me like a class helping after the appropriate key is located.
>>> Also, the Info helper class is described as "reading values from the hash
>>> table's payload and computes the hash for a given key".
>>>
>> I don't understand this answer. The Reader has-a HashKeyMap, and the
>> trait has a reference to it. The reader *never uses the HashKeyMap* after
>> creating it. Just create it in the reader and store it in the trait.
>
> I may achieve this by having the HashKeyMap stored in the LookupTrait and
> the Reader class declaring the LookupTrait as a friend class.

Why does the LookupTrait need to be friended? Can't you fill a local
std::vector in this function, and then hand it over to the LookupTrait,
with something like a public "setHashKeyLookup" method?