[PATCH] D13308: Value profiling - remaining LLVM changes

Wed Oct 14 21:06:27 PDT 2015

> char.
>
> This is a pointer to the uint32_t ValueDataCounts array returned by the
> __llvm_profile_gather_value_data function in InstrProfiling.c of
> compiler-rt. So changing it to any other type indicates changes to the
> runtime.
>

This of course requires runtime change (also pending). Is it just a
matter making the DataCountArray to be uint8_t** type for the gather
interface?

> A call to __llvm_profile_instrument_target at runtime, causes first the
> __llvm_profile_value_node **ValueCounters field of the __llvm_profile_data
> checked for null. If it's null, we first create an array of
> __llvm_profile_value_node* of size sum of all elements of NumValueSites.
> This forms a row of head pointers for our linked lists i.e. one linked
> list per value site.

yes, but this has nothing to do with the  type selection for the array
that records the number of target values for each site. My point is
that the number of targets per site can be limited to <=255.

>
> Then the __llvm_profile_instrument_target call, finds the head for the
> supplied site index. Walks the linked list for the value site, and if it
> can not locate the target value, appends a new node to the linked list.

yes -- no problem with this part.

>
> This means we maintain as many linked lists as there are value sites at
> run-time. Once the execution hits at exit, we iterate through the
> __llvm_profile_data elements to turn all the value profiling data into two
> arrays.

yes -- this part is also clear. It is just that one of the arrays
(DataCountArray) does not need to use 32bit type as its element.

>
> First array is a uint32_t array that shows the number of
> target_value/numtaken pairs present per value site. Its size is equivalent
> to the sum of all elements of NumValueSites.

yes, the number of pairs is small.

>The second array is the
> __llvm_profile_value_data array that is formed by appending all the data
> fields of the linked lists together. These two arrays are what gets dumped
> into the raw profile data file. Therefore, the value profile data is in
> the same order as the __llvm_profile_data in file.
>
> Raw profile reader uses the NumValueSites array to decide how many
> elements to read from the value data counts array (i.e. the first array
> described in the paragraph before). Using the counts stored in the value
> data counts array, the reader decides how many elements to read per value
> site from the second array.
>
>>>> ================
>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:235
>>>> @@ -234,1 +234,3 @@
>>>>    auto NamesSize = swap(Header.NamesSize);
>>>> +  auto PaddingSize = sizeof(uint64_t) - NamesSize % sizeof(uint64_t);
> +  auto ValueDataCountsSize = swap(Header.ValueDataCountsSize);
> ----------------
>>>> Please draw a diagram describing raw data layout here:
>>>> Header:
>>>>   ....
>>>> prf_data_section:
>>>>    ....
>>>> prf_cnts_section:
>>>>    ...
>>>> prf_name_section:
>>>>     ...
>>>> padding1
>>>> vdata_counts_section: // # of counts per site
>>>>   ...
>>>> padding2
>>>> vdata_value_section:
>>>>    ...
>>>> ================
>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:333
>>>> @@ +332,3 @@
>>>> +      InstrProfValueSiteRecord NewVRecord;
>>>> +      for (uint32_t VIndex = 0; VIndex < CurrentVDataCount; ++VIndex) {
>>>> +        uint64_t TargetValue = swap(CurrentVData->Value);
>>>> ----------------
>>>> Should CurrentVData be set to Data->Values?
>>>> Current implementation seems to assume that VData and VDataCounts are
> laid
>>>> out in the same order as the prf_data wrt the owning functions. I
> don't
>>>> think this assumption is right.
>>> The value profile data appears as linked lists hanging from the per
> function profile data structs at runtime. Therefore, an extra step is
> performed to turn the value profile data lists into a consecutive array
> form. We do that by iterating over the per function profile data
> structs.
>>> This as an artifact caused the value profile data and the count arrays to
>>> be laid out in the same order with the per function profile data.
>> I think it is better to not depend on this implementation detail at
> runtime.  The current implementation also depends on the fact that raw
> profile data is accessed sequentially.  Hidden assumptions like this are
> not great. It also make the access of value profile data for a given
> function is different from edge profile counters.
>
> The __llvm_profile_data array in the raw profile file can only be accessed
> sequentially. However, I do agree that the alignment of value profile data
> to _llvm_profile_data is an implementation artifact and is not uniform wrt
> how the name/counter arrays are being access currently.
>
>> The problem can be easily solved: there are two fields in ProfileData
> structure that can be used for this purpose:
>> 1) the functionPointer field which is currently not used can be used to
> store the PerSiteCountsPtr for that function
>
> Function pointer is currently used to find out the name of the target
> function by comparing it against the target values returned from profiling
> indirect call sites. We may avoid having the function pointer iff
> llvm-profdata is taught to accept the instrumented binary i.e. used to
> generate the profile data.

Yes -- but the point is that you don't need to postpone the
translation from function ptr to name ptr till profile read time. It
can be easily done during profile dumping time using qsort and
bsearch. After that is done, the function ptr field in ProfileData is
free of use.

>
>> 2) the Values field can be used to store ValuePtr for the function In
> the header, add PerSiteCountsDelta and ValueDelta field.
>
> Sure, currently the Values field is not processed by the reader. It may be
> used to store the ValuePtr, however unless llvm-profdata is taught to read
> in the instrumented binary, we need an extra field in the
> __llvm_profile_data struct for the PerSiteCountsPtr.

Note that reading binary does not work for shared libraries where
runtime addresses are different.

David

>
> Thanks,
> -Betul
>
>> With those,  Value data for a given function can be obtained similarly
> to getCounter
>> David
>>> Raw profile file format is as follows:
>>> Header: Added in two uint64_t elements for value data counts array size
> and value data array size
>>> Data array : Array may end at a uint32_t boundary when generated on
> uint32_t targets. I'll have to fix the current CL to maintain data
> sections ending at uint64_t boundaries. Current CL does not have a
> padding
>>> after the data array.
>>> Counters array : same as before
>>> Name array : same as before
>>> Padding to end the name array at uint64_t boundary : same as before
> Value data counts array: Array of uint32_t values, each value indicates
> how many uint64_t data pairs {TargetValue, NumTaken} to read from the
> value data array per value site.
>>> Padding to end the value data counts array at uint64_t boundary Value
> data array: Array of uint64_t
>>> -Betul
>>>> http://reviews.llvm.org/D13308
>
>
>
>