[PATCH] D13308: Value profiling - remaining LLVM changes
Xinliang David Li via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 21 12:21:49 PDT 2015
On Wed, Oct 21, 2015 at 12:20 PM, Xinliang David Li <davidxl at google.com>
wrote:
> Yes -- that is a valid use case and valid concern.
>
> There is a solution to that. Instead of splitting value counter array and
> value site count array, they can live in one array -- each function's value
> profile counters are preceded by the array of site map/count array whose
> size is known per function.
>
> To further reduce size requirement, please also make the element type of
> NumValueSites[] array to be uint16_t. It is not a problem for now using
> 32bit, but can be a big problem in the future when there are more value
> kinds supported. Most of the cases, the number of sites (e.g, icall) per
> function is less than hundreds. We can choose not to support giant
> functions with > 64k icallsites -- the user can always break up the
> function.
>
> The following is a summary of my feedbacks:
>
> 1) make the sitecount arrray element type uint8_t;
> 2) embed site count array with value counter array (with small internal
> padding if needed)
> 3) make the reader reads the value profile raw data the same way for each
> function as edge profile -- basically using counter delta and the
> relocation scheme -- not the sequential read with implicit order assumption.
> 4) shrink the size of NumValueSites element to be 16bit long
>
5) make the reader capable of of reading data with different # of recorded
value kinds.
>
> thanks,
>
> David
>
>
> On Wed, Oct 21, 2015 at 10:10 AM, Betul Buyukkurt <betulb at codeaurora.org>
> wrote:
>
>> >> char.
>> >>
>> >> This is a pointer to the uint32_t ValueDataCounts array returned by the
>> >> __llvm_profile_gather_value_data function in InstrProfiling.c of
>> >> compiler-rt. So changing it to any other type indicates changes to the
>> >> runtime.
>> >>
>> >
>> > This of course requires runtime change (also pending). Is it just a
>> > matter making the DataCountArray to be uint8_t** type for the gather
>> > interface?
>> >
>> >> A call to __llvm_profile_instrument_target at runtime, causes first the
>> >> __llvm_profile_value_node **ValueCounters field of the
>> >> __llvm_profile_data
>> >> checked for null. If it's null, we first create an array of
>> >> __llvm_profile_value_node* of size sum of all elements of
>> NumValueSites.
>> >> This forms a row of head pointers for our linked lists i.e. one linked
>> >> list per value site.
>> >
>> > yes, but this has nothing to do with the type selection for the array
>> > that records the number of target values for each site. My point is
>> > that the number of targets per site can be limited to <=255.
>> >
>> >>
>> >> Then the __llvm_profile_instrument_target call, finds the head for the
>> >> supplied site index. Walks the linked list for the value site, and if
>> it
>> >> can not locate the target value, appends a new node to the linked list.
>> >
>> > yes -- no problem with this part.
>> >
>> >>
>> >> This means we maintain as many linked lists as there are value sites at
>> >> run-time. Once the execution hits at exit, we iterate through the
>> >> __llvm_profile_data elements to turn all the value profiling data into
>> >> two
>> >> arrays.
>> >
>> > yes -- this part is also clear. It is just that one of the arrays
>> > (DataCountArray) does not need to use 32bit type as its element.
>> >
>> >>
>> >> First array is a uint32_t array that shows the number of
>> >> target_value/numtaken pairs present per value site. Its size is
>> >> equivalent
>> >> to the sum of all elements of NumValueSites.
>> >
>> > yes, the number of pairs is small.
>> >
>> >
>> >>The second array is the
>> >> __llvm_profile_value_data array that is formed by appending all the
>> data
>> >> fields of the linked lists together. These two arrays are what gets
>> >> dumped
>> >> into the raw profile data file. Therefore, the value profile data is in
>> >> the same order as the __llvm_profile_data in file.
>> >>
>> >> Raw profile reader uses the NumValueSites array to decide how many
>> >> elements to read from the value data counts array (i.e. the first array
>> >> described in the paragraph before). Using the counts stored in the
>> value
>> >> data counts array, the reader decides how many elements to read per
>> >> value
>> >> site from the second array.
>> >>
>> >>>>> ================
>> >>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:235
>> >>>>> @@ -234,1 +234,3 @@
>> >>>>> auto NamesSize = swap(Header.NamesSize);
>> >>>>> + auto PaddingSize = sizeof(uint64_t) - NamesSize %
>> >>>>> sizeof(uint64_t);
>> >> + auto ValueDataCountsSize = swap(Header.ValueDataCountsSize);
>> >> ----------------
>> >>>>> Please draw a diagram describing raw data layout here:
>> >>>>> Header:
>> >>>>> ....
>> >>>>> prf_data_section:
>> >>>>> ....
>> >>>>> prf_cnts_section:
>> >>>>> ...
>> >>>>> prf_name_section:
>> >>>>> ...
>> >>>>> padding1
>> >>>>> vdata_counts_section: // # of counts per site
>> >>>>> ...
>> >>>>> padding2
>> >>>>> vdata_value_section:
>> >>>>> ...
>> >>>>> ================
>> >>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:333
>> >>>>> @@ +332,3 @@
>> >>>>> + InstrProfValueSiteRecord NewVRecord;
>> >>>>> + for (uint32_t VIndex = 0; VIndex < CurrentVDataCount;
>> >>>>> ++VIndex)
>> >>>>> {
>> >>>>> + uint64_t TargetValue = swap(CurrentVData->Value);
>> >>>>> ----------------
>> >>>>> Should CurrentVData be set to Data->Values?
>> >>>>> Current implementation seems to assume that VData and VDataCounts
>> are
>> >> laid
>> >>>>> out in the same order as the prf_data wrt the owning functions. I
>> >> don't
>> >>>>> think this assumption is right.
>> >>>> The value profile data appears as linked lists hanging from the per
>> >> function profile data structs at runtime. Therefore, an extra step is
>> >> performed to turn the value profile data lists into a consecutive array
>> >> form. We do that by iterating over the per function profile data
>> >> structs.
>> >>>> This as an artifact caused the value profile data and the count
>> arrays
>> >>>> to
>> >>>> be laid out in the same order with the per function profile data.
>> >>> I think it is better to not depend on this implementation detail at
>> >> runtime. The current implementation also depends on the fact that raw
>> >> profile data is accessed sequentially. Hidden assumptions like this
>> are
>> >> not great. It also make the access of value profile data for a given
>> >> function is different from edge profile counters.
>> >>
>> >> The __llvm_profile_data array in the raw profile file can only be
>> >> accessed
>> >> sequentially. However, I do agree that the alignment of value profile
>> >> data
>> >> to _llvm_profile_data is an implementation artifact and is not uniform
>> >> wrt
>> >> how the name/counter arrays are being access currently.
>> >>
>> >>> The problem can be easily solved: there are two fields in ProfileData
>> >> structure that can be used for this purpose:
>> >>> 1) the functionPointer field which is currently not used can be used
>> to
>> >> store the PerSiteCountsPtr for that function
>> >>
>> >> Function pointer is currently used to find out the name of the target
>> >> function by comparing it against the target values returned from
>> >> profiling
>> >> indirect call sites. We may avoid having the function pointer iff
>> >> llvm-profdata is taught to accept the instrumented binary i.e. used to
>> >> generate the profile data.
>> >
>> > Yes -- but the point is that you don't need to postpone the
>> > translation from function ptr to name ptr till profile read time. It
>> > can be easily done during profile dumping time using qsort and
>> > bsearch. After that is done, the function ptr field in ProfileData is
>> > free of use.
>> >
>> >>
>> >>> 2) the Values field can be used to store ValuePtr for the function
>>
>> We're working on platforms where the profiling can not depend on the
>> presence of an atexit call to write out the profile data out to a file.
>> For such platforms, we're creating a background thread which writes the
>> profile data out and resets in periodic intervals. Updating the profile
>> data structures would prevent being able to run the profiler on such
>> platforms unless through creating a copy and then updating/writing out the
>> copied profile data section. Can we revisit the thought of updating the
>> profile data structs in memory before writing out?
>>
>> -Betul
>>
>> In
>> >> the header, add PerSiteCountsDelta and ValueDelta field.
>> >>
>> >> Sure, currently the Values field is not processed by the reader. It may
>> >> be
>> >> used to store the ValuePtr, however unless llvm-profdata is taught to
>> >> read
>> >> in the instrumented binary, we need an extra field in the
>> >> __llvm_profile_data struct for the PerSiteCountsPtr.
>> >
>> > Note that reading binary does not work for shared libraries where
>> > runtime addresses are different.
>> >
>> > David
>> >
>> >>
>> >> Thanks,
>> >> -Betul
>> >>
>> >>> With those, Value data for a given function can be obtained similarly
>> >> to getCounter
>> >>> David
>> >>>> Raw profile file format is as follows:
>> >>>> Header: Added in two uint64_t elements for value data counts array
>> >>>> size
>> >> and value data array size
>> >>>> Data array : Array may end at a uint32_t boundary when generated on
>> >> uint32_t targets. I'll have to fix the current CL to maintain data
>> >> sections ending at uint64_t boundaries. Current CL does not have a
>> >> padding
>> >>>> after the data array.
>> >>>> Counters array : same as before
>> >>>> Name array : same as before
>> >>>> Padding to end the name array at uint64_t boundary : same as before
>> >> Value data counts array: Array of uint32_t values, each value indicates
>> >> how many uint64_t data pairs {TargetValue, NumTaken} to read from the
>> >> value data array per value site.
>> >>>> Padding to end the value data counts array at uint64_t boundary Value
>> >> data array: Array of uint64_t
>> >>>> -Betul
>> >>>>> http://reviews.llvm.org/D13308
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151021/c296be24/attachment.html>
More information about the llvm-commits
mailing list