<div dir="ltr">Yes -- that is a valid use case and valid concern.<div><br></div><div>There is a solution to that. Instead of splitting value counter array and value site count array, they can live in one array -- each function's value profile counters are preceded by the array of site map/count array whose size is known per function. </div><div><br></div><div>To further reduce size requirement, please also make the element type of NumValueSites[] array to be uint16_t. It is not a problem for now using 32bit, but can be a big problem in the future when there are more value kinds supported. Most of the cases, the number of sites (e.g, icall) per function is less than hundreds. We can choose not to support giant functions with > 64k icallsites -- the user can always break up the function.</div><div><br></div><div>The following is a summary of my feedbacks:</div><div><br></div><div>1) make the sitecount arrray element type uint8_t;</div><div>2) embed site count array with value counter array (with small internal padding if needed) </div><div>3) make the reader reads the value profile raw data the same way for each function as edge profile -- basically using counter delta and the relocation scheme -- not the sequential read with implicit order assumption.</div><div>4) shrink the size of NumValueSites element to be 16bit long</div><div><br></div><div>thanks,</div><div><br></div><div>David</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 21, 2015 at 10:10 AM, Betul Buyukkurt <span dir="ltr"><<a href="mailto:betulb@codeaurora.org" target="_blank">betulb@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">>> char.<br>
>><br>
>> This is a pointer to the uint32_t ValueDataCounts array returned by the<br>
>> __llvm_profile_gather_value_data function in InstrProfiling.c of<br>
>> compiler-rt. So changing it to any other type indicates changes to the<br>
>> runtime.<br>
>><br>
><br>
> This of course requires runtime change (also pending). Is it just a<br>
> matter making the DataCountArray to be uint8_t** type for the gather<br>
> interface?<br>
><br>
>> A call to __llvm_profile_instrument_target at runtime, causes first the<br>
>> __llvm_profile_value_node **ValueCounters field of the<br>
>> __llvm_profile_data<br>
>> checked for null. If it's null, we first create an array of<br>
>> __llvm_profile_value_node* of size sum of all elements of NumValueSites.<br>
>> This forms a row of head pointers for our linked lists i.e. one linked<br>
>> list per value site.<br>
><br>
> yes, but this has nothing to do with the type selection for the array<br>
> that records the number of target values for each site. My point is<br>
> that the number of targets per site can be limited to <=255.<br>
><br>
>><br>
>> Then the __llvm_profile_instrument_target call, finds the head for the<br>
>> supplied site index. Walks the linked list for the value site, and if it<br>
>> can not locate the target value, appends a new node to the linked list.<br>
><br>
> yes -- no problem with this part.<br>
><br>
>><br>
>> This means we maintain as many linked lists as there are value sites at<br>
>> run-time. Once the execution hits at exit, we iterate through the<br>
>> __llvm_profile_data elements to turn all the value profiling data into<br>
>> two<br>
>> arrays.<br>
><br>
> yes -- this part is also clear. It is just that one of the arrays<br>
> (DataCountArray) does not need to use 32bit type as its element.<br>
><br>
>><br>
>> First array is a uint32_t array that shows the number of<br>
>> target_value/numtaken pairs present per value site. Its size is<br>
>> equivalent<br>
>> to the sum of all elements of NumValueSites.<br>
><br>
> yes, the number of pairs is small.<br>
><br>
><br>
>>The second array is the<br>
>> __llvm_profile_value_data array that is formed by appending all the data<br>
>> fields of the linked lists together. These two arrays are what gets<br>
>> dumped<br>
>> into the raw profile data file. Therefore, the value profile data is in<br>
>> the same order as the __llvm_profile_data in file.<br>
>><br>
>> Raw profile reader uses the NumValueSites array to decide how many<br>
>> elements to read from the value data counts array (i.e. the first array<br>
>> described in the paragraph before). Using the counts stored in the value<br>
>> data counts array, the reader decides how many elements to read per<br>
>> value<br>
>> site from the second array.<br>
>><br>
>>>>> ================<br>
>>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:235<br>
>>>>> @@ -234,1 +234,3 @@<br>
>>>>> auto NamesSize = swap(Header.NamesSize);<br>
>>>>> + auto PaddingSize = sizeof(uint64_t) - NamesSize %<br>
>>>>> sizeof(uint64_t);<br>
>> + auto ValueDataCountsSize = swap(Header.ValueDataCountsSize);<br>
>> ----------------<br>
>>>>> Please draw a diagram describing raw data layout here:<br>
>>>>> Header:<br>
>>>>> ....<br>
>>>>> prf_data_section:<br>
>>>>> ....<br>
>>>>> prf_cnts_section:<br>
>>>>> ...<br>
>>>>> prf_name_section:<br>
>>>>> ...<br>
>>>>> padding1<br>
>>>>> vdata_counts_section: // # of counts per site<br>
>>>>> ...<br>
>>>>> padding2<br>
>>>>> vdata_value_section:<br>
>>>>> ...<br>
>>>>> ================<br>
>>>>> Comment at: lib/ProfileData/InstrProfReader.cpp:333<br>
>>>>> @@ +332,3 @@<br>
>>>>> + InstrProfValueSiteRecord NewVRecord;<br>
>>>>> + for (uint32_t VIndex = 0; VIndex < CurrentVDataCount;<br>
>>>>> ++VIndex)<br>
>>>>> {<br>
>>>>> + uint64_t TargetValue = swap(CurrentVData->Value);<br>
>>>>> ----------------<br>
>>>>> Should CurrentVData be set to Data->Values?<br>
>>>>> Current implementation seems to assume that VData and VDataCounts are<br>
>> laid<br>
>>>>> out in the same order as the prf_data wrt the owning functions. I<br>
>> don't<br>
>>>>> think this assumption is right.<br>
>>>> The value profile data appears as linked lists hanging from the per<br>
>> function profile data structs at runtime. Therefore, an extra step is<br>
>> performed to turn the value profile data lists into a consecutive array<br>
>> form. We do that by iterating over the per function profile data<br>
>> structs.<br>
>>>> This as an artifact caused the value profile data and the count arrays<br>
>>>> to<br>
>>>> be laid out in the same order with the per function profile data.<br>
>>> I think it is better to not depend on this implementation detail at<br>
>> runtime. The current implementation also depends on the fact that raw<br>
>> profile data is accessed sequentially. Hidden assumptions like this are<br>
>> not great. It also make the access of value profile data for a given<br>
>> function is different from edge profile counters.<br>
>><br>
>> The __llvm_profile_data array in the raw profile file can only be<br>
>> accessed<br>
>> sequentially. However, I do agree that the alignment of value profile<br>
>> data<br>
>> to _llvm_profile_data is an implementation artifact and is not uniform<br>
>> wrt<br>
>> how the name/counter arrays are being access currently.<br>
>><br>
>>> The problem can be easily solved: there are two fields in ProfileData<br>
>> structure that can be used for this purpose:<br>
>>> 1) the functionPointer field which is currently not used can be used to<br>
>> store the PerSiteCountsPtr for that function<br>
>><br>
>> Function pointer is currently used to find out the name of the target<br>
>> function by comparing it against the target values returned from<br>
>> profiling<br>
>> indirect call sites. We may avoid having the function pointer iff<br>
>> llvm-profdata is taught to accept the instrumented binary i.e. used to<br>
>> generate the profile data.<br>
><br>
> Yes -- but the point is that you don't need to postpone the<br>
> translation from function ptr to name ptr till profile read time. It<br>
> can be easily done during profile dumping time using qsort and<br>
> bsearch. After that is done, the function ptr field in ProfileData is<br>
> free of use.<br>
><br>
>><br>
>>> 2) the Values field can be used to store ValuePtr for the function<br>
<br>
</div></div>We're working on platforms where the profiling can not depend on the<br>
presence of an atexit call to write out the profile data out to a file.<br>
For such platforms, we're creating a background thread which writes the<br>
profile data out and resets in periodic intervals. Updating the profile<br>
data structures would prevent being able to run the profiler on such<br>
platforms unless through creating a copy and then updating/writing out the<br>
copied profile data section. Can we revisit the thought of updating the<br>
profile data structs in memory before writing out?<br>
<span class="HOEnZb"><font color="#888888"><br>
-Betul<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
In<br>
>> the header, add PerSiteCountsDelta and ValueDelta field.<br>
>><br>
>> Sure, currently the Values field is not processed by the reader. It may<br>
>> be<br>
>> used to store the ValuePtr, however unless llvm-profdata is taught to<br>
>> read<br>
>> in the instrumented binary, we need an extra field in the<br>
>> __llvm_profile_data struct for the PerSiteCountsPtr.<br>
><br>
> Note that reading binary does not work for shared libraries where<br>
> runtime addresses are different.<br>
><br>
> David<br>
><br>
>><br>
>> Thanks,<br>
>> -Betul<br>
>><br>
>>> With those, Value data for a given function can be obtained similarly<br>
>> to getCounter<br>
>>> David<br>
>>>> Raw profile file format is as follows:<br>
>>>> Header: Added in two uint64_t elements for value data counts array<br>
>>>> size<br>
>> and value data array size<br>
>>>> Data array : Array may end at a uint32_t boundary when generated on<br>
>> uint32_t targets. I'll have to fix the current CL to maintain data<br>
>> sections ending at uint64_t boundaries. Current CL does not have a<br>
>> padding<br>
>>>> after the data array.<br>
>>>> Counters array : same as before<br>
>>>> Name array : same as before<br>
>>>> Padding to end the name array at uint64_t boundary : same as before<br>
>> Value data counts array: Array of uint32_t values, each value indicates<br>
>> how many uint64_t data pairs {TargetValue, NumTaken} to read from the<br>
>> value data array per value site.<br>
>>>> Padding to end the value data counts array at uint64_t boundary Value<br>
>> data array: Array of uint64_t<br>
>>>> -Betul<br>
>>>>> <a href="http://reviews.llvm.org/D13308" rel="noreferrer" target="_blank">http://reviews.llvm.org/D13308</a><br>
>><br>
>><br>
>><br>
>><br>
><br>
<br>
<br>
</div></div></blockquote></div><br></div>