[LLVMdev] IC profiling infrastructure

Mon Jun 15 21:31:22 PDT 2015

> Xinliang David Li <davidxl at google.com> writes:
>> On Mon, Jun 15, 2015 at 5:35 PM, Justin Bogner <mail at justinbogner.com>
>> wrote:
>>> "Betul Buyukkurt" <betulb at codeaurora.org> writes:
>>>>> - We don't need to store the value profiling kind in the data at all.
>>>>>   The frontend knows which invocations of the intrinsic are for each
>>>>> kind
>>>>>   implicitly, much like it knows the difference between a counter for
>>>>> an
>>>>>   "if" and a "for" apart implicitly. Just store one set of profiling
>>>>> data
>>>>>   and let the frontend sort it out.
>>>>
>>>> I think so too. However, value_kind should stay around until the kinds
>>>> of
>>>> expressions whose values are computed are included in the hash
>>>> computation. Otherwise, if a value-profiled expression is to be
>>>> removed
>>>> from source, then all the rest of the values would get attached to the
>>>> MD
>>>> fields of wrong types. Currently hash checking verifies if the counter
>>>> assigned regions in the source has been changed across profile
>>>> consumption
>>>> runs.
>>>
>>> This isn't a good reason to keep value_kind - the function hash not
>>> reflecting these kinds of changes is a problem regardless of whether or
>>> not we have multiple types. We should just add hash computations for
>>> these indirect call counters. As long as we only add this when indirect
>>> profiling is turned on it won't invalidate old hashes so it's trivial
>>> to
>>> stay backwards compatible.
>>
>> I think there are other better reasons why keeping value kind is
>> desirable. Different kinds of value profile data needs different
>> 'preprocessing' during reading. For instance, indirect call target
>> needs to translate target address into function name. For length data,
>> we may want to compute average length; for data address, we may want
>> to compute the min-alignment for each site. It will be hard to do so
>> with out value kind info.
>
> But since the consumer is the frontend, and it knows which counts are
> which, it knows the kind, no? I don't understand how storing the kind
> info helps or hurts - it's just redundant.
>
>>>
>>> The other thing we should do is store which profiling options are
>>> enabled, in both formats. When we specify profile-instr-use it'll be
>>> less error prone if we can detect whether or not this includes indirect
>>> call profiling without checking other options, and it's probably a good
>>> idea for "llvm-profdata merge" to disallow merging profiles that are
>>> gathering different sets of data. A bitfield seems suitable for this.
>>
>> For value profiling, allowing profile-gen and profile-use passes using
>> different options is a useful feature. Consider the following
>> scenarios:
>> 1) collect the same profile data once with all kinds of value profile
>>    data collected. The exact same profile data can be used in
>> performance
>>    experiments with different kinds of value profiling enabled/disabled
>
> This use case is pretty easy to accomodate for with the model where the
> profile stores its type. We could autodetect the type to decide how to
> interpret the profile, and if more specific profile flags are set simply
> ignore some of the data. The thing that the file format storing the type
> gives us is that it's harder to misinterpret the data by supplying the
> mismatching flags between reading and writing.

How would you plan to merge profile files where one profile is collected
w/ one value_kind turned on with another file where another value_kind is
turned on. This might be a valid use case. The value_kinds at this time
are not stored per raw value in the raw profile files. The newer is design
is extremely compact. Having the value_kind is extremely useful when doing
the merges and when processing (i.e. derivation of function names from
target addresses) data differently per value_kind. You may read the design
discussions in the past threads.

>> 2) work around profile-use/value transformation bug selectively for
>>    some file without the need to change anything in instrumentation pass
>
> I'm not sure I understand what you mean here.
>
>> Besides, with the latest patch, the value_kind is not recorded in each
>> profile value thus the overhead is minimized.
>
> The overhead is low, sure, but the code complexity of dealing with the
> multiple kinds in this way is quite real. Since I'm not convinced we're
> getting real benefit from storing the kind, I don't believe this
> trade-off is worth it.
>
>> thanks,
>>
>> David
>