[LLVMdev] RFC - Improvements to PGO profile support

Chandler Carruth chandlerc at google.com
Tue Mar 24 12:45:19 PDT 2015


On Tue, Mar 24, 2015 at 11:46 AM, Xinliang David Li <xinliangli at gmail.com>
wrote:

> On Tue, Mar 24, 2015 at 11:29 AM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>> Sorry I haven't responded earlier, but one point here still doesn't make
>> sense to me:
>>
>> On Tue, Mar 24, 2015 at 10:27 AM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> Diego and I have discussed this according to the feedback received. We
>>> have revised plan for this (see Diego's last reply).  Here is a more
>>> detailed re-cap:
>>>
>>> 1) keep MD_prof definition as it is today; also keep using the
>>> frequency propagation as it is (assuming programs with irreducible
>>> loops are not common and not important. If it turns out to be
>>> otherwise, we will revisit this).
>>> 2) fix all problems that lead to wrong 'frequency/count' computed from
>>> the frequency propagation algorithm
>>>    2.1) relax 32bit limit
>>>
>>
>> I still don't understand why this is important or useful.... Maybe I'm
>> just missing something.
>>
>> Given the current meaning of MD_prof, it seems like the result of
>> limiting this to 32-bits is that the maximum relative ratio of
>> probabilities between two successors of a basic block with N successors is
>> (2 billion / N):1 -- what is the circumstance that makes this resolution
>> insufficient?
>>
>> It also doesn't seem *bad* per-se, I just don't see what it improves, and
>> it does cost memory...
>>
>
> right -- there is some ambiguity here -- it is needed if we were to change
> MD_prof's definition to represent branch count.  However, with the new
> plan, the removal of the limit only applies to the function entry count
> representation planned.
>

Ah, ok, that makes more sense.

I'm still curious, is the ratio of 2 billion : 1 insufficient between the
hottest basic block in the inner most loop and the entry block? My
intuition is that this ratio encapsulates all the information we could
meaningfully make decisions based upon, and I don't have any examples where
it falls over, but perhaps you have some examples?

(Note, the 4096 scaling limit thing is completely separate, that makes
perfect sense to me.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/6281dea2/attachment.html>


More information about the llvm-dev mailing list