[LLVMdev] RFC - Improvements to PGO profile support

Tue Mar 24 12:53:26 PDT 2015

Capping also causes other problems -- e.g., the sum of the incoming
call-graph edge counts no longer matches the callee's entry count.
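
To make the two capping problems discussed in this thread concrete, here is a
minimal sketch. Everything in it is an assumption made for illustration: the
cap value (2**31, roughly the "2 billion" figure quoted below), the helper
name cap_count, and all of the counts. It is not how LLVM stores or clamps
profile data.

```python
# Assumed cap of 2**31, roughly the "2 billion" figure from the thread.
# The real limit depends on how counts are stored; this value is illustrative.
CAP = 2**31

def cap_count(count, limit=CAP):
    """Saturate a raw profile count at `limit` (hypothetical helper)."""
    return min(count, limit)

# Hypothetical raw entry counts for two hot functions.
hot_a, hot_b = 3_000_000_000, 6_000_000_000
same_after_cap = cap_count(hot_a) == cap_count(hot_b)

# Hypothetical call graph: two call sites feeding one callee.
incoming_edges = [1_500_000_000, 1_500_000_000]
callee_entry = sum(incoming_edges)
capped_edge_sum = sum(cap_count(c) for c in incoming_edges)
capped_entry = cap_count(callee_entry)

print(same_after_cap)                   # True
print(capped_edge_sum, capped_entry)    # 3000000000 2147483648
```

Both functions saturate to the same value, so their relative hotness is lost,
and the capped callee entry count no longer equals the sum of its (uncapped)
incoming edge counts.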

David

On Tue, Mar 24, 2015 at 12:50 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:

>
>
> On Tue, Mar 24, 2015 at 12:45 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>>
>> On Tue, Mar 24, 2015 at 11:46 AM, Xinliang David Li <xinliangli at gmail.com>
>> wrote:
>>
>>> On Tue, Mar 24, 2015 at 11:29 AM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>>
>>>> Sorry I haven't responded earlier, but one point here still doesn't
>>>> make sense to me:
>>>>
>>>> On Tue, Mar 24, 2015 at 10:27 AM, Xinliang David Li <davidxl at google.com>
>>>> wrote:
>>>>
>>>>> Diego and I have discussed this in light of the feedback received. We
>>>>> have a revised plan for this (see Diego's last reply).  Here is a more
>>>>> detailed recap:
>>>>>
>>>>> 1) keep the MD_prof definition as it is today; also keep using the
>>>>> frequency propagation as it is (assuming programs with irreducible
>>>>> loops are neither common nor important; if that turns out to be
>>>>> wrong, we will revisit this).
>>>>> 2) fix all problems that lead to a wrong 'frequency/count' being
>>>>> computed by the frequency propagation algorithm
>>>>>    2.1) relax the 32-bit limit
>>>>>
>>>>
>>>> I still don't understand why this is important or useful.... Maybe I'm
>>>> just missing something.
>>>>
>>>> Given the current meaning of MD_prof, it seems like the result of
>>>> limiting this to 32-bits is that the maximum relative ratio of
>>>> probabilities between two successors of a basic block with N successors is
>>>> (2 billion / N):1 -- what is the circumstance that makes this resolution
>>>> insufficient?
>>>>
>>>> It also doesn't seem *bad* per-se, I just don't see what it improves,
>>>> and it does cost memory...
>>>>
>>>
>>> Right -- there is some ambiguity here. Relaxing the limit would be needed
>>> if we were to change MD_prof's definition to represent branch counts.
>>> However, with the new plan, removing the limit applies only to the planned
>>> function entry count representation.
>>>
>>
>> Ah, ok, that makes more sense.
>>
>> I'm still curious, is the ratio of 2 billion : 1 insufficient between the
>> hottest basic block in the innermost loop and the entry block? My
>> intuition is that this ratio encapsulates all the information we could
>> meaningfully make decisions based upon, and I don't have any examples where
>> it falls over, but perhaps you have some examples?
>>
>
> The ratio is not the problem. The problem is that we can no longer
> effectively differentiate hot functions. 2 billion vs 4 billion will look
> the same under such a low cap.
>
> David
>
>
>
>>
>> (Note, the 4096 scaling limit thing is completely separate, that makes
>> perfect sense to me.)
>>
>
>