<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=windows-1256">

</head>

<body>

<div>

<div style="font-family:Calibri,sans-serif; font-size:11pt">Thanks David for the insight. I was thinking whether we can use some of the learning algorithms like Support Vector Machine to estimate freqs in a non-pgo setup. And these can use the pgo-based results

 for supervised learning.<br>

<br>

Sent from my Windows Phone</div>

</div>

<div dir="ltr">

<hr>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">From:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:xinliangli@gmail.com">Xinliang David Li</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Sent:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt">ı12/ı27/ı2014 1:53 PM</span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">To:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:Dibyendu.Das@amd.com">Das, Dibyendu</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Cc:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:ibaev@codeaurora.org">ibaev@codeaurora.org</a>;

<a href="mailto:llvmdev@cs.uiuc.edu">llvmdev</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Subject:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt">Re: [LLVMdev] How to use BlockFrequency in inter-procedural context?</span><br>

<br>

</div>

<div>

<div dir="ltr">

<div class="gmail_extra"><br>

<div class="gmail_quote">On Fri, Dec 26, 2014 at 10:24 PM, Das, Dibyendu <span dir="ltr">

<<a href="mailto:Dibyendu.Das@amd.com" target="_blank">Dibyendu.Das@amd.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

<div>

<div>

<div style="font-family:Calibri,sans-serif; font-size:11pt">David<br>

<br>

Is this true for static heuristics as well ?<br>

<br>

If the bb freqs are scaled wrt to the entry block freq and a) use such scaled freqs for the bb's that have calls b) propagate this info topologically over the call graph, how representative will be the info if one just wants to use in a comparative sense ?<br>

<br>

</div>

</div>

</div>

</blockquote>

<div><br>

</div>

<div>The main issue with inter-procedural propagation of static frequencies is the existence of indirect calls -- their targets are usually not resolvable statically, not to mention their frequency distributions. Another issue is with loops -- static heuristics

 tend to estimate loop trip count very conservatively, and it favors BBs in deep loop nests more. The global 'hotness' info computed can be misleading and harmful.</div>

<div><br>

</div>

<div>David</div>

<div><br>

</div>

<div> </div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

<div>

<div>

<div style="font-family:Calibri,sans-serif; font-size:11pt">-Dibyendu<br>

<br>

Sent from my Windows Phone</div>

</div>

<div dir="ltr">

<hr>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">From:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:xinliangli@gmail.com" target="_blank">Xinliang David Li</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Sent:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt">ı12/ı27/ı2014 10:05 AM</span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">To:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:ibaev@codeaurora.org" target="_blank">ibaev@codeaurora.org</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Cc:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt"><a href="mailto:llvmdev@cs.uiuc.edu" target="_blank">llvmdev</a></span><br>

<span style="font-family:Calibri,sans-serif; font-size:11pt; font-weight:bold">Subject:

</span><span style="font-family:Calibri,sans-serif; font-size:11pt">Re: [LLVMdev] How to use BlockFrequency in inter-procedural context?</span><br>

<br>

</div>

<div>

<div class="h5">

<div>

<div dir="ltr">

<div class="gmail_extra"><br>

<div class="gmail_quote">On Fri, Dec 26, 2014 at 7:12 PM, <span dir="ltr"><<a href="mailto:ibaev@codeaurora.org" target="_blank">ibaev@codeaurora.org</a>></span> wrote:

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

The BlockFrequency analysis has been useful for machine block placement,<br>

register allocation and other function-level optimizations. How could we<br>

extend it for use in an inter-procedural (whole-program) context? For<br>

example, if we would like to compare the hotness of two call sites in<br>

different functions, or calculate the hotness of two global variables<br>

referenced in multiple functions.<br>

</blockquote>

<div><br>

</div>

<div>BlockFrequency can not be used for inter-procedural analysis. </div>

<div> </div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

<br>

If the ratio of a block BB frequency to the entry block frequency is the<br>

expected number of times the block BB will execute per entry to the<br>

function (according to LLVM Block Frequency Terminology page), would the<br>

multiplication of that ratio to the profile count of the function be a<br>

reasonable approximation of BB total execution count?<br>

</blockquote>

<div><br>

</div>

<div>The answer depends on how BB frequency is computed. If BB frequency is directly scaled from BB profile count (Execution count), then the answer is 'yes'. In the current implementation, the answer is 'no'.  LLVM only keeps branch probability information

 from the profile data, and BB frequency is computed from branch probabilities:  There are known limitations of the frequency propagation algorithm to make the resulting frequency 'distorted'. To name a few: loop scale is capped at 4096; branch weight is incremented

 by 1 (leading to issues such as computed loop trip count as low as half of the actual count); limitation of handling irreducible loops etc. In fact, frequency can not be reliably be used for comparison across different loop nests.</div>

<div><br>

</div>

<div>We have plans to address those issues to improve PGO.</div>

<div><br>

</div>

<div>thanks,</div>

<div><br>

</div>

<div>David</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div> </div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

<br>

Thanks,<br>

Ivan<br>

<br>

<br>

<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</blockquote>

</div>

</div>

</div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div>

</div>

</body>

</html>