<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 12/14/2017 01:44 PM, Xinliang David
      Li wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAAkRFZ+oqLRowCP7EWb7iAYwvvoiZ2atNLLUW_LXAiZ8J8KzJw@mail.gmail.com">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Dec 14, 2017 at 1:03 PM,
            Philip Reames <span dir="ltr"><<a
                href="mailto:listmail@philipreames.com" target="_blank"
                moz-do-not-send="true">listmail@philipreames.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <p>This really sounds like it should be an analysis
                  pass.  Summarizing the results into the IR using the
                  existing entry_county attributes sounds like a
                  workaround for an invalidation problem more than
                  anything else.  <br>
                </p>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I think it is not a workaround. The strength of the
              proposal is that it fits really naturally with the PGO
              infrastructure -- all the existing profile update
              mechanism (currently mainly inliner) can be reused.  Most
              of the PGO related query interfaces can also be reused so
              that there is very little change from the client side.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Understood.  On the other hand, these could also be structured as
    analysis updates done by the transforms and the users could consume
    the analysis result.  <br>
    <blockquote type="cite"
cite="mid:CAAkRFZ+oqLRowCP7EWb7iAYwvvoiZ2atNLLUW_LXAiZ8J8KzJw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>The inter-procedural frequency propagation also needs
              to happen in the thinLink time for thinLTO which should
              not touch IR except summary data.</div>
          </div>
        </div>
      </div>
    </blockquote>
    Hm, this is a really good point.  I'd missed that in the original
    doc.  Given this, the design makes a lot of sense.<br>
    <br>
    Personally, I'd still lean towards trying to structure this as an
    analysis, but that's a weakly held opinion.  Don't let it block you
    from forward progress.<br>
    <blockquote type="cite"
cite="mid:CAAkRFZ+oqLRowCP7EWb7iAYwvvoiZ2atNLLUW_LXAiZ8J8KzJw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>David</div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <p> </p>
                <p>You acknowledge this in your alternatives discussion,
                  but I don't follow why BFI invalidation at the
                  function level has to invalidate the inter procedural
                  result.  You'd certainly need to use BFI when
                  *running* your analysis, but once computed the results
                  should be stable across updates within the function. 
                  You might need update logic for inlining, outlining,
                  and mergefunc, but that should be it in terms of
                  current LLVM transforms?</p>
                <p>Philip<br>
                </p>
                <div>
                  <div class="h5"> <br>
                    <div class="m_-5925272772691801037moz-cite-prefix">On
                      12/12/2017 05:02 PM, Easwaran Raman via llvm-dev
                      wrote:<br>
                    </div>
                  </div>
                </div>
                <blockquote type="cite">
                  <div>
                    <div class="h5">
                      <div dir="ltr"><span
id="m_-5925272772691801037gmail-docs-internal-guid-54633d65-4d61-93c5-4945-401b16f25f9a">
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Functions in LLVM IR have a function_entry_count metadata that is attached in PGO compilation. By using the entry count together with the block frequency info, the compiler computes the profile count of call instructions based on which the hotness/coldness of callsites can be determined. Experiments have shown that using a higher threshold for hot callsites results in improved runtime performance of the generated code without significant code size increases. We propose to generate synthetic function counts for non-PGO compilation and use the counts for boosting hot callsites during inlining. </span></p>
                          <br>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Synthetic function entry counts of functions are initialized based on properties of the function such as whether it is visible outside the module, whether it has an inline keyword and so on. Then, the callgraph SCC is traversed in reverse post-order. Counts of callsites are determined based on the entry count and the block frequency of the callsite. The callsite count gets added to the entry count of the callee. For targets of indirect calls, we will use the !callees metadata to find the possible targets and distribute the count equally among them. For functions in a non-trivial SCC, the algorithm has to ensure that the counts are stable and deterministic.</span></p>
                          <br>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"> In ThinLTO mode, the function summary contains the list of call edges from the function. We propose to add the relative block frequency on these edges. During the thinlink phase, we propagate the function counts on the entire call graph and update the function summary with the synthetic counts. Additionally, we plan to use the computed counts to drive the importing decisions. </span></p>
                          <br>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Alternative approach </span></p>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">-----------------------------</span></p>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">
</span></p>
                          <p dir="ltr"
                            style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">An alternative to generating synthetic counts is to make block frequency info an inter-procedural analysis. Such an analysis would allow comparing BFI of callsites in two different functions. This has several downsides:</span></p>
                          <ul style="margin-top:0pt;margin-bottom:0pt">
                            <li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">The inter-procedural BFI computation is likely to be more expensive in terms of compile-time. </span></p></li>
                            <li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Many function passes invalidate the BFI. This will require selective invalidation of function BFIs.</span></p></li>
                            <li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Inliner correctly updates function counts of a callee after a callsite is inlined. We can piggyback on this mechanism by using synthetic counts. </span></p></li>
                          </ul>
                        </span></div>
                      <br>
                      <fieldset
                        class="m_-5925272772691801037mimeAttachmentHeader"></fieldset>
                      <br>
                    </div>
                  </div>
                  <pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a class="m_-5925272772691801037moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>
<a class="m_-5925272772691801037moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
                </blockquote>
                <br>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>