<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Filed: <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=36303">https://bugs.llvm.org/show_bug.cgi?id=36303</a>. Please let me know if
      I can help in any way.<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 02/08/2018 01:22 AM, Xinliang David
      Li wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CALRgJCPntB9LnCpW0+wv+oB1BptpKdWo9JCL45PWrHBKE3Mjkg@mail.gmail.com">
      <div dir="ltr">Victor, please file a bug to track the issue. We
        can put the relevant information there, including the test cases
        used in the experiment.
        <div><br>
        </div>
        <div>thanks,</div>
        <div><br>
        </div>
        <div>David</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Feb 7, 2018 at 2:15 PM, Victor
          Leschuk <span dir="ltr">&lt;<a
              href="mailto:vleschuk@accesssoftek.com" target="_blank"
              moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <p>David, could you please clarify which code gave you the
                10% improvement? I have run numerous tests with and
                without this option, and it appears to have no effect on
                performance (to be concrete, I am talking about the old
                2016 sample). Maybe we could investigate it together? Just
                tell me where to start.<br>
              </p>
              <div>
                <div class="h5"> <br>
                  <div class="m_-7612381275907462952moz-cite-prefix">On
                    02/07/2018 02:11 AM, Xinliang David Li wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">Victor, thanks for the experiment.
                      <div><br>
                      </div>
                      <div>My suspicion is that it is due to the remaining
                        issues with block layout -- especially with loop
                        rotation (with PGO). Another problem is that
                        tail duplication does not happen after loop
                        rotation, which can limit the effectiveness of
                        loop rotation.</div>
                      <div><br>
                      </div>
                      <div>I tried the internal option -mllvm
                        -force-precise-rotation-cost and saw about a
                        10% speedup with -fprofile-use. This option
                        turns on a more precise cost model when computing
                        the rotation strategy, but it is not turned on by
                        default.</div>
                      <div><br>
                      </div>
                      <div>+carrot, who is working in this area.</div>
                      <div><br>
                      </div>
                      <div>thanks,</div>
                      <div><br>
                      </div>
                      <div>David</div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Tue, Feb 6, 2018 at
                        1:37 PM, Victor Leschuk <span dir="ltr">&lt;<a
                            href="mailto:vleschuk@accesssoftek.com"
                            target="_blank" moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div text="#000000" bgcolor="#FFFFFF">
                            <p>Hello David, thanks for the detailed
                              response!</p>
                            <p>Do you have any tests that you use to
                              measure PGO effectiveness? I have
                              tested clang version 6.0 with the same
                              sample that Jie Chen used in 2016, and
                              both frontend-based and IR-based PGO
                              actually make the code run slightly
                              slower. Here are the average times:</p>
                            <p>clang++ -O3: 3.15 sec </p>
                            <p>clang++ -O3 and -fprofile-instr-use:
                              3.160 sec<br>
                            </p>
                            <p>clang++ -O3 and -fprofile-use: 3.180 sec<br>
                            </p>
                            <p>g++ (7.3.0) -O3: 3.640 sec<br>
                            </p>
                            <p>g++ (7.3.0) -O3 and -fprofile-use: 2.92
                              sec</p>
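                            <p>To make the comparison easier to read,
                              here is a small sketch (Python, chosen
                              only for illustration) that converts the
                              averages above into a percent change
                              relative to each compiler's own -O3
                              baseline. The numbers are copied from the
                              list above; the dictionary layout and the
                              helper name percent_change are made up for
                              this example.</p>
<pre>
```python
# Average run times in seconds, copied from the measurements above.
timings = {
    "clang++": {
        "-O3": 3.15,
        "-O3 -fprofile-instr-use": 3.160,
        "-O3 -fprofile-use": 3.180,
    },
    "g++ 7.3.0": {
        "-O3": 3.640,
        "-O3 -fprofile-use": 2.92,
    },
}


def percent_change(baseline, measured):
    """Positive means slower than the baseline, negative means faster."""
    return (measured - baseline) / baseline * 100.0


for compiler, runs in timings.items():
    base = runs["-O3"]
    for flags, seconds in runs.items():
        if flags == "-O3":
            continue  # the baseline itself is not compared
        change = percent_change(base, seconds)
        print(f"{compiler} {flags}: {change:+.1f}% vs -O3")
```
</pre>
                            <p>On these numbers the clang PGO builds
                              stay within about 1% of plain -O3, while
                              g++ gains roughly 20%.</p>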
                            <p>Do you have any idea what could be wrong?
                              Are there any recommendations on when one
                              should use PGO with clang and when it is
                              better not to?</p>
                            <p>Thanks!<br>
                            </p>
                            <div>
                              <div class="m_-7612381275907462952h5"> <br>
                                <div
                                  class="m_-7612381275907462952m_-5231669173907304757moz-cite-prefix">On
                                  02/05/2018 09:38 AM, Xinliang David Li
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr"><br>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">On Sun,
                                        Feb 4, 2018 at 9:59 PM, Victor
                                        Leschuk <span dir="ltr">&lt;<a
href="mailto:vleschuk@accesssoftek.com" target="_blank"
                                            moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">Hello
                                          David!<br>
                                          <br>
                                          I have recently started
                                          getting acquainted with PGO in
                                          LLVM/clang and found<br>
                                          your e-mail thread:<br>
                                          <a
                                            href="http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html"
                                            rel="noreferrer"
                                            target="_blank"
                                            moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html</a>.
                                          There you<br>
                                          posted a nice list of
                                          optimizations that use
                                          profile data and of those<br>
                                          that could use it but
                                          don't. However, that thread is
                                          about 2 years<br>
                                          old. Could you please
                                          let me know whether there have
                                          been any significant<br>
                                          changes in this area since
                                          then?<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <div>Yes, there have been quite a
                                          few changes since then. Here are
                                          some of the new features:</div>
                                        <div><br>
                                        </div>
                                        <div>* LLVM IR based PGO -- this
                                          is designed to maximize
                                          program performance. The
                                          options to turn it on are
                                          -fprofile-generate and -fprofile-use</div>
                                        <div>* value profiling support
                                          in PGO -- currently supports
                                          indirect call target profiling
                                          and memcpy/memset size
                                          profiling, along with the
                                          corresponding optimizations</div>
                                        <div>* Profile data is made
                                          available for the inliner to use
                                          (enabled only for the new pass
                                          manager:
                                          -fexperimental-new-pass-manager)</div>
                                        <div>* Profile aware LICM is
                                          available -- implemented via a
                                          profile driven code sinking
                                          pass </div>
                                        <div>* Partial inlining is made
                                          profile aware;  Graham Yu also
                                          added support for multiple
                                          region function outlining
                                          (with PGO)</div>
                                        <div>* BB layout heuristics are
                                          tuned with PGO</div>
                                        <div>* hotness driven function
                                          layout optimization </div>
                                        <div><br>
                                        </div>
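                                        <div>As a rough illustration of
                                          the -fprofile-generate /
                                          -fprofile-use workflow
                                          mentioned in the list above,
                                          the following Python sketch
                                          only prints the commands and
                                          runs nothing. The file names
                                          (sample.cpp, app, pgo-data,
                                          merged.profdata) and the helper
                                          pgo_commands are hypothetical,
                                          and passing a directory to
                                          llvm-profdata merge is an
                                          assumption that may not hold
                                          for every LLVM version.</div>
<pre>
```python
import shlex


def pgo_commands(source, binary="app", profile_dir="pgo-data"):
    """Return the IR-based PGO build steps as argument lists.

    Nothing is executed here; all file names are placeholders.
    """
    merged = f"{profile_dir}/merged.profdata"
    return [
        # 1. Instrumented build; raw profiles are written under
        #    profile_dir when the resulting binary is run.
        ["clang++", "-O3", f"-fprofile-generate={profile_dir}",
         source, "-o", f"{binary}.instr"],
        # 2. After running the instrumented binary on a training input,
        #    merge the raw profiles (assumes a directory is accepted).
        ["llvm-profdata", "merge", f"-output={merged}", profile_dir],
        # 3. Optimized build that consumes the merged profile.
        ["clang++", "-O3", f"-fprofile-use={merged}",
         source, "-o", binary],
    ]


for cmd in pgo_commands("sample.cpp"):
    print(shlex.join(cmd))
```
</pre>
                                        <div>The instrumented binary has
                                          to be run on a representative
                                          input between steps 1 and 2;
                                          that run is left out of the
                                          sketch because it depends on
                                          the program.</div>
                                        <div><br>
                                        </div>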
                                        <div>There is pending work in
                                          the following areas:</div>
                                        <div>* profile aware loop
                                          vectorization, etc</div>
                                        <div>* control height reduction
                                          optimization (Hiroshi is
                                          working on this)</div>
                                        <div><br>
                                        </div>
                                        <div>ThinLTO also works well
                                          with PGO.</div>
                                        <div><br>
                                        </div>
                                        <div>Hope this helps.</div>
                                        <div><br>
                                        </div>
                                        <div>David</div>
                                        <div><br>
                                        </div>
                                        <div>
                                          <pre style="white-space:pre-wrap;color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">&gt;<i> What I can tell you is that there are many missing ones (that can benefit
</i>from profile): such as profile aware LICM (patch pending), speculative PRE,
loop unrolling, loop peeling, auto vectorization, inlining, function
splitting, function layout, function outlining,  profile driven size
optimization, induction variable optimization/strength reduction, stringOp
specialization/optimization/inlining, switch peeling/lowering etc. The
biggest profile users today include ralloc, BB layout, ifcvt, shrinkwrapping
etc, but there should be room for improvement there too.</pre>
                                          <br>
                                        </div>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex"> <br>
                                          Thanks in advance!<br>
                                          <span
                                            class="m_-7612381275907462952m_-5231669173907304757HOEnZb"><font
                                              color="#888888"><br>
                                              --<br>
                                              Best Regards,<br>
                                              <br>
                                              Victor Leschuk | Software
                                              Engineer | Access Softek<br>
                                              <br>
                                            </font></span></blockquote>
                                      </div>
                                      <br>
                                    </div>
                                  </div>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Best Regards,

Victor Leschuk | Software Engineer | Access Softek</pre>
  </body>
</html>