<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>David, could you please clarify which code showed the 10%
      improvement? I have run numerous tests with and without this option,
      and it appears to have no effect on performance (to be concrete, I am
      talking about the old 2016 sample). Maybe we could investigate it
      together? Just tell me where to start.<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 02/07/2018 02:11 AM, Xinliang David

      Li wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CALRgJCM_DX7j_HdvoqTUeRKqh17F+v=Ay4x1=jaAs6RTrMQYQQ@mail.gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">Victor, thanks for the experiment.

        <div><br>

        </div>

        <div>My suspicion is that it is due to the remaining issues with
          block layout, especially with loop rotation (with PGO).
          Another problem is that tail duplication does not happen after loop
          rotation, which can limit the effectiveness of loop rotation.</div>

        <div><br>

        </div>

        <div>I tried the internal option -mllvm
          -force-precise-rotation-cost and there is about a 10% speedup
          with -fprofile-use. This option turns on a more precise cost
          model when computing the rotation strategy, but it is not turned on
          by default.</div>
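        <div><br>
        </div>
        <div>For anyone who wants to repeat this comparison, here is a
          minimal Python sketch that only assembles the command lines of an
          IR-based PGO build, with the extra -mllvm option appended to the
          -fprofile-use step. The helper name pgo_build_commands and the
          file names sample.cpp, sample.instr, sample.profraw,
          sample.profdata and sample.pgo are placeholders chosen for
          illustration, not names from this thread. It also assumes the
          training run is pointed at a fixed raw profile via the
          LLVM_PROFILE_FILE environment variable.</div>
<pre>
```python
import shlex


def pgo_build_commands(source, mllvm_opts=()):
    """Return the steps of an IR-based PGO build as a list of argv lists.

    All output file names are illustrative placeholders.
    """
    extra = []
    for opt in mllvm_opts:
        # Each backend option has to be forwarded with its own -mllvm.
        extra += ["-mllvm", opt]
    return [
        # 1. Build an instrumented binary.
        ["clang++", "-O3", "-fprofile-generate", source, "-o", "sample.instr"],
        # 2. Training run; the raw profile goes to a fixed file name.
        ["env", "LLVM_PROFILE_FILE=sample.profraw", "./sample.instr"],
        # 3. Convert the raw profile into the indexed format clang reads.
        ["llvm-profdata", "merge", "-output=sample.profdata", "sample.profraw"],
        # 4. Rebuild using the profile, plus any extra backend options.
        ["clang++", "-O3", "-fprofile-use=sample.profdata", *extra,
         source, "-o", "sample.pgo"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without clang.
    for argv in pgo_build_commands("sample.cpp", ["-force-precise-rotation-cost"]):
        print(shlex.join(argv))
```
</pre>
        <div>Calling the helper once with and once without the option gives
          two otherwise identical builds that can be timed against each
          other.</div>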

        <div><br>

        </div>

        <div>+carrot who is working on this area.</div>

        <div><br>

        </div>

        <div>thanks,</div>

        <div><br>

        </div>

        <div>David</div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Feb 6, 2018 at 1:37 PM, Victor

          Leschuk <span dir="ltr"><<a

              href="mailto:vleschuk@accesssoftek.com" target="_blank"

              moz-do-not-send="true">vleschuk@accesssoftek.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div text="#000000" bgcolor="#FFFFFF">

              <p>Hello David, thanks for the detailed response!</p>

              <p>Do you have any tests that you use to measure PGO
                effectiveness? I have tested clang version 6.0 with the
                same sample that Jie Chen used in 2016, and both
                frontend-based and IR-based PGO actually make the code run
                slower. Here are the average times:</p>

              <p>clang++ -O3: 3.15 sec </p>

              <p>clang++ -O3 and -fprofile-instr-use: 3.160 sec<br>

              </p>

              <p>clang++ -O3 and -fprofile-use: 3.180 sec<br>

              </p>

              <p>g++ (7.3.0) -O3: 3.640 sec<br>

              </p>

              <p>g++ (7.3.0) -O3 and -fprofile-use: 2.92 sec</p>
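              <p>To put these averages in relative terms, here is a small
                Python sketch that computes the percentage change against
                each compiler's plain -O3 baseline. The numbers are the ones
                listed above; the helper name percent_change is made up for
                illustration. With these inputs the clang++ PGO builds come
                out roughly 0.3% and 1.0% slower, while the g++ PGO build is
                roughly 19.8% faster.</p>
<pre>
```python
# Average run times in seconds, copied from the list above.
timings = {
    "clang++ -O3": 3.15,
    "clang++ -O3 -fprofile-instr-use": 3.160,
    "clang++ -O3 -fprofile-use": 3.180,
    "g++ -O3": 3.640,
    "g++ -O3 -fprofile-use": 2.92,
}

# Which entry serves as the baseline for each compiler.
baselines = {"clang++": "clang++ -O3", "g++": "g++ -O3"}


def percent_change(baseline, measured):
    """Relative change in run time; negative means faster than the baseline."""
    return (measured - baseline) / baseline * 100.0


for name, seconds in timings.items():
    base = timings[baselines[name.split()[0]]]
    print(f"{name}: {percent_change(base, seconds):+.1f}%")
```
</pre>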

              <p>Do you have any idea what could be wrong? Are there any
                recommendations on when one should use PGO with clang
                and when it is better not to?</p>

              <p>Thanks!<br>

              </p>

              <div>

                <div class="h5"> <br>

                  <div class="m_-5231669173907304757moz-cite-prefix">On

                    02/05/2018 09:38 AM, Xinliang David Li wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr"><br>

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On Sun, Feb 4, 2018 at

                          9:59 PM, Victor Leschuk <span dir="ltr"><<a

                              href="mailto:vleschuk@accesssoftek.com"

                              target="_blank" moz-do-not-send="true">vleschuk@accesssoftek.com</a>></span>

                          wrote:<br>

                          <blockquote class="gmail_quote"

                            style="margin:0 0 0 .8ex;border-left:1px

                            #ccc solid;padding-left:1ex">Hello David!<br>

                            <br>

                            I have recently started getting acquainted with
                            PGO in LLVM/clang and found<br>
                            your e-mail thread:<br>

                            <a

                              href="http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html"

                              rel="noreferrer" target="_blank"

                              moz-do-not-send="true">http://lists.llvm.org/pipermai<wbr>l/llvm-dev/2016-May/099395.<wbr>html</a>

                            . Here you<br>
                            posted a nice list of optimizations that use
                            profiling and of those<br>
                            that could use it but don't. However, that
                            thread is about 2 years<br>
                            old. Could you please let me know whether
                            there have been any significant<br>
                            changes in this area since then?<br>

                          </blockquote>

                          <div><br>

                          </div>

                          <div><br>

                          </div>

                          <div>Yes, there have been quite a few changes since
                            then. Here are some of the new features:</div>

                          <div><br>

                          </div>

                          <div>* LLVM IR based PGO -- this is designed

                            to maximize program performance. The option

                            to turn it on is

                            -fprofile-generate/-fprofile-<wbr>use</div>

                          <div>* value profiling support in PGO --
                            currently supports indirect call target
                            profiling and memcpy/memset size profiling
                            and optimizations</div>

                          <div>* Profile data is made available for
                            the inliner to use (enabled only for the new
                            pass manager: -fexperimental-new-pass-<wbr>manager)</div>

                          <div>* Profile aware LICM is available --

                            implemented via a profile driven code

                            sinking pass </div>

                          <div>* Partial inlining is made profile

                            aware;  Graham Yu also added support for

                            multiple region function outlining (with

                            PGO)</div>

                          <div>* BB layout heuristics are tuned with PGO</div>

                          <div>* hotness driven function layout

                            optimization </div>

                          <div><br>

                          </div>

                          <div>There is pending work in the following
                            areas:</div>

                          <div>* profile aware loop vectorization, etc</div>

                          <div>* control height reduction optimization
                            (Hiroshi is working on this)</div>

                          <div><br>

                          </div>

                          <div>ThinLTO also works well with PGO.</div>

                          <div><br>

                          </div>

                          <div>Hope this helps.</div>

                          <div><br>

                          </div>

                          <div>David</div>

                          <div><br>

                          </div>

                          <div>

                            <pre style="white-space:pre-wrap;color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">><i> What I can tell you is that there are many missing ones (that can benefit

</i>from profile): such as profile aware LICM (patch pending), speculative PRE,

loop unrolling, loop peeling, auto vectorization, inlining, function

splitting, function layout, function outlining,  profile driven size

optimization, induction variable optimization/strength reduction, stringOp

specialization/optimization/<wbr>inlining, switch peeling/lowering etc. The

biggest profile users today include ralloc, BB layout, ifcvt, shrinkwrapping

etc, but there should be room for improvement there too.</pre>

                            <br>

                          </div>

                          <blockquote class="gmail_quote"

                            style="margin:0 0 0 .8ex;border-left:1px

                            #ccc solid;padding-left:1ex"> <br>

                            Thanks in advance!<br>

                            <span class="m_-5231669173907304757HOEnZb"><font

                                color="#888888"><br>

                                --<br>

                                Best Regards,<br>

                                <br>

                                Victor Leschuk | Software Engineer |

                                Access Softek<br>

                                <br>

                              </font></span></blockquote>

                        </div>

                        <br>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                  <pre class="m_-5231669173907304757moz-signature" cols="72">-- 

Best Regards,

Victor Leschuk | Software Engineer | Access Softek</pre>

                </div>

              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Best Regards,

Victor Leschuk | Software Engineer | Access Softek</pre>

  </body>

</html>