<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 8/4/2017 2:55 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Fri, Aug 4, 2017 at 11:41 AM, Chad

            Rosier <span dir="ltr"><<a

                href="mailto:mcrosier@codeaurora.org" target="_blank"

                moz-do-not-send="true">mcrosier@codeaurora.org</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-">

                  <p><br>

                  </p>

                  <br>

                  <div

                    class="gmail-m_4340832051958399694moz-cite-prefix">On

                    8/4/2017 2:06 PM, Daniel Berlin wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">A few notes:<br>

                      I'm a bit surprised IPO copy/constant propagation

                      doesn't get this case, but i didn't look if the

                      lattice supports variables.

                      <div>In particular, in your example, given no

                        other call sites, it should eliminate the dead

                        code.</div>

                      <div>(In a real program, it may require cloning).</div>

                    </div>

                  </blockquote>

                  <br>

                </span> In the actual program (SPEC2017/gcc,

                ironically), there are multiple calls to fn2 and only

                one of them has the property that the 1st and 2nd

                argument are the same (as is shown in my pseudo code). 

                Internally, we have another developer, Matt Simpson,

                working on a function specialization patch that might be

                of value here.  Specifically, you could clone fn2 based

                on the fact that a_ptr == dst_ptr and then simplify a

                great deal of the function.  However, that patch is

                still a WIP.</div>

            </blockquote>

            <div><br>

            </div>

            <div>FWIW: You almost certainly want to integrate that with

              IPA based constant propagation, as it is the thing you

              should be using to tell you what will happen in the call. 

              It should actually not be difficult at all (I can give you

              references to papers, but it's just a couple hundred lines

              of code on top of our current propagation engine)</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    If I'm not mistaken, this is exactly what Matt is doing. :D<br>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>(It can also later be used to do type-base devirt).</div>

            <div><br>

            </div>

            <div>GCC will do partial specialization (IE contextually

              decide to clone callsites where it believes the

              constantness/etc will cause elimination)</div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>GCC will do IPA-CP/const-prop with cloning,

                        and i'm wildly curious if new GCC's catch this

                        case for you at higher optimization levels?</div>

                    </div>

                  </blockquote>

                  <br>

                </span> GCC does inline fn2 into fn1 in this particular

                case, but I'm not exactly sure how GCC accomplishes

                this.  I'm guessing GCC is just more aggressive with its

                inlining (fn2 is also marked with the inline keyword,

                which I assume GCC uses as a hint).  I'm speculating

                here and I've never worked on GCC, so unfortunately I

                have little to go on.</div>

            </blockquote>

            <div><br>

            </div>

            <div>I meant if you turn off inlining :)</div>

          </div>

        </div>

      </div>

    </blockquote>

    Doh. Right.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>I can take a gander though, given the info you've

              given.</div>

            <div><br>

            </div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>If so, it may be worth not looking at this as

                        an inlining problem, but as an area we need IPO

                        infrastructure improvement <br>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> Because of the multiple callsites with varying

                characteristics I'm not sure this can be solved in this

                way.</div>

            </blockquote>

            <div><br>

            </div>

            <div>FWIW: It definitely can.  Whether we want to, ....</div>

          </div>

        </div>

      </div>

    </blockquote>

    Yes, you're right.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>That said, this is the whole purpose of IPA const/copy

              prop.</div>

            <div>LLVM stops at the "propagate things that are always

              constant in every case" whereas, most compilers do "if

              worth it, clone this function callsite where i can prove

              it will be constant ".</div>

            <div><br>

            </div>

            <div>You probably not want to rely on inlining for all

              possible IPO effects.  Especially in this case, where IPO

              can actually do the job.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Yes, that makes sense.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>IMHO, if you can, you want to reserve inlining-as-IPO

              for the cases where the IPO algorithms are

              difficult/expensive/etc.</div>

            <div><br>

            </div>

            <div>Or, at the very least, drive inlining like this by the

              results of those IPO algorithms.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Sounds reasonable.  I'll throw this idea around with Matt and Geoff.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>

                        <div><br>

                        </div>

                        <div>Otherwise,  a couple things:

                          <div>

                            <div>Approximate dominators (for example,

                              semi-dominators) can be computed fast (a

                              DFS walk of the CFG with no real

                              additional computation)</div>

                            <div>Except in strange CFGs that jump around

                              a lot, they are the dominators.</div>

                            <div><br>

                            </div>

                            <div>More importantly, the dominator is

                              either the sdom or a proper ancestor of

                              the sdom.</div>

                            <div><br>

                            </div>

                            <div>The practical impact of this is that if

                              you use them as if they were dominators,

                              the set of conditions you discover will

                              not be "too wrong".  Occasionally wrong,

                              but mostly not.</div>

                            <div><br>

                            </div>

                            <div>My guess is the cost of doing

                              approximate dominators is ~50-60% of the

                              cost of doing dominators.  Nowadays, about

                              half the time was in the DFS walk, the

                              other half in the computation. At best, it

                              would be 2-3x faster.</div>

                            <div>I've no idea if this changes whether

                              we'd want dominators, approximate

                              dominators, or stick with nothing.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> Right, this is kinda one of the bigger questions

                I'm trying to figure out.  My proposed solution doesn't

                use the dominator tree in order to minimize the impact

                on compile-time.  However, I'd guess the ROI is going to

                be much smaller because of the limited scope.  On the

                other end of the spectrum I'd fear the full dominator

                tree would be too computationally expensive (but of

                course some of that could be mitigated by the ability to

                do incremental updates to the dominator tree).<span

                  class="gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>

                        <div>

                          <div>

                            <div><br>

                            </div>

                            <div>If you have some sort of dominatorish

                              tree could then just use earlycse's method

                              of dominating condition finding:</div>

                            <div>Process in "dom tree" top-down order,

                              push the equivalences you see, pop the

                              relevant ones when you exit the relevant

                              dom tree scope.</div>

                            <div><br>

                            </div>

                            <div>In practice, you'd only check

                              comparisons against the hash table.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> Humm.. I'll have to think about it for a bit. 

                I'm thinking this might be a good compromise for my

                needs.<span class="gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>

                        <div>

                          <div>

                            <div><br>

                            </div>

                            <div>The other option is PredicateInfo, but

                              it requires dominators and modifies the

                              IR.</div>

                            <div><br>

                            </div>

                            <div>My guess is this is undesired/too

                              heavyweight for inline cost analysis,

                              however the basic principle on how it

                              renames things could also be applied

                              without IR changing for this specific

                              case.  Unlike the EarlyCSE method, which

                              is O(all instructons) PredicateInfo is

                              O(branches + number of uses of variables

                              affected by conditions)  Without going

                              into futher details, if all you care about

                              is "for each condition, give me the set of

                              possibly affected variables" (so you can

                              see if they may simplify), we could do

                              that very very quickly (as fast as we can

                              sort a vector). But it does require

                              dominators.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> For my particular problem, I think PredicateInfo

                would be sufficient IIUYC.  But as you suggest, I'm

                thinking people aren't going to be fond of using the

                full dominators.<br>

                <br>

                Lots of great feedback.  Thanks, Danny.

                <div>

                  <div class="gmail-h5"><br>

                  </div>

                </div>

              </div>

            </blockquote>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>