<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 8/4/2017 2:55 PM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Fri, Aug 4, 2017 at 11:41 AM, Chad
            Rosier <span dir="ltr"><<a
                href="mailto:mcrosier@codeaurora.org" target="_blank"
                moz-do-not-send="true">mcrosier@codeaurora.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-">
                  <p><br>
                  </p>
                  <br>
                  <div
                    class="gmail-m_4340832051958399694moz-cite-prefix">On
                    8/4/2017 2:06 PM, Daniel Berlin wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">A few notes:<br>
                      I'm a bit surprised IPO copy/constant propagation
                      doesn't get this case, but i didn't look if the
                      lattice supports variables.
                      <div>In particular, in your example, given no
                        other call sites, it should eliminate the dead
                        code.</div>
                      <div>(In a real program, it may require cloning).</div>
                    </div>
                  </blockquote>
                  <br>
                </span> In the actual program (SPEC2017/gcc,
                ironically), there are multiple calls to fn2 and only
                one of them has the property that the 1st and 2nd
                argument are the same (as is shown in my pseudo code). 
                Internally, we have another developer, Matt Simpson,
                working on a function specialization patch that might be
                of value here.  Specifically, you could clone fn2 based
                on the fact that a_ptr == dst_ptr and then simplify a
                great deal of the function.  However, that patch is
                still a WIP.</div>
            </blockquote>
            <div><br>
            </div>
            <div>FWIW: You almost certainly want to integrate that with
              IPA based constant propagation, as it is the thing you
              should be using to tell you what will happen in the call. 
              It should actually not be difficult at all (I can give you
              references to papers, but it's just a couple hundred lines
              of code on top of our current propagation engine)</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    If I'm not mistaken, this is exactly what Matt is doing. :D<br>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>(It can also later be used to do type-base devirt).</div>
            <div><br>
            </div>
            <div>GCC will do partial specialization (IE contextually
              decide to clone callsites where it believes the
              constantness/etc will cause elimination)</div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>GCC will do IPA-CP/const-prop with cloning,
                        and i'm wildly curious if new GCC's catch this
                        case for you at higher optimization levels?</div>
                    </div>
                  </blockquote>
                  <br>
                </span> GCC does inline fn2 into fn1 in this particular
                case, but I'm not exactly sure how GCC accomplishes
                this.  I'm guessing GCC is just more aggressive with its
                inlining (fn2 is also marked with the inline keyword,
                which I assume GCC uses as a hint).  I'm speculating
                here and I've never worked on GCC, so unfortunately I
                have little to go on.</div>
            </blockquote>
            <div><br>
            </div>
            <div>I meant if you turn off inlining :)</div>
          </div>
        </div>
      </div>
    </blockquote>
    Doh. Right.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>I can take a gander though, given the info you've
              given.</div>
            <div><br>
            </div>
            <div> <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>If so, it may be worth not looking at this as
                        an inlining problem, but as an area we need IPO
                        infrastructure improvement <br>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> Because of the multiple callsites with varying
                characteristics I'm not sure this can be solved in this
                way.</div>
            </blockquote>
            <div><br>
            </div>
            <div>FWIW: It definitely can.  Whether we want to, ....</div>
          </div>
        </div>
      </div>
    </blockquote>
    Yes, you're right.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>That said, this is the whole purpose of IPA const/copy
              prop.</div>
            <div>LLVM stops at the "propagate things that are always
              constant in every case" whereas, most compilers do "if
              worth it, clone this function callsite where i can prove
              it will be constant ".</div>
            <div><br>
            </div>
            <div>You probably not want to rely on inlining for all
              possible IPO effects.  Especially in this case, where IPO
              can actually do the job.</div>
          </div>
        </div>
      </div>
    </blockquote>
    Yes, that makes sense.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>IMHO, if you can, you want to reserve inlining-as-IPO
              for the cases where the IPO algorithms are
              difficult/expensive/etc.</div>
            <div><br>
            </div>
            <div>Or, at the very least, drive inlining like this by the
              results of those IPO algorithms.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Sounds reasonable.  I'll throw this idea around with Matt and Geoff.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>
                        <div><br>
                        </div>
                        <div>Otherwise,  a couple things:
                          <div>
                            <div>Approximate dominators (for example,
                              semi-dominators) can be computed fast (a
                              DFS walk of the CFG with no real
                              additional computation)</div>
                            <div>Except in strange CFGs that jump around
                              a lot, they are the dominators.</div>
                            <div><br>
                            </div>
                            <div>More importantly, the dominator is
                              either the sdom or a proper ancestor of
                              the sdom.</div>
                            <div><br>
                            </div>
                            <div>The practical impact of this is that if
                              you use them as if they were dominators,
                              the set of conditions you discover will
                              not be "too wrong".  Occasionally wrong,
                              but mostly not.</div>
                            <div><br>
                            </div>
                            <div>My guess is the cost of doing
                              approximate dominators is ~50-60% of the
                              cost of doing dominators.  Nowadays, about
                              half the time was in the DFS walk, the
                              other half in the computation. At best, it
                              would be 2-3x faster.</div>
                            <div>I've no idea if this changes whether
                              we'd want dominators, approximate
                              dominators, or stick with nothing.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> Right, this is kinda one of the bigger questions
                I'm trying to figure out.  My proposed solution doesn't
                use the dominator tree in order to minimize the impact
                on compile-time.  However, I'd guess the ROI is going to
                be much smaller because of the limited scope.  On the
                other end of the spectrum I'd fear the full dominator
                tree would be too computationally expensive (but of
                course some of that could be mitigated by the ability to
                do incremental updates to the dominator tree).<span
                  class="gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>
                        <div>
                          <div>
                            <div><br>
                            </div>
                            <div>If you have some sort of dominatorish
                              tree could then just use earlycse's method
                              of dominating condition finding:</div>
                            <div>Process in "dom tree" top-down order,
                              push the equivalences you see, pop the
                              relevant ones when you exit the relevant
                              dom tree scope.</div>
                            <div><br>
                            </div>
                            <div>In practice, you'd only check
                              comparisons against the hash table.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> Humm.. I'll have to think about it for a bit. 
                I'm thinking this might be a good compromise for my
                needs.<span class="gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>
                        <div>
                          <div>
                            <div><br>
                            </div>
                            <div>The other option is PredicateInfo, but
                              it requires dominators and modifies the
                              IR.</div>
                            <div><br>
                            </div>
                            <div>My guess is this is undesired/too
                              heavyweight for inline cost analysis,
                              however the basic principle on how it
                              renames things could also be applied
                              without IR changing for this specific
                              case.  Unlike the EarlyCSE method, which
                              is O(all instructons) PredicateInfo is
                              O(branches + number of uses of variables
                              affected by conditions)  Without going
                              into futher details, if all you care about
                              is "for each condition, give me the set of
                              possibly affected variables" (so you can
                              see if they may simplify), we could do
                              that very very quickly (as fast as we can
                              sort a vector). But it does require
                              dominators.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> For my particular problem, I think PredicateInfo
                would be sufficient IIUYC.  But as you suggest, I'm
                thinking people aren't going to be fond of using the
                full dominators.<br>
                <br>
                Lots of great feedback.  Thanks, Danny.
                <div>
                  <div class="gmail-h5"><br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>