<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 8/4/2017 2:06 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">A few notes:<br>

        I'm a bit surprised IPO copy/constant propagation doesn't get

        this case, but i didn't look if the lattice supports variables.

        <div>In particular, in your example, given no other call sites,

          it should eliminate the dead code.</div>

        <div>(In a real program, it may require cloning).</div>

      </div>

    </blockquote>

    <br>

    In the actual program (SPEC2017/gcc, ironically), there are multiple

    calls to fn2 and only one of them has the property that the 1st and

    2nd argument are the same (as is shown in my pseudo code). 

    Internally, we have another developer, Matt Simpson, working on a

    function specialization patch that might be of value here. 

    Specifically, you could clone fn2 based on the fact that a_ptr ==

    dst_ptr and then simplify a great deal of the function.  However,

    that patch is still a WIP.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>GCC will do IPA-CP/const-prop with cloning, and i'm wildly

          curious if new GCC's catch this case for you at higher

          optimization levels?</div>

      </div>

    </blockquote>

    <br>

    GCC does inline fn2 into fn1 in this particular case, but I'm not

    exactly sure how GCC accomplishes this.  I'm guessing GCC is just

    more aggressive with its inlining (fn2 is also marked with the

    inline keyword, which I assume GCC uses as a hint).  I'm speculating

    here and I've never worked on GCC, so unfortunately I have little to

    go on.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>If so, it may be worth not looking at this as an inlining

          problem, but as an area we need IPO infrastructure improvement

          <br>

        </div>

      </div>

    </blockquote>

    <br>

    Because of the multiple callsites with varying characteristics I'm

    not sure this can be solved in this way.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div><br>

          </div>

          <div>Otherwise,  a couple things:

            <div>

              <div>Approximate dominators (for example, semi-dominators)

                can be computed fast (a DFS walk of the CFG with no real

                additional computation)</div>

              <div>Except in strange CFGs that jump around a lot, they

                are the dominators.</div>

              <div><br>

              </div>

              <div>More importantly, the dominator is either the sdom or

                a proper ancestor of the sdom.</div>

              <div><br>

              </div>

              <div>The practical impact of this is that if you use them

                as if they were dominators, the set of conditions you

                discover will not be "too wrong".  Occasionally wrong,

                but mostly not.</div>

              <div><br>

              </div>

              <div>My guess is the cost of doing approximate dominators

                is ~50-60% of the cost of doing dominators.  Nowadays,

                about half the time was in the DFS walk, the other half

                in the computation. At best, it would be 2-3x faster.</div>

              <div>I've no idea if this changes whether we'd want

                dominators, approximate dominators, or stick with

                nothing.</div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Right, this is kinda one of the bigger questions I'm trying to

    figure out.  My proposed solution doesn't use the dominator tree in

    order to minimize the impact on compile-time.  However, I'd guess

    the ROI is going to be much smaller because of the limited scope. 

    On the other end of the spectrum I'd fear the full dominator tree

    would be too computationally expensive (but of course some of that

    could be mitigated by the ability to do incremental updates to the

    dominator tree).<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div><br>

              </div>

              <div>If you have some sort of dominatorish tree could then

                just use earlycse's method of dominating condition

                finding:</div>

              <div>Process in "dom tree" top-down order, push the

                equivalences you see, pop the relevant ones when you

                exit the relevant dom tree scope.</div>

              <div><br>

              </div>

              <div>In practice, you'd only check comparisons against the

                hash table.</div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Humm.. I'll have to think about it for a bit.  I'm thinking this

    might be a good compromise for my needs.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div><br>

              </div>

              <div>The other option is PredicateInfo, but it requires

                dominators and modifies the IR.</div>

              <div><br>

              </div>

              <div>My guess is this is undesired/too heavyweight for

                inline cost analysis, however the basic principle on how

                it renames things could also be applied without IR

                changing for this specific case.  Unlike the EarlyCSE

                method, which is O(all instructons) PredicateInfo is

                O(branches + number of uses of variables affected by

                conditions)  Without going into futher details, if all

                you care about is "for each condition, give me the set

                of possibly affected variables" (so you can see if they

                may simplify), we could do that very very quickly (as

                fast as we can sort a vector). But it does require

                dominators.</div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    For my particular problem, I think PredicateInfo would be sufficient

    IIUYC.  But as you suggest, I'm thinking people aren't going to be

    fond of using the full dominators.<br>

    <br>

    Lots of great feedback.  Thanks, Danny.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BwTVVfnnB6hArJgU_9B5a=2-K7QDo57aNuArK4wZjNevvbw@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div><br>

              </div>

              <div> <br>

              </div>

            </div>

          </div>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Fri, Aug 4, 2017 at 8:56 AM, Chad

          Rosier <span dir="ltr"><<a

              href="mailto:mcrosier@codeaurora.org" target="_blank"

              moz-do-not-send="true">mcrosier@codeaurora.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div text="#000000" bgcolor="#FFFFFF">

              <p><tt>All,</tt></p>

              <p><tt>I'm working on an improvement to the inline cost

                  model, but I'm unsure how to proceed.  Let me begin by

                  first describing the problem I'm trying to solve. 

                  Consider the following pseudo C code:</tt></p>

              <p><b><font face="Courier New, Courier, monospace">typedef

                    struct element {<br>

                      unsigned idx;<br>

                    } element_t;<br>

                  </font></b></p>

              <p><b><font face="Courier New, Courier, monospace">static

                    inline<br>

                    unsigned char fn2 (element_t *dst_ptr, const

                    element_t *a_ptr,<br>

                                       const element_t *b_ptr, unsigned

                    char changed) {<br>

                      if (a_ptr && b_ptr &&

                    a_ptr->idx == b_ptr->idx) {<br>

                        if (!changed && dst_ptr &&

                    dst_ptr->idx == a_ptr->idx) {<br>

                          /* Do something */<br>

                        } else {<br>

                          changed = 1;<br>

                          if (!dst_ptr)<br>

                            dst_ptr = fn3();<br>

                          else<br>

                            dst_ptr->idx = a_ptr->idx;<br>

                          /* Do something. */<br>

                        }<br>

                      } else {<br>

                        changed = fn4();<br>

                      }<br>

                      return changed;<br>

                    }<br>

                    <br>

                    unsigned char fn1 (element_t *a_ptr, element_t

                    *b_ptr) {<br>

                      unsigned char changed = 0;<br>

                      while (b_ptr) {<br>

                        if (!a_ptr || a_ptr->idx == b_ptr->idx) {<br>

                          changed = fn2 (a_ptr, a_ptr, b_ptr, changed);<br>

                          b_ptr = b_ptr->next;<br>

                        }<br>

                      }<br>

                      return changed;<br>

                    }</font></b></p>

              <p><tt>When the inline cost model computes the inline cost

                  of fn2 it ends up being much higher than the inline

                  threshold.  A fair amount of the cost is due to the

                  inlining of fn3 into fn2.  However, if fn2 had been

                  inlined into fn1 the code from fn3 would have been

                  removed as dead/unreachable.<br>

                </tt></p>

              <p><tt>At the fn2 call site notice the first two arguments

                  are equal.  Thus, in the context of fn2 dst_ptr and

                  a_ptr are equal.  The call site of fn3 is predicated

                  on dst_ptr being null (i.e., if (!dst_ptr) dst_ptr =

                  fn3()), but that code is predicated on a_ptr being

                  non-null.  Therefore, we know the condition !dst_ptr

                  is false (because </tt><tt><tt>a_ptr == dst_ptr and </tt>a_ptr

                  is non-null) and the call to fn3 is dead.  I suspect

                  one of JumpThreading, EarlyCSE, or GVN does the

                  elimination after inlining, so that's what I'd like to

                  try and model in the inline cost model.  (Note fn2 has

                  multiple call sides and the property that the first

                  and second arguments are equal isn't true for each

                  call site, so something like IPSCCP doesn't actually

                  help, AFAICT).</tt></p>

              <p><tt>My first attempt at solving this problem did

                  something similar to what is done in

                  JumpThreadingPass::<wbr>ProcessImpliedCondition(). 

                  Specifically, it tried to prove that dst_ptr was

                  non-null based on a dominating condition.  The only

                  tricky parts were to deal with hammocks/diamonds when

                  walking up the CFG (</tt><tt><tt>See: <a

                      class="m_3626320905672759108moz-txt-link-freetext"

                      href="https://reviews.llvm.org/D36287"

                      target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D36287</a>

                    as a concrete example of how I proposed to get an

                    immediate dominator without the domtree</tt>) and to

                  account for the fact that dst_ptr and a_ptr are equal.</tt></p>

              <p><tt>I'm pretty sure I can get this approach to work,

                  however, I'm not convinced it's really extensible or

                  general.  Should we consider using the full dominator

                  tree in the inline cost model to capture this?<br>

                </tt></p>

              <p><tt>If you have any thoughts on how to tackle the

                  problem, I would love to hear your feedback!</tt></p>

              <span class="HOEnZb"><font color="#888888">

                  <p><tt> Chad<br>

                    </tt></p>

                </font></span></div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>