<div dir="ltr">A few notes:<br>I'm a bit surprised IPO copy/constant propagation doesn't get this case, but i didn't look if the lattice supports variables.<div>In particular, in your example, given no other call sites, it should eliminate the dead code.</div><div>(In a real program, it may require cloning).</div><div><br></div><div>GCC will do IPA-CP/const-prop with cloning, and i'm wildly curious if new GCC's catch this case for you at higher optimization levels?</div><div><br></div><div>If so, it may be worth not looking at this as an inlining problem, but as an area we need IPO infrastructure improvement </div><div><div><br></div><div>Otherwise,  a couple things:<div><div>Approximate dominators (for example, semi-dominators) can be computed fast (a DFS walk of the CFG with no real additional computation)</div><div>Except in strange CFGs that jump around a lot, they are the dominators.</div><div><br></div><div>More importantly, the dominator is either the sdom or a proper ancestor of the sdom.</div><div><br></div><div>The practical impact of this is that if you use them as if they were dominators, the set of conditions you discover will not be "too wrong".  Occasionally wrong, but mostly not.</div><div><br></div><div>My guess is the cost of doing approximate dominators is ~50-60% of the cost of doing dominators.  Nowadays, about half the time was in the DFS walk, the other half in the computation. At best, it would be 2-3x faster.</div><div>I've no idea if this changes whether we'd want dominators, approximate dominators, or stick with nothing.</div><div><br></div><div>If you have some sort of dominatorish tree could then just use earlycse's method of dominating condition finding:</div><div>Process in "dom tree" top-down order, push the equivalences you see, pop the relevant ones when you exit the relevant dom tree scope.</div><div><br></div><div>In practice, you'd only check comparisons against the hash table.</div><div><br></div><div>The other option is PredicateInfo, but it requires dominators and modifies the IR.</div><div><br></div><div>My guess is this is undesired/too heavyweight for inline cost analysis, however the basic principle on how it renames things could also be applied without IR changing for this specific case.  Unlike the EarlyCSE method, which is O(all instructons) PredicateInfo is O(branches + number of uses of variables affected by conditions)  Without going into futher details, if all you care about is "for each condition, give me the set of possibly affected variables" (so you can see if they may simplify), we could do that very very quickly (as fast as we can sort a vector). But it does require dominators.</div><div><br></div><div><br></div><div> <br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 4, 2017 at 8:56 AM, Chad Rosier <span dir="ltr"><<a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div text="#000000" bgcolor="#FFFFFF">

    <p><tt>All,</tt></p>

    <p><tt>I'm working on an improvement to the inline cost model, but

        I'm unsure how to proceed.  Let me begin by first describing the

        problem I'm trying to solve.  Consider the following pseudo C

        code:</tt></p>

    <p><b><font face="Courier New, Courier, monospace">typedef struct

          element {<br>

            unsigned idx;<br>

          } element_t;<br>

        </font></b></p>

    <p><b><font face="Courier New, Courier, monospace">static inline<br>

          unsigned char fn2 (element_t *dst_ptr, const element_t *a_ptr,<br>

                             const element_t *b_ptr, unsigned char

          changed) {<br>

            if (a_ptr && b_ptr && a_ptr->idx ==

          b_ptr->idx) {<br>

              if (!changed && dst_ptr && dst_ptr->idx

          == a_ptr->idx) {<br>

                /* Do something */<br>

              } else {<br>

                changed = 1;<br>

                if (!dst_ptr)<br>

                  dst_ptr = fn3();<br>

                else<br>

                  dst_ptr->idx = a_ptr->idx;<br>

                /* Do something. */<br>

              }<br>

            } else {<br>

              changed = fn4();<br>

            }<br>

            return changed;<br>

          }<br>

          <br>

          unsigned char fn1 (element_t *a_ptr, element_t *b_ptr) {<br>

            unsigned char changed = 0;<br>

            while (b_ptr) {<br>

              if (!a_ptr || a_ptr->idx == b_ptr->idx) {<br>

                changed = fn2 (a_ptr, a_ptr, b_ptr, changed);<br>

                b_ptr = b_ptr->next;<br>

              }<br>

            }<br>

            return changed;<br>

          }</font></b></p>

    <p><tt>When the inline cost model computes the inline cost of fn2 it

        ends up being much higher than the inline threshold.  A fair

        amount of the cost is due to the inlining of fn3 into fn2. 

        However, if fn2 had been inlined into fn1 the code from fn3

        would have been removed as dead/unreachable.<br>

      </tt></p>

    <p><tt>At the fn2 call site notice the first two arguments are

        equal.  Thus, in the context of fn2 dst_ptr and a_ptr are

        equal.  The call site of fn3 is predicated on dst_ptr being null

        (i.e., if (!dst_ptr) dst_ptr = fn3()), but that code is

        predicated on a_ptr being non-null.  Therefore, we know the condition

        !dst_ptr is false (because </tt><tt><tt>a_ptr == dst_ptr and </tt>a_ptr

        is non-null) and the call to fn3 is dead.  I suspect one of

        JumpThreading, EarlyCSE, or GVN does the elimination after inlining,

        so that's what I'd like to try and model in the inline cost

        model.  (Note fn2 has multiple call sides and the property that

        the first and second arguments are equal isn't true for each

        call site, so something like IPSCCP doesn't actually help,

        AFAICT).</tt></p>

    <p><tt>My first attempt at solving this problem did something similar

        to what is done in

        JumpThreadingPass::<wbr>ProcessImpliedCondition().  Specifically, it

        tried to prove that dst_ptr was non-null based on a dominating

        condition.  The only tricky parts were to deal with

        hammocks/diamonds when walking up the CFG (</tt><tt><tt>See:

          <a class="m_3626320905672759108moz-txt-link-freetext" href="https://reviews.llvm.org/D36287" target="_blank">https://reviews.llvm.org/<wbr>D36287</a> as a concrete example of how I

          proposed to get an immediate dominator without the domtree</tt>)

        and to account for the fact that dst_ptr and a_ptr are equal.</tt></p>

    <p><tt>I'm pretty sure I can get this approach to work, however, I'm

        not convinced it's really extensible or general.  Should we

        consider using the full dominator tree in the inline cost model

        to capture this?<br>

      </tt></p>

    <p><tt>If you have any thoughts on how to tackle the problem, I

        would love to hear your feedback!</tt></p><span class="HOEnZb"><font color="#888888">

    <p><tt> Chad<br>

      </tt></p>

  </font></span></div>


</blockquote></div><br></div>