[llvm-dev] [RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

Fri Aug 4 12:15:15 PDT 2017

On 8/4/2017 2:55 PM, Daniel Berlin wrote:
>
>
> On Fri, Aug 4, 2017 at 11:41 AM, Chad Rosier <mcrosier at codeaurora.org 
> <mailto:mcrosier at codeaurora.org>> wrote:
>
>
>
>     On 8/4/2017 2:06 PM, Daniel Berlin wrote:
>>     A few notes:
>>     I'm a bit surprised IPO copy/constant propagation doesn't get
>>     this case, but i didn't look if the lattice supports variables.
>>     In particular, in your example, given no other call sites, it
>>     should eliminate the dead code.
>>     (In a real program, it may require cloning).
>
>     In the actual program (SPEC2017/gcc, ironically), there are
>     multiple calls to fn2 and only one of them has the property that
>     the 1st and 2nd argument are the same (as is shown in my pseudo
>     code). Internally, we have another developer, Matt Simpson,
>     working on a function specialization patch that might be of value
>     here.  Specifically, you could clone fn2 based on the fact that
>     a_ptr == dst_ptr and then simplify a great deal of the function. 
>     However, that patch is still a WIP.
>
>
> FWIW: You almost certainly want to integrate that with IPA based 
> constant propagation, as it is the thing you should be using to tell 
> you what will happen in the call. It should actually not be difficult 
> at all (I can give you references to papers, but it's just a couple 
> hundred lines of code on top of our current propagation engine)

If I'm not mistaken, this is exactly what Matt is doing. :D
>
> (It can also later be used to do type-base devirt).
>
> GCC will do partial specialization (IE contextually decide to clone 
> callsites where it believes the constantness/etc will cause elimination)
>
>
>
>>     GCC will do IPA-CP/const-prop with cloning, and i'm wildly
>>     curious if new GCC's catch this case for you at higher
>>     optimization levels?
>
>     GCC does inline fn2 into fn1 in this particular case, but I'm not
>     exactly sure how GCC accomplishes this.  I'm guessing GCC is just
>     more aggressive with its inlining (fn2 is also marked with the
>     inline keyword, which I assume GCC uses as a hint).  I'm
>     speculating here and I've never worked on GCC, so unfortunately I
>     have little to go on.
>
>
> I meant if you turn off inlining :)
Doh. Right.

>
> I can take a gander though, given the info you've given.
>
>
>
>
>>     If so, it may be worth not looking at this as an inlining
>>     problem, but as an area we need IPO infrastructure improvement
>
>     Because of the multiple callsites with varying characteristics I'm
>     not sure this can be solved in this way.
>
>
> FWIW: It definitely can.  Whether we want to, ....
Yes, you're right.

>
> That said, this is the whole purpose of IPA const/copy prop.
> LLVM stops at the "propagate things that are always constant in every 
> case" whereas, most compilers do "if worth it, clone this function 
> callsite where i can prove it will be constant ".
>
> You probably not want to rely on inlining for all possible IPO 
> effects.  Especially in this case, where IPO can actually do the job.
Yes, that makes sense.

>
> IMHO, if you can, you want to reserve inlining-as-IPO for the cases 
> where the IPO algorithms are difficult/expensive/etc.
>
> Or, at the very least, drive inlining like this by the results of 
> those IPO algorithms.

Sounds reasonable.  I'll throw this idea around with Matt and Geoff.

>
>
>
>>
>>     Otherwise,  a couple things:
>>     Approximate dominators (for example, semi-dominators) can be
>>     computed fast (a DFS walk of the CFG with no real additional
>>     computation)
>>     Except in strange CFGs that jump around a lot, they are the
>>     dominators.
>>
>>     More importantly, the dominator is either the sdom or a proper
>>     ancestor of the sdom.
>>
>>     The practical impact of this is that if you use them as if they
>>     were dominators, the set of conditions you discover will not be
>>     "too wrong".  Occasionally wrong, but mostly not.
>>
>>     My guess is the cost of doing approximate dominators is ~50-60%
>>     of the cost of doing dominators.  Nowadays, about half the time
>>     was in the DFS walk, the other half in the computation. At best,
>>     it would be 2-3x faster.
>>     I've no idea if this changes whether we'd want dominators,
>>     approximate dominators, or stick with nothing.
>
>     Right, this is kinda one of the bigger questions I'm trying to
>     figure out.  My proposed solution doesn't use the dominator tree
>     in order to minimize the impact on compile-time.  However, I'd
>     guess the ROI is going to be much smaller because of the limited
>     scope.  On the other end of the spectrum I'd fear the full
>     dominator tree would be too computationally expensive (but of
>     course some of that could be mitigated by the ability to do
>     incremental updates to the dominator tree).
>
>>
>>     If you have some sort of dominatorish tree could then just use
>>     earlycse's method of dominating condition finding:
>>     Process in "dom tree" top-down order, push the equivalences you
>>     see, pop the relevant ones when you exit the relevant dom tree scope.
>>
>>     In practice, you'd only check comparisons against the hash table.
>
>     Humm.. I'll have to think about it for a bit. I'm thinking this
>     might be a good compromise for my needs.
>
>>
>>     The other option is PredicateInfo, but it requires dominators and
>>     modifies the IR.
>>
>>     My guess is this is undesired/too heavyweight for inline cost
>>     analysis, however the basic principle on how it renames things
>>     could also be applied without IR changing for this specific
>>     case.  Unlike the EarlyCSE method, which is O(all instructons)
>>     PredicateInfo is O(branches + number of uses of variables
>>     affected by conditions)  Without going into futher details, if
>>     all you care about is "for each condition, give me the set of
>>     possibly affected variables" (so you can see if they may
>>     simplify), we could do that very very quickly (as fast as we can
>>     sort a vector). But it does require dominators.
>
>     For my particular problem, I think PredicateInfo would be
>     sufficient IIUYC.  But as you suggest, I'm thinking people aren't
>     going to be fond of using the full dominators.
>
>     Lots of great feedback.  Thanks, Danny.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170804/24f3f4dd/attachment.html>