[llvm-dev] [RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model
Chad Rosier via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 4 12:15:15 PDT 2017
On 8/4/2017 2:55 PM, Daniel Berlin wrote:
>
>
> On Fri, Aug 4, 2017 at 11:41 AM, Chad Rosier <mcrosier at codeaurora.org
> <mailto:mcrosier at codeaurora.org>> wrote:
>
>
>
> On 8/4/2017 2:06 PM, Daniel Berlin wrote:
>> A few notes:
>> I'm a bit surprised IPO copy/constant propagation doesn't get
>> this case, but i didn't look if the lattice supports variables.
>> In particular, in your example, given no other call sites, it
>> should eliminate the dead code.
>> (In a real program, it may require cloning).
>
> In the actual program (SPEC2017/gcc, ironically), there are
> multiple calls to fn2 and only one of them has the property that
> the 1st and 2nd argument are the same (as is shown in my pseudo
> code). Internally, we have another developer, Matt Simpson,
> working on a function specialization patch that might be of value
> here. Specifically, you could clone fn2 based on the fact that
> a_ptr == dst_ptr and then simplify a great deal of the function.
> However, that patch is still a WIP.
>
>
> FWIW: You almost certainly want to integrate that with IPA based
> constant propagation, as it is the thing you should be using to tell
> you what will happen in the call. It should actually not be difficult
> at all (I can give you references to papers, but it's just a couple
> hundred lines of code on top of our current propagation engine)
If I'm not mistaken, this is exactly what Matt is doing. :D
>
> (It can also later be used to do type-base devirt).
>
> GCC will do partial specialization (IE contextually decide to clone
> callsites where it believes the constantness/etc will cause elimination)
>
>
>
>> GCC will do IPA-CP/const-prop with cloning, and i'm wildly
>> curious if new GCC's catch this case for you at higher
>> optimization levels?
>
> GCC does inline fn2 into fn1 in this particular case, but I'm not
> exactly sure how GCC accomplishes this. I'm guessing GCC is just
> more aggressive with its inlining (fn2 is also marked with the
> inline keyword, which I assume GCC uses as a hint). I'm
> speculating here and I've never worked on GCC, so unfortunately I
> have little to go on.
>
>
> I meant if you turn off inlining :)
Doh. Right.
>
> I can take a gander though, given the info you've given.
>
>
>
>
>> If so, it may be worth not looking at this as an inlining
>> problem, but as an area we need IPO infrastructure improvement
>
> Because of the multiple callsites with varying characteristics I'm
> not sure this can be solved in this way.
>
>
> FWIW: It definitely can. Whether we want to, ....
Yes, you're right.
>
> That said, this is the whole purpose of IPA const/copy prop.
> LLVM stops at the "propagate things that are always constant in every
> case" whereas, most compilers do "if worth it, clone this function
> callsite where i can prove it will be constant ".
>
> You probably not want to rely on inlining for all possible IPO
> effects. Especially in this case, where IPO can actually do the job.
Yes, that makes sense.
>
> IMHO, if you can, you want to reserve inlining-as-IPO for the cases
> where the IPO algorithms are difficult/expensive/etc.
>
> Or, at the very least, drive inlining like this by the results of
> those IPO algorithms.
Sounds reasonable. I'll throw this idea around with Matt and Geoff.
>
>
>
>>
>> Otherwise, a couple things:
>> Approximate dominators (for example, semi-dominators) can be
>> computed fast (a DFS walk of the CFG with no real additional
>> computation)
>> Except in strange CFGs that jump around a lot, they are the
>> dominators.
>>
>> More importantly, the dominator is either the sdom or a proper
>> ancestor of the sdom.
>>
>> The practical impact of this is that if you use them as if they
>> were dominators, the set of conditions you discover will not be
>> "too wrong". Occasionally wrong, but mostly not.
>>
>> My guess is the cost of doing approximate dominators is ~50-60%
>> of the cost of doing dominators. Nowadays, about half the time
>> was in the DFS walk, the other half in the computation. At best,
>> it would be 2-3x faster.
>> I've no idea if this changes whether we'd want dominators,
>> approximate dominators, or stick with nothing.
>
> Right, this is kinda one of the bigger questions I'm trying to
> figure out. My proposed solution doesn't use the dominator tree
> in order to minimize the impact on compile-time. However, I'd
> guess the ROI is going to be much smaller because of the limited
> scope. On the other end of the spectrum I'd fear the full
> dominator tree would be too computationally expensive (but of
> course some of that could be mitigated by the ability to do
> incremental updates to the dominator tree).
>
>>
>> If you have some sort of dominatorish tree could then just use
>> earlycse's method of dominating condition finding:
>> Process in "dom tree" top-down order, push the equivalences you
>> see, pop the relevant ones when you exit the relevant dom tree scope.
>>
>> In practice, you'd only check comparisons against the hash table.
>
> Humm.. I'll have to think about it for a bit. I'm thinking this
> might be a good compromise for my needs.
>
>>
>> The other option is PredicateInfo, but it requires dominators and
>> modifies the IR.
>>
>> My guess is this is undesired/too heavyweight for inline cost
>> analysis, however the basic principle on how it renames things
>> could also be applied without IR changing for this specific
>> case. Unlike the EarlyCSE method, which is O(all instructons)
>> PredicateInfo is O(branches + number of uses of variables
>> affected by conditions) Without going into futher details, if
>> all you care about is "for each condition, give me the set of
>> possibly affected variables" (so you can see if they may
>> simplify), we could do that very very quickly (as fast as we can
>> sort a vector). But it does require dominators.
>
> For my particular problem, I think PredicateInfo would be
> sufficient IIUYC. But as you suggest, I'm thinking people aren't
> going to be fond of using the full dominators.
>
> Lots of great feedback. Thanks, Danny.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170804/24f3f4dd/attachment.html>
More information about the llvm-dev
mailing list