[llvm-dev] [RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 4 15:07:27 PDT 2017


On Fri, Aug 4, 2017 at 11:41 AM, Chad Rosier via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On 8/4/2017 2:06 PM, Daniel Berlin wrote:
>
> A few notes:
> I'm a bit surprised IPO copy/constant propagation doesn't get this case,
> but i didn't look if the lattice supports variables.
> In particular, in your example, given no other call sites, it should
> eliminate the dead code.
> (In a real program, it may require cloning).
>
>
> In the actual program (SPEC2017/gcc, ironically), there are multiple calls
> to fn2 and only one of them has the property that the 1st and 2nd argument
> are the same (as is shown in my pseudo code).  Internally, we have another
> developer, Matt Simpson, working on a function specialization patch that
> might be of value here.  Specifically, you could clone fn2 based on the
> fact that a_ptr == dst_ptr and then simplify a great deal of the function.
> However, that patch is still a WIP.
>
> GCC will do IPA-CP/const-prop with cloning, and i'm wildly curious if new
> GCC's catch this case for you at higher optimization levels?
>
>
> GCC does inline fn2 into fn1 in this particular case, but I'm not exactly
> sure how GCC accomplishes this.  I'm guessing GCC is just more aggressive
> with its inlining (fn2 is also marked with the inline keyword, which I
> assume GCC uses as a hint).  I'm speculating here and I've never worked on
> GCC, so unfortunately I have little to go on.
>
> If so, it may be worth not looking at this as an inlining problem, but as
> an area we need IPO infrastructure improvement
>
>
> Because of the multiple callsites with varying characteristics I'm not
> sure this can be solved in this way.
>
>
> Otherwise,  a couple things:
> Approximate dominators (for example, semi-dominators) can be computed fast
> (a DFS walk of the CFG with no real additional computation)
> Except in strange CFGs that jump around a lot, they are the dominators.
>
> More importantly, the dominator is either the sdom or a proper ancestor of
> the sdom.
>
> The practical impact of this is that if you use them as if they were
> dominators, the set of conditions you discover will not be "too wrong".
> Occasionally wrong, but mostly not.
>
> My guess is the cost of doing approximate dominators is ~50-60% of the
> cost of doing dominators.  Nowadays, about half the time was in the DFS
> walk, the other half in the computation. At best, it would be 2-3x faster.
> I've no idea if this changes whether we'd want dominators, approximate
> dominators, or stick with nothing.
>
>
> Right, this is kinda one of the bigger questions I'm trying to figure
> out.  My proposed solution doesn't use the dominator tree in order to
> minimize the impact on compile-time.
>

In the new PM, it's possible for a CGSCC pass to get a cached function
analysis from functions that have been visited previously. This requires
the CGSCC iteration on SCC's we visited earlier to keep an up to date
domtree, and I don't know if that's the case (or how much work it would be
to make it the case).

-- Sean Silva


> However, I'd guess the ROI is going to be much smaller because of the
> limited scope.  On the other end of the spectrum I'd fear the full
> dominator tree would be too computationally expensive (but of course some
> of that could be mitigated by the ability to do incremental updates to the
> dominator tree).
>
>
> If you have some sort of dominatorish tree could then just use earlycse's
> method of dominating condition finding:
> Process in "dom tree" top-down order, push the equivalences you see, pop
> the relevant ones when you exit the relevant dom tree scope.
>
> In practice, you'd only check comparisons against the hash table.
>
>
> Humm.. I'll have to think about it for a bit.  I'm thinking this might be
> a good compromise for my needs.
>
>
> The other option is PredicateInfo, but it requires dominators and modifies
> the IR.
>
> My guess is this is undesired/too heavyweight for inline cost analysis,
> however the basic principle on how it renames things could also be applied
> without IR changing for this specific case.  Unlike the EarlyCSE method,
> which is O(all instructons) PredicateInfo is O(branches + number of uses of
> variables affected by conditions)  Without going into futher details, if
> all you care about is "for each condition, give me the set of possibly
> affected variables" (so you can see if they may simplify), we could do that
> very very quickly (as fast as we can sort a vector). But it does require
> dominators.
>
>
> For my particular problem, I think PredicateInfo would be sufficient
> IIUYC.  But as you suggest, I'm thinking people aren't going to be fond of
> using the full dominators.
>
> Lots of great feedback.  Thanks, Danny.
>
>
>
>
>
> On Fri, Aug 4, 2017 at 8:56 AM, Chad Rosier <mcrosier at codeaurora.org>
> wrote:
>
>> All,
>>
>> I'm working on an improvement to the inline cost model, but I'm unsure
>> how to proceed.  Let me begin by first describing the problem I'm trying to
>> solve.  Consider the following pseudo C code:
>>
>>
>>
>>
>> *typedef struct element {   unsigned idx; } element_t; *
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *static inline unsigned char fn2 (element_t *dst_ptr, const element_t
>> *a_ptr,                    const element_t *b_ptr, unsigned char changed) {
>>   if (a_ptr && b_ptr && a_ptr->idx == b_ptr->idx) {     if (!changed &&
>> dst_ptr && dst_ptr->idx == a_ptr->idx) {       /* Do something */     }
>> else {       changed = 1;       if (!dst_ptr)         dst_ptr = fn3();
>>       else         dst_ptr->idx = a_ptr->idx;       /* Do something. */
>> }   } else {     changed = fn4();   }   return changed; } unsigned char fn1
>> (element_t *a_ptr, element_t *b_ptr) {   unsigned char changed = 0;   while
>> (b_ptr) {     if (!a_ptr || a_ptr->idx == b_ptr->idx) {       changed = fn2
>> (a_ptr, a_ptr, b_ptr, changed);       b_ptr = b_ptr->next;     }   }
>> return changed; }*
>>
>> When the inline cost model computes the inline cost of fn2 it ends up
>> being much higher than the inline threshold.  A fair amount of the cost is
>> due to the inlining of fn3 into fn2.  However, if fn2 had been inlined into
>> fn1 the code from fn3 would have been removed as dead/unreachable.
>>
>> At the fn2 call site notice the first two arguments are equal.  Thus, in
>> the context of fn2 dst_ptr and a_ptr are equal.  The call site of fn3 is
>> predicated on dst_ptr being null (i.e., if (!dst_ptr) dst_ptr = fn3()), but
>> that code is predicated on a_ptr being non-null.  Therefore, we know the
>> condition !dst_ptr is false (because a_ptr == dst_ptr and a_ptr is
>> non-null) and the call to fn3 is dead.  I suspect one of JumpThreading,
>> EarlyCSE, or GVN does the elimination after inlining, so that's what I'd
>> like to try and model in the inline cost model.  (Note fn2 has multiple
>> call sides and the property that the first and second arguments are equal
>> isn't true for each call site, so something like IPSCCP doesn't actually
>> help, AFAICT).
>>
>> My first attempt at solving this problem did something similar to what is
>> done in JumpThreadingPass::ProcessImpliedCondition().  Specifically, it
>> tried to prove that dst_ptr was non-null based on a dominating condition.
>> The only tricky parts were to deal with hammocks/diamonds when walking up
>> the CFG (See: https://reviews.llvm.org/D36287 as a concrete example of
>> how I proposed to get an immediate dominator without the domtree) and to
>> account for the fact that dst_ptr and a_ptr are equal.
>>
>> I'm pretty sure I can get this approach to work, however, I'm not
>> convinced it's really extensible or general.  Should we consider using the
>> full dominator tree in the inline cost model to capture this?
>>
>> If you have any thoughts on how to tackle the problem, I would love to
>> hear your feedback!
>>
>>  Chad
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170804/3f933dad/attachment.html>


More information about the llvm-dev mailing list