[llvm] r314435 - [JumpThreading] Preserve DT and LVI across the pass

Brian M. Rzycki via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 13 11:05:21 PDT 2017


>
> That may actually be fast enough already if you do that, which I'm hoping
> (but you know better than I) is not a lot more work than you have here.

There are a couple of complications with this method as I think about it.
The first is that DT is used outside of the JumpThreading pass: in
Local.cpp and in other helpers like the ones in BasicBlockUtils.cpp. I'll
either have to flush pending updates before calling into these sites, or
rework their interfaces to pass an optional update vector around. The
hand-off between these separate functions will require figuring out the
most efficient way of updating the DT.

The second is that I haven't yet analyzed exactly where DT needs to be
preserved within the JT pass. Part of the reason we did the work this way
was so we didn't have to worry about the state of DT: we could just use it
at any point. This proposal changes that to a much more explicit
sync-then-analyze model.

I suspect both of these will require flushing more often than just "a
block is deleted, so flush DT". It might be more efficient to batch the
updates; I'm not sure.

Finally, I'd need a way to detect whether the root block is about to be
removed from the function; in that case I have to recalculate the DT.

If this is the recommendation going forward I can start on this work.

On Fri, Oct 13, 2017 at 12:44 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> and just because I don't think I said it:
> I know it is a large amount of work to get it to this point at all, and I
> am *very* glad to see this happening.
> I think to get it over the finish line, unfortunately, we will have to
> choose "at what point do we actually need DT up to date" based on cost
> tradeoffs, and make batching work for doing that.
>
> The good news is "it should already work except where blocks are
> completely removed"
>
> I.e., you should already be able to defer *all* updates to the point a block
> is removed, and only call applyUpdates when a block is erased.
>
> That may actually be fast enough already if you do that, which I'm hoping
> (but you know better than I) is not a lot more work than you have here.
>
> If you want to cut down on plumbing, I know all these things have DT
> anyway.
> We could have DT be able to hand out the updater itself as a singleton.
>
> (Otherwise, we should encapsulate the update vector instead of passing it
> directly. It probably only needs to take a DT and have functions called
> addEdgeUpdate and applyUpdates)
>
> On Fri, Oct 13, 2017 at 10:31 AM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>>
>>
>> On Fri, Oct 13, 2017 at 9:41 AM, Brian M. Rzycki <brzycki at gmail.com>
>> wrote:
>>
>>> Hi Daniel, thanks for coming back to the conversation.
>>>
>>>
>>> > In that sense, 4% is really really good.  You are saving probably
>>> > 100's or 1000's of dominator tree rebuilds to get the equivalent
>>> preservation.
>>> One of my first attempts at preserving DT across JT was to simply use
>>> recalculate(). It was terribly slow but helped me to identify the areas
>>> where DT needed to be properly preserved with the new API.
>>>
>>> > This is being done presumably to enable some better optimization.
>>> Correct. The JT pass in LLVM misses opportunities GCC catches. Two of
>>> the areas Sebastian and I would like to explore are threaded jumps across
>>> loop boundaries and 2 (or more) blocks threaded in one pass. DT is
>>> essential to adding LI analysis preservation to detect loops for jumping.
>>>
>>> > The only other option is to batch more (which i believe should just
>>> work.
>>> > You should be able to keep a more global update vector, and apply less,
>>> > and it should just work)
>>> The JumpThreading pass makes heavy use of functions in Local.cpp and a
>>> few other calls that update DT like the ones in BasicBlockUtils. It would
>>> take coordination between these different call sites to use the global data
>>> structure.
>>>
>>
>> Or an updater object :)
>>
>>
>>> If there was some way to "defer" updates in the DT interface that may
>>> help mitigate some of this overhead. Something like:
>>>
>>> DT->deleteEdgeDeferred(BB, BBFoo);
>>> ...
>>> DT->deferredUpdates(UpdateVector);
>>> ...
>>> DT->applyDeferredUpdates();
>>>
>>
>> This should already work.
>> The batch updater should be able to apply the updates whenever you want.
>> It should not require they be applied in lockstep with the CFG.
>>
>> The only requirement, I believe, is that the CFG nodes themselves (not
>> the edges) don't go completely away because it literally needs to be able
>> to query them without use after free (so it can, for example, get the DT
>> node for a CFG node).
>> This seems fixable to me.
>> If you have an updater object, you use WeakVH for the CFG nodes so they
>> go away without use after free, and Updater.add stores a pointer to the DT
>> node so we can call deleteunreachable on it (by definition, any completely
>> erased CFG node must be unreachable, or else you are going to have a bad
>> time anyway).
>>
>>
>>
>>> It would also have the benefit that, if people use the older API, DT
>>> will know to implicitly applyDeferredUpdates before executing the older
>>> functions.
>>>
>>>
>> I suggested this interface at one point to Kuba, but it turns out there
>> are things that directly access the bits of DT that make this likely to
>> fail as an interface, unfortunately.
>>
>>
>