[llvm] r314435 - [JumpThreading] Preserve DT and LVI across the pass

Brian M. Rzycki via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 13 13:44:41 PDT 2017


Ah gotcha, that makes sense.  As for the basic updater, I would appreciate
seeing your prototype. We can discuss this in person at the LLVM conference
too.

Thanks Daniel.
-Brian

On Fri, Oct 13, 2017 at 3:37 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> The problem with adding it directly to DT is that then you have to update
> PDT and DT separately with different calls for no reason.
>
> I think you would be better off creating an updater class that contains DT
> and the vector and passing it around.
> Then we can just add PDT to the updater object and everything will update
> PDT as well with no extra work.
> I'll send you a basic updater object patch if you like.
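>
> Something like the following is what I have in mind (just a sketch; the
> name and details are made up and nothing here exists yet):
>
>   #include "llvm/ADT/SmallVector.h"
>   #include "llvm/IR/Dominators.h"
>   using namespace llvm;
>
>   // Collects CFG edge insertions/deletions and applies them to the
>   // DominatorTree in one batch.  A PostDominatorTree member can be added
>   // later and driven from the same update vector for free.
>   class DomTreeBatchUpdater {
>     DominatorTree &DT;
>     SmallVector<DominatorTree::UpdateType, 16> PendingUpdates;
>
>   public:
>     DomTreeBatchUpdater(DominatorTree &DT) : DT(DT) {}
>
>     void insertEdge(BasicBlock *From, BasicBlock *To) {
>       PendingUpdates.push_back({DominatorTree::Insert, From, To});
>     }
>     void deleteEdge(BasicBlock *From, BasicBlock *To) {
>       PendingUpdates.push_back({DominatorTree::Delete, From, To});
>     }
>
>     // Hand all queued changes to the incremental updater at once.
>     void flush() {
>       DT.applyUpdates(PendingUpdates);
>       PendingUpdates.clear();
>     }
>   };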
>
> On Fri, Oct 13, 2017, 4:28 PM Brian M. Rzycki <brzycki at gmail.com> wrote:
>
>> Daniel, thank you for the explanation. I completely agree if we can batch
>> everything then it's a net win. Where I have concerns is in the gritty
>> details of the function calls and how they interface with each other. While
>> waiting on your response I started to look at just one place where I would
>> need to do batch updates.
>>
>> JumpThreadingPass::runImpl()
>>   llvm::removeUnreachableBlocks()
>>     static markAliveBlocks()
>>       llvm::changeToUnreachable()
>>
>> In this example I start batching DT requests for removeUnreachableBlocks.
>> Because this function lives in Local.cpp, is a public interface, and
>> removes blocks, I decided this routine would isolate its own batch updates.
>> This works well until I get to the call to changeToUnreachable(). CTU
>> alters the state of the CFG, and I need to know those changes for DT updates.
>> I now have the following options:
>>
>> 1. change the public API and all its call sites
>> 2. attempt to use optional parameters and conditionally batch DT updates
>> 3. view this function call as a synchronization event and force
>> applyUpdates before calling.
>>
>> For (1), this is error-prone and opens up lots of fun debug issues. I'd
>> prefer not to go down this path.
>> For (3), we lose the whole point of doing batch updates in the first place.
>> For (2), I have to add three new optional parameters: DT, a reference to the
>> update vector, and a flag indicating batch mode.  This will work, but it's
>> clunky.
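>>
>> To make (2) concrete, the signature creep looks roughly like this (just a
>> sketch; the existing parameters are elided and the new names are invented
>> for illustration):
>>
>>   // Every utility on the call path grows the same three optional
>>   // arguments just so a caller can keep batching across the call.
>>   void changeToUnreachable(/* ...existing parameters... */,
>>                            DominatorTree *DT = nullptr,
>>                            SmallVectorImpl<DominatorTree::UpdateType>
>>                                *PendingUpdates = nullptr,
>>                            bool BatchDTUpdates = false);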
>>
>> Note that this isn't the only place I'll have to make this compromise.
>> Here's another that makes me a lot more nervous:
>> JumpThreadingPass::SplitBlockPredecessors()
>>   SplitBlockPredecessors()
>>
>> In this case SplitBlockPredecessors() does have DT support using the
>> older API. It's defined in BasicBlockUtils.cpp. Again, I'd like to avoid
>> option (1), and doing option (2) means I have to update the DT inside the
>> routine to the new API as well as pass two optional parameters. In this case
>> (3) makes the most sense, but I have no idea at what performance cost.
>>
>> I am willing to do the work, but it's going to take time (and a bit of
>> guidance from others on how they want this interface to evolve).
>>
>> There's another issue: every edge that is updated via batch has to be
>> checked to make sure it's not already in the list of updates. This means as
>> I insert into Updates I have to do the find() != end() idiom for every edge.
>> That check is O(n), so as the batch list gets larger the cost becomes
>> non-trivial.
>>
>> Reasons like these are why I was suggesting an alternative: have the
>> pending queue of requests live inside the DominatorTree class and use
>> methods on DT to enqueue or flush. It also has the benefit of being useful
>> to others outside of JumpThreading and Local that will eventually want to
>> use the new DT interface. Any call to a method other than the following
>> would automatically cause a flush of the pending queue (rough sketch after
>> the list):
>> deleteEdgePend()
>> insertEdgePend()
>> applyUpdatesPend()
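>>
>> Roughly what I'm picturing as additions to DominatorTree (a sketch of the
>> proposal only; none of these members exist today):
>>
>>   // Pending CFG edge updates, flushed lazily.
>>   SmallVector<UpdateType, 16> PendingUpdates;
>>
>>   // Queue updates without touching the tree yet.
>>   void insertEdgePend(BasicBlock *From, BasicBlock *To) {
>>     PendingUpdates.push_back({Insert, From, To});
>>   }
>>   void deleteEdgePend(BasicBlock *From, BasicBlock *To) {
>>     PendingUpdates.push_back({Delete, From, To});
>>   }
>>
>>   // Flush the queue through the existing batch updater; any other query
>>   // or mutator on the tree would call this first so the tree is never
>>   // observed stale.
>>   void applyUpdatesPend() {
>>     applyUpdates(PendingUpdates);
>>     PendingUpdates.clear();
>>   }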
>>
>> It would also be nice if the pending inserts didn't care about duplicate
>> edge update requests. That would make the call site much simpler and move
>> the work to a single scrub of the list just before processing it.
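>>
>> For that scrub, a single sort-and-unique over the pending list just before
>> flushing would be enough. A minimal sketch, using a stand-in tuple rather
>> than the real update type:
>>
>>   #include "llvm/IR/BasicBlock.h"
>>   #include <algorithm>
>>   #include <tuple>
>>   #include <vector>
>>   using namespace llvm;
>>
>>   // Stand-in for a queued edge update: {IsInsert, From, To}.
>>   using PendingEdgeUpdate = std::tuple<bool, BasicBlock *, BasicBlock *>;
>>
>>   // One O(n log n) pass removes duplicates, instead of an O(n) membership
>>   // check on every insertion.
>>   static void scrubPendingUpdates(std::vector<PendingEdgeUpdate> &Updates) {
>>     std::sort(Updates.begin(), Updates.end());
>>     Updates.erase(std::unique(Updates.begin(), Updates.end()),
>>                   Updates.end());
>>   }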
>>
>> Guidance on how to proceed is greatly appreciated. Frankly I've been
>> fighting dominators in JT now for about 4 months, and I really just want to
>> move onto the more productive work of improving the code generation in
>> JumpThreading. :)
>>
>> Thanks,
>> -Brian
>>
>>
>> On Fri, Oct 13, 2017 at 2:38 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 13, 2017 at 10:48 AM, Brian M. Rzycki <brzycki at gmail.com>
>>> wrote:
>>>
>>>>> I *believe* the motivation is that they want DT up to date *in-pass* so
>>>>> they can use it to do better jump threading over loops.
>>>>
>>>> Daniel, you are correct in this assumption. Dominance is the first step
>>>> in enhancing the kinds of analysis and extending the number of JumpThreading
>>>> opportunities. Analysis of loops is part of this work.
>>>>
>>>>> *that* kind of change, IMHO, should be subject to our usual "how much
>>>>> faster code does it make vs how much does it cost" tradeoff.
>>>>
>>>> I have not started the work on actual code optimizations simply because
>>>> I could not intelligently make the analysis decisions needed. Rather than
>>>> create a huge change to JT (or another pass), Sebastian and I decided to
>>>> incrementally update JT with the features we needed to get us to the end
>>>> goal.
>>>>
>>>>> This was *our* plan to remove the recalculates, but it does not appear
>>>>> to have been what Sebastian/etc. want for JT.
>>>>
>>>> I'm a bit confused by this statement: that's what the patch is doing.
>>>>
>>>
>>> You are confused because I was talking about the recalculations that
>>> occur elsewhere.
>>>
>>> But let me try to diagram it for you.
>>>
>>> Here is the equivalent of what JT did before your patch
>>>
>>>
>>> iterate
>>>   for each bb
>>>     maybe change an edge and some code
>>>   call some utilities
>>>   for each dead block
>>>     erase it
>>> DT->recalculate
>>>
>>> Here is what your patch would look like without the incremental API
>>> existing:
>>> iterate
>>>   for each bb
>>>     maybe change an edge and some code
>>>     for each edge we changed:
>>>       DT->recalculate
>>>   call some utilities, calling DT->recalculate in each one
>>>   for each dead block
>>>     erase it
>>>     DT->recalculate
>>>
>>> You can see that this has added O(number of changes) DT->recalculate
>>> calls compared to what happened before.
>>>
>>> That would be ridiculously slow. Probably 100x-1000x as slow as the
>>> first case.
>>>
>>> You then have changed that O(number of changes) set of calls to use the
>>> incremental API, so that it looks like this:
>>> iterate
>>>   for each bb
>>>     maybe change an edge and some code
>>>     for each edge we changed:
>>>       incremental update DT
>>>   call some utilities, calling incremental update DT in each one
>>>   for each dead block
>>>     erase it
>>>     incremental update DT
>>>
>>> These calls, unfortunately, are not O(1) and will never be O(1) (doing so
>>> is provably impossible).
>>> So it's not an O(N) cost you've added; it's O(N^2) or O(N^3) in the worst
>>> case.
>>>
>>>
>>>> There's only one corner case where a recalculate occurs. The rest of the
>>>> time it's incremental updates.  There may still be opportunities to batch
>>>> updates, but I honestly have no idea if it will be more efficient.
>>>>
>>>
>>> It will, because it will cause less updating in practice.
>>>
>>> If jump threading changes edge A->B to A->C (i.e. insert new edge, remove
>>> old edge) and then changes edge A->C to A->D, you will perform two
>>> updates.
>>> If those require full recalculation of the tree (internal to the
>>> updater), you will update the tree twice.
>>> Batching will perform one.
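>>>
>>> In API terms (A, B, C, D being the blocks from the example above, and
>>> eliding the CFG mutations between the calls), the difference is roughly:
>>>
>>>   // Unbatched: the tree is brought up to date after every change, so the
>>>   // short-lived A->C edge is processed twice for nothing.
>>>   DT->deleteEdge(A, B);
>>>   DT->insertEdge(A, C);
>>>   DT->deleteEdge(A, C);
>>>   DT->insertEdge(A, D);
>>>
>>>   // Batched: the updater can collapse the A->C insert/delete pair and do
>>>   // the work once for the net effect (A->B removed, A->D added).
>>>   DT->applyUpdates({{DominatorTree::Delete, A, B},
>>>                     {DominatorTree::Insert, A, C},
>>>                     {DominatorTree::Delete, A, C},
>>>                     {DominatorTree::Insert, A, D}});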
>>>
>>>
>>>
>>>
>>