[llvm] r314435 - [JumpThreading] Preserve DT and LVI across the pass

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 13 13:51:25 PDT 2017


Sounds good. I want to make sure you feel like you can make forward
progress as quickly as possible (I know starting to fix all this has been a
drag), so if you want to do it earlier, let me know and we'll figure
something out :)


On Fri, Oct 13, 2017 at 1:44 PM, Brian M. Rzycki <brzycki at gmail.com> wrote:

> Ah gotcha, that makes sense. As for the basic updater, I would appreciate
> seeing your prototype. We can discuss this in person at the LLVM
> conference too.
>
> Thanks Daniel.
> -Brian
>
> On Fri, Oct 13, 2017 at 3:37 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>> The problem with adding it directly to DT is that then you have to update
>> PDT and DT separately with different calls for no reason.
>>
>> I think you would be better off creating an updater class that contains
>> DT and the vector, and passing that around.
>> Then we can just add PDT to the updater object and everything will update
>> PDT as well with no extra work.
>> I'll send you a basic updater object patch if you like.
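>>
>> To make it concrete, here is a rough sketch of the shape I have in mind
>> (all names made up; the actual patch may look different):
>>
>> // Queues CFG edge updates and applies them to the DominatorTree in one
>> // batch. A PostDominatorTree member could be added later without
>> // changing any callers. Purely illustrative.
>> class DomTreeUpdater {
>>   DominatorTree &DT;
>>   SmallVector<DominatorTree::UpdateType, 32> PendingUpdates;
>>
>> public:
>>   DomTreeUpdater(DominatorTree &DT) : DT(DT) {}
>>
>>   void insertEdge(BasicBlock *From, BasicBlock *To) {
>>     PendingUpdates.push_back({DominatorTree::Insert, From, To});
>>   }
>>
>>   void deleteEdge(BasicBlock *From, BasicBlock *To) {
>>     PendingUpdates.push_back({DominatorTree::Delete, From, To});
>>   }
>>
>>   // Apply everything queued so far as a single batched update.
>>   void flush() {
>>     DT.applyUpdates(PendingUpdates);
>>     PendingUpdates.clear();
>>   }
>> };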
>>
>> On Fri, Oct 13, 2017, 4:28 PM Brian M. Rzycki <brzycki at gmail.com> wrote:
>>
>>> Daniel, thank you for the explanation. I completely agree that if we
>>> can batch everything, then it's a net win. Where I have concerns is in
>>> the gritty details of the function calls and how they interface with
>>> each other. While waiting on your response I started to look at just
>>> one place where I would need to do batch updates.
>>>
>>> JumpThreadingPass::runImpl()
>>>   llvm::removeUnreachableBlocks()
>>>     static markAliveBlocks()
>>>       llvm::changeToUnreachable()
>>>
>>> In this example I start batching DT requests for
>>> removeUnreachableBlocks(). Because this function lives in Local.cpp, is
>>> a public interface, and removes blocks, I decided this routine would
>>> isolate its own batch updates. This works well until I get to the call
>>> to changeToUnreachable(). CTU alters the state of the CFG, and I need
>>> to know the changes for DT updates. I now have the following options:
>>>
>>> 1. Change the public API and all its call sites.
>>> 2. Attempt to use optional parameters and conditionally batch DT
>>> updates.
>>> 3. View this function call as a synchronization event and force
>>> applyUpdates before calling.
>>>
>>> For (1), this is error-prone and opens up lots of fun debug issues. I'd
>>> prefer not to go down this path.
>>> For (3), we lose the whole point of doing batch updates in the first
>>> place.
>>> For (2), I have to add three new optional parameters: DT, a vector
>>> reference, and a flag indicating batch mode. This will work, but it's
>>> clunky.
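>>>
>>> To illustrate, option (2) would turn the signature into something like
>>> this (hypothetical; the parameter names are invented):
>>>
>>> void changeToUnreachable(Instruction *I, bool UseLLVMTrap,
>>>                          DominatorTree *DT = nullptr,
>>>                          SmallVectorImpl<DominatorTree::UpdateType>
>>>                              *Updates = nullptr,
>>>                          bool BatchUpdates = false);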
>>>
>>> Note that this isn't the only place I'll have to make this compromise.
>>> Here's another that makes me a lot more nervous:
>>> JumpThreadingPass::SplitBlockPredecessors()
>>>   SplitBlockPredecessors()
>>>
>>> In this case SplitBlockPredecessors() does have DT support via the
>>> older API. It's defined in BasicBlockUtils.cpp. Again, I'd like to
>>> avoid option (1), and doing option (2) means I have to port the DT
>>> handling inside the routine to the new API as well as pass two optional
>>> parameters. Here (3) makes the most sense, but I have no idea at what
>>> performance cost.
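>>>
>>> For (3), the call site would look roughly like this (a sketch, assuming
>>> the pending updates live in a vector named Updates):
>>>
>>> // Flush the batch so DT is consistent before handing it to a utility
>>> // that updates DT eagerly through the older interface.
>>> DT->applyUpdates(Updates);
>>> Updates.clear();
>>> SplitBlockPredecessors(BB, Preds, Suffix, DT);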
>>>
>>> I am willing to do the work, but it's going to take time (and a bit of
>>> guidance from others on how they want this interface to evolve).
>>>
>>> There's another issue: every edge that is updated via the batch has to
>>> be checked to make sure it's not already in the list of updates. This
>>> means that as I insert into Updates I have to do the find() != end()
>>> idiom for every edge, which is O(n) per check. As the batch list grows,
>>> that time becomes nontrivial.
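>>>
>>> Concretely, every enqueue has to look something like this (a sketch,
>>> assuming the update type is comparable):
>>>
>>> DominatorTree::UpdateType U = {DominatorTree::Insert, From, To};
>>> // Linear scan of the pending list on every insertion: O(n) per edge,
>>> // O(n^2) over the whole batch.
>>> if (find(Updates, U) == Updates.end())
>>>   Updates.push_back(U);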
>>>
>>> Reasons like these are why I was suggesting an alternative: have the
>>> pending queue of requests live inside the DominatorTree class and use
>>> methods on DT to enqueue or flush. It also has the benefit of being useful
>>> to others outside of JumpThreading and Local that will eventually want to
>>> use the new DT interface. Any call to a method other than the following
>>> automatically causes a flush of the pending queue:
>>> deleteEdgePend()
>>> insertEdgePend()
>>> applyUpdatesPend()
>>>
>>> It would also be nice if the pending inserts didn't care about duplicate
>>> edge update requests. That would make the call sites much simpler and
>>> move the work to a single scrub of the list just before processing it.
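>>>
>>> Call sites would then reduce to something like this (using the
>>> hypothetical names above):
>>>
>>> // Duplicates are fine here; DT scrubs the pending list once, just
>>> // before it actually applies the batch.
>>> DT->deleteEdgePend(BB, SuccBB);
>>> DT->insertEdgePend(PredBB, SuccBB);
>>> DT->insertEdgePend(PredBB, SuccBB);      // harmless duplicate
>>> bool Dom = DT->dominates(PredBB, UseBB); // any other method flushes
>>>                                          // the queue first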
>>>
>>> Guidance on how to proceed is greatly appreciated. Frankly, I've been
>>> fighting dominators in JT for about 4 months now, and I really just want
>>> to move on to the more productive work of improving the code generation
>>> in JumpThreading. :)
>>>
>>> Thanks,
>>> -Brian
>>>
>>>
>>> On Fri, Oct 13, 2017 at 2:38 PM, Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Oct 13, 2017 at 10:48 AM, Brian M. Rzycki <brzycki at gmail.com>
>>>> wrote:
>>>>
>>>>>> I *believe* the motivation is that they want DT up to date *in-pass*
>>>>>> so they can use it to do better jump threading over loops.
>>>>>
>>>>> Daniel, you are correct in this assumption. Dominance is the first
>>>>> step in enhancing the kinds of analysis we can do and extending the
>>>>> number of JumpThreading opportunities. Analysis of loops is part of
>>>>> this work.
>>>>>
>>>>>> *that* kind of change, IMHO, should be subject to our usual "how much
>>>>>> faster code does it make vs how much does it cost" tradeoff.
>>>>>
>>>>> I have not started the work on actual code optimizations simply
>>>>> because I could not intelligently make the analysis decisions needed.
>>>>> Rather than create a huge change to JT (or another pass), Sebastian
>>>>> and I decided to incrementally update JT with the features we needed
>>>>> to get us to the end goal.
>>>>>
>>>>>> This was *our* plan to remove the recalculates, but it does not
>>>>>> appear to have been what Sebastian/etc. want for JT.
>>>>>
>>>>> I'm a bit confused by this statement: that's what the patch is doing.
>>>>>
>>>>
>>>> You are confused because I was talking about the recalculations that
>>>> occur elsewhere.
>>>>
>>>> But let me try to diagram it for you.
>>>>
>>>> Here is the equivalent of what JT did before your patch:
>>>>
>>>>
>>>> iterate
>>>>   for each bb
>>>>     maybe change an edge and some code
>>>>   call some utilities
>>>>   for each dead block
>>>>     erase it
>>>> DT->recalculate
>>>>
>>>> Here is what your patch would look like without the incremental API
>>>> existing:
>>>>
>>>> iterate
>>>>   for each bb
>>>>     maybe change an edge and some code
>>>>     for each edge we changed:
>>>>       DT->recalculate
>>>>   call some utilities, calling DT->recalculate in each one
>>>>   for each dead block
>>>>     erase it
>>>>     DT->recalculate
>>>>
>>>> You can see that this has added O(number of changes) DT->recalculate
>>>> calls compared to what happened before.
>>>>
>>>> That would be ridiculously slow. Probably 100x-1000x as slow as the
>>>> first case.
>>>>
>>>> You then have changed that O(number of changes) set of calls to use the
>>>> incremental API, so that it looks like this:
>>>> iterate
>>>>   for each bb
>>>>     maybe change an edge and some code
>>>>     for each edge we changed:
>>>>       incremental update DT
>>>>   call some utilities, calling incremental update DT in each one
>>>>   for each dead block
>>>>     erase it
>>>>     incremental update DT
>>>>
>>>> These calls, unfortunately, are not O(1) and will never be O(1)
>>>> (making them O(1) is provably impossible).
>>>> So it's not an O(N) cost you've added; it's O(N^2) or O(N^3) in the
>>>> worst case.
>>>>
>>>>
>>>>> There's only one corner case where a recalculate occurs. The rest of
>>>>> the time it's incremental updates.  There may still be opportunities to
>>>>> batch updates but I honestly have no idea if it will be more efficient.
>>>>>
>>>>
>>>> It will, because it will cause less updating in practice.
>>>>
>>>> If jump threading changes edge A->B to A->C (i.e. insert the new edge,
>>>> remove the old edge) and then changes edge A->C to A->D, you will
>>>> perform two updates.
>>>> If those require full recalculation of the tree (internal to the
>>>> updater), you will update the tree twice.
>>>> Batching will perform only one update.
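>>>>
>>>> In code, the difference is roughly this (a sketch, assuming a batched
>>>> applyUpdates as discussed above):
>>>>
>>>> // Unbatched: two separate incremental updates, each potentially
>>>> // expensive.
>>>> DT->deleteEdge(A, B); DT->insertEdge(A, C); // A->B becomes A->C
>>>> DT->deleteEdge(A, C); DT->insertEdge(A, D); // A->C becomes A->D
>>>>
>>>> // Batched: the transient A->C edge cancels out, so the updater only
>>>> // ever sees the net change from A->B to A->D.
>>>> DT->applyUpdates({{DominatorTree::Delete, A, B},
>>>>                   {DominatorTree::Insert, A, D}});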
>>>>
>>>>
>>>>
>>>>
>>>
>