[PATCH] D48967: [Dominators] Convert existing passes and utils to use the DomTreeUpdater class

Tue Jul 10 09:49:13 PDT 2018

NutshellySima added inline comments.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:420
   LoopHeaders.clear();
-  DDT->flush();
+  DTU->getDomTree();
   LVI->enableDT();
----------------
brzycki wrote:
> brzycki wrote:
> > NutshellySima wrote:
> > > NutshellySima wrote:
> > > > brzycki wrote:
> > > > > NutshellySima wrote:
> > > > > > brzycki wrote:
> > > > > > > NutshellySima wrote:
> > > > > > > > brzycki wrote:
> > > > > > > > > It's not obvious here the reasoning behind this statement. I recommend either changing this to `DTU->flush()` or adding a comment stating the call will force all updates to the internal DT/PDT trees.
> > > > > > > > I think `DTU->flush()` should only be called when people need either
> > > > > > > > 1. Flush all deleteBB awaiting deletion.
> > > > > > > > 2. Do queries on both DT and PDT.
> > > > > > > > 
> > > > > > > > In the future, if DTU holds both DT and PDT in JumpThreading, call `DTU->getDomTree` which only flushes DT can pend updates on PDT as lazy as possible.
> > > > > > > I'm concerned the complexity of updates will become unmanageable if you decide to take this route.  Some questions I have:
> > > > > > > * How do you preserve the list of updates for one vs the other? Two separate queues?
> > > > > > > * How do you decide to delete unnecessary updates from one and not the other?
> > > > > > > * How do you manage the list of delted BBs? Are they only really deleted when both DT and PDT are flushed?
> > > > > > > * How do you make sure you don't actually delete a BB early?
> > > > > > > * How do you handle `Eager` vs `Lazy` updates?
> > > > > > > * How do you handle `*Relaxed()` updates?
> > > > > > > 
> > > > > > > The longer you avoid flushing the further your updates become from the real CFG's shape. Debugging an assert where DT and PDT update at different times means the developer has to also carefully keep the state of DTU in mind instead of just the IR and the trees. I'm not sure if the possible saved overhead of updates to one or the other tree is worth all that added complexity.
> > > > > > > How do you preserve the list of updates for one vs the other? Two separate queues?
> > > > > > There is only one queue and two index to record the updates that have been applied and not applied yet.
> > > > > > 
> > > > > > | Index | 0 |  1 |  2 | 3
> > > > > > | PendUpdates | o | o | o | o 
> > > > > > |DTUpdateIndex | | | ↑ | |
> > > > > > | PDTUpdateIndex |↑ | | | |
> > > > > > 
> > > > > > > How do you decide to delete unnecessary updates from one and not the other?
> > > > > > If an update is applied by all available trees, it will be purged.
> > > > > > 
> > > > > > > How do you manage the list of deleted BBs? Are they only really deleted when both DT and PDT are flushed?
> > > > > > Yes, more specifically `DTUpdateIndex == PDTUpdateIndex == PendUpdates.size()`. It's impossible to judge whether a specific DelBB is feasible to be deleted at a specific time point because some callers of delBB just
> > > > > > ```
> > > > > > BasicBlock *BB0 = ...;
> > > > > > DTU.delBB (BB0);
> > > > > > // Then do some actions on BB0
> > > > > > ... ...
> > > > > > DTU.flush(); // The caller expects BB0 be deleted now.
> > > > > > ```
> > > > > > > How do you make sure you don't actually delete a BB early?
> > > > > > Mentioned above.
> > > > > > > How do you handle `Eager` vs `Lazy` updates?
> > > > > > `applyUpdates`: 
> > > > > > `Lazy/Eager+ForceRemoveDuplicates` - Works the same as DDT, more specifically
> > > > > > 1. Deduplicate, which is the main functionality of DDT. 
> > > > > > 2. Inspect whether an edge is presented in CFG and decide whether to discard it. It is a must because the update sequence is unbalanced.
> > > > > > `Eager` - Works the same as raw DT/PDT.
> > > > > > `insertEdge/deleteEdge`:
> > > > > > `Eager`: Works the same as raw DT/PDT and adds inspection whether the update is valid.
> > > > > > `Lazy`: then submits into the list of updates rather than apply it.
> > > > > > `recalculate`:
> > > > > > `Lazy`: `flush()` + raw `recalculate()`
> > > > > > `Eager`: raw `recalculate()`
> > > > > > > How do you handle `*Relaxed()` updates?
> > > > > > It behaves the same as the ones in DDT. It discards invalid updates quietly.
> > > > > > 
> > > > > > > The longer you avoid flushing the further your updates become from the real CFG's shape. Debugging an assert where DT and PDT update at different times means the developer has to also carefully keep the state of DTU in mind instead of just the IR and the trees. I'm not sure if the possible saved overhead of updates to one or the other tree is worth all that added complexity.
> > > > > > 
> > > > > > I don't think so. With current DDT, let me show an example
> > > > > > ```
> > > > > > // Changing the state of CFG begins. 
> > > > > > ... // Modify CFG
> > > > > > MergeBasicBlockIntoOnlyPred (..., &DDT); // Some error happens !
> > > > > > // (You cannot do verification after MergeBasicBlockIntoOnlyPred because the CFG and DT is unsynced)
> > > > > > ... // Modify DomTree
> > > > > > ... // Modify CFG
> > > > > > // Changing the state of CFG ends.
> > > > > > ...
> > > > > > DDT.flush().verify(); // Asserts !
> > > > > > ```
> > > > > > Now with DTU,
> > > > > > ```
> > > > > > // Changing the state of CFG begins.
> > > > > > ... // Modify CFG
> > > > > > MergeBasicBlockIntoOnlyPred (..., &DTU); // Some error happens !
> > > > > > //  (You cannot do verification after MergeBasicBlockIntoOnlyPred because the CFG and DT is unsynced)
> > > > > > ... // Modify DomTree
> > > > > > ... // Modify CFG
> > > > > > // Changing the state of CFG ends.
> > > > > > ...
> > > > > > DTU.getDomTree().verify(); // Asserts !
> > > > > > ```
> > > > > > I can't see the difference. They'll assert at the same location and all far away from where the bug happens.
> > > > > Let's say you have a `DTU` using the `Lazy` strategy. The current set of internally queued pending updates are the following:
> > > > > ```
> > > > > [0] {Insert, A, B}
> > > > > [1] {Delete, B, C}
> > > > > [2] {Insert, B, D}
> > > > > ```
> > > > > 
> > > > > And you have both a DT and PDT. Their Indicies are the following:
> > > > > ```
> > > > > DTU->IndexDT = 0;
> > > > > DTU->IndexPDT = 0;
> > > > > ```
> > > > > 
> > > > > And you flush just the DT:
> > > > > ```
> > > > > DTU->getDomTree(); // flush DT only
> > > > > ```
> > > > > 
> > > > > Now your indicies are the following:
> > > > > ```
> > > > > DTU->IndexDT = 3;
> > > > > DTU0>IndexPDT = 0;
> > > > > ```
> > > > > 
> > > > > And now the following update comes along:
> > > > > ```
> > > > > DTU->deleteEdgeRelaxed(A,B);
> > > > > ```
> > > > > 
> > > > > In the case of PDT you want nothing to happen: this update will remove the inverted update [0] from the queue and result in a no-op to PDT.  But in the case of the DT you need the update to happen: it's a change past the flush.  How do you reconcile this internally with only one queue?
> > > > > In the case of PDT you want nothing to happen: this update will remove the inverted update [0] from the queue and result in a no-op to PDT. But in the case of the DT you need the update to happen: it's a change past the flush. How do you reconcile this internally with only one queue?
> > > > In this case, DTU will only inspect updates after `std::max(IndexDT, IndexPDT)` and try to remove no-ops in that range.
> > > > 
> > > > Q: Why I choose this way, but not flush both trees when calling `getDT/getPDT/flush`?
> > > > A: In `include/llvm/Support/GenericDomTree.h`, comments of `applyUpdates`:
> > > >   /// Batch updates should be generally faster when performing longer sequences
> > > >   /// of updates than calling insertEdge/deleteEdge manually multiple times, as
> > > >   /// it can reorder the updates and remove redundant ones internally.
> > > > It seems that there is performance gain. So I prefer to preserve a longer sequence of updates.
> > > More info added to the above comment:
> > > I think the only thing users of both `DDT` and `DTU` need to be careful is that they mustn't submit any updates that is already in the CFG + already known by the DT/PDT before submitting updates.
> > > An example:
> > > ```
> > > // Lazy DTU/DDT
> > > // Edge A->B already in the CFG + DT knows that
> > > DTU.insertEdge(A, B); // Get queued
> > > ... // delete A->B in the CFG
> > > DTU.deleteEdge(A, B); // Cancel with the above one
> > > ```
> > > This behavior is not supposed to be a functionality at least in `DTU` now. I'm not sure whether there are people using the `DDT` APIs like this.
> > Understood, you are pessimizing early removal of paired updates. I strongly recommend you testing your assumptions about performance of long updates using the `CTMark` option of `test-suite` to reduce the chance of reverts due to performance degradation:
> > ```
> > cd path/to/cloned/test-suite
> > cmake ... \
> >     ... \
> >     -D TEST_SUITE_SUBDIRS=CTMark \
> >     -D TEST_SUITE_RUN_BENCHMARKS=0" \
> >     ... \
> >     $PWD
> > ```
> > Note I recommend this in addition to regular `test-suite` runs designed to detect crashes and `assert()`s.
> > This behavior is not supposed to be a functionality at least in DTU now. I'm not sure whether there are people using the DDT APIs like this.
> I can guarantee you this happens all over JumpThreading. The pass contains no information about prior state or previous failed attempts. I have seen cases where the same edge is inserted and removed 3 or more times across one complete run of the pass.  There is a reason why I was forced to implement `DDT` with the default as `Relaxed`. :)
> 
> 
> 
> 
I understand the case when trying to inspect the CFG to remove the `not really happened` ones in `jump-threading` because it won't be queued and affect following updates.
And I'm thinking whether there is anyone submitting `already happened and DT/PDT also knew` ones.
I'm confused with your reply. In the example I mentioned (DT already got notified before that A->B exists but we still submit one {Insert, A, B})
```
// Lazy DTU/DDT
// Edge A->B already in the CFG + DT knows that
DTU.insertEdge(A, B); // Get queued {Insert, A, B}
... // delete A->B in the CFG
DTU.deleteEdge(A, B); // Cancel with the above one {Delete, A, B} cancels {Insert, A, B}
assert (DTU.getDT().verify()); // Asserts! (DT doesn't know A->B is gone!)
```
It can cause the state of DT not matching the state of the CFG.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:420
   LoopHeaders.clear();
-  DDT->flush();
+  DTU->getDomTree();
   LVI->enableDT();
----------------
NutshellySima wrote:
> brzycki wrote:
> > brzycki wrote:
> > > NutshellySima wrote:
> > > > NutshellySima wrote:
> > > > > brzycki wrote:
> > > > > > NutshellySima wrote:
> > > > > > > brzycki wrote:
> > > > > > > > NutshellySima wrote:
> > > > > > > > > brzycki wrote:
> > > > > > > > > > It's not obvious here the reasoning behind this statement. I recommend either changing this to `DTU->flush()` or adding a comment stating the call will force all updates to the internal DT/PDT trees.
> > > > > > > > > I think `DTU->flush()` should only be called when people need either
> > > > > > > > > 1. Flush all deleteBB awaiting deletion.
> > > > > > > > > 2. Do queries on both DT and PDT.
> > > > > > > > > 
> > > > > > > > > In the future, if DTU holds both DT and PDT in JumpThreading, call `DTU->getDomTree` which only flushes DT can pend updates on PDT as lazy as possible.
> > > > > > > > I'm concerned the complexity of updates will become unmanageable if you decide to take this route.  Some questions I have:
> > > > > > > > * How do you preserve the list of updates for one vs the other? Two separate queues?
> > > > > > > > * How do you decide to delete unnecessary updates from one and not the other?
> > > > > > > > * How do you manage the list of delted BBs? Are they only really deleted when both DT and PDT are flushed?
> > > > > > > > * How do you make sure you don't actually delete a BB early?
> > > > > > > > * How do you handle `Eager` vs `Lazy` updates?
> > > > > > > > * How do you handle `*Relaxed()` updates?
> > > > > > > > 
> > > > > > > > The longer you avoid flushing the further your updates become from the real CFG's shape. Debugging an assert where DT and PDT update at different times means the developer has to also carefully keep the state of DTU in mind instead of just the IR and the trees. I'm not sure if the possible saved overhead of updates to one or the other tree is worth all that added complexity.
> > > > > > > > How do you preserve the list of updates for one vs the other? Two separate queues?
> > > > > > > There is only one queue and two index to record the updates that have been applied and not applied yet.
> > > > > > > 
> > > > > > > | Index | 0 |  1 |  2 | 3
> > > > > > > | PendUpdates | o | o | o | o 
> > > > > > > |DTUpdateIndex | | | ↑ | |
> > > > > > > | PDTUpdateIndex |↑ | | | |
> > > > > > > 
> > > > > > > > How do you decide to delete unnecessary updates from one and not the other?
> > > > > > > If an update is applied by all available trees, it will be purged.
> > > > > > > 
> > > > > > > > How do you manage the list of deleted BBs? Are they only really deleted when both DT and PDT are flushed?
> > > > > > > Yes, more specifically `DTUpdateIndex == PDTUpdateIndex == PendUpdates.size()`. It's impossible to judge whether a specific DelBB is feasible to be deleted at a specific time point because some callers of delBB just
> > > > > > > ```
> > > > > > > BasicBlock *BB0 = ...;
> > > > > > > DTU.delBB (BB0);
> > > > > > > // Then do some actions on BB0
> > > > > > > ... ...
> > > > > > > DTU.flush(); // The caller expects BB0 be deleted now.
> > > > > > > ```
> > > > > > > > How do you make sure you don't actually delete a BB early?
> > > > > > > Mentioned above.
> > > > > > > > How do you handle `Eager` vs `Lazy` updates?
> > > > > > > `applyUpdates`: 
> > > > > > > `Lazy/Eager+ForceRemoveDuplicates` - Works the same as DDT, more specifically
> > > > > > > 1. Deduplicate, which is the main functionality of DDT. 
> > > > > > > 2. Inspect whether an edge is presented in CFG and decide whether to discard it. It is a must because the update sequence is unbalanced.
> > > > > > > `Eager` - Works the same as raw DT/PDT.
> > > > > > > `insertEdge/deleteEdge`:
> > > > > > > `Eager`: Works the same as raw DT/PDT and adds inspection whether the update is valid.
> > > > > > > `Lazy`: then submits into the list of updates rather than apply it.
> > > > > > > `recalculate`:
> > > > > > > `Lazy`: `flush()` + raw `recalculate()`
> > > > > > > `Eager`: raw `recalculate()`
> > > > > > > > How do you handle `*Relaxed()` updates?
> > > > > > > It behaves the same as the ones in DDT. It discards invalid updates quietly.
> > > > > > > 
> > > > > > > > The longer you avoid flushing the further your updates become from the real CFG's shape. Debugging an assert where DT and PDT update at different times means the developer has to also carefully keep the state of DTU in mind instead of just the IR and the trees. I'm not sure if the possible saved overhead of updates to one or the other tree is worth all that added complexity.
> > > > > > > 
> > > > > > > I don't think so. With current DDT, let me show an example
> > > > > > > ```
> > > > > > > // Changing the state of CFG begins. 
> > > > > > > ... // Modify CFG
> > > > > > > MergeBasicBlockIntoOnlyPred (..., &DDT); // Some error happens !
> > > > > > > // (You cannot do verification after MergeBasicBlockIntoOnlyPred because the CFG and DT is unsynced)
> > > > > > > ... // Modify DomTree
> > > > > > > ... // Modify CFG
> > > > > > > // Changing the state of CFG ends.
> > > > > > > ...
> > > > > > > DDT.flush().verify(); // Asserts !
> > > > > > > ```
> > > > > > > Now with DTU,
> > > > > > > ```
> > > > > > > // Changing the state of CFG begins.
> > > > > > > ... // Modify CFG
> > > > > > > MergeBasicBlockIntoOnlyPred (..., &DTU); // Some error happens !
> > > > > > > //  (You cannot do verification after MergeBasicBlockIntoOnlyPred because the CFG and DT is unsynced)
> > > > > > > ... // Modify DomTree
> > > > > > > ... // Modify CFG
> > > > > > > // Changing the state of CFG ends.
> > > > > > > ...
> > > > > > > DTU.getDomTree().verify(); // Asserts !
> > > > > > > ```
> > > > > > > I can't see the difference. They'll assert at the same location and all far away from where the bug happens.
> > > > > > Let's say you have a `DTU` using the `Lazy` strategy. The current set of internally queued pending updates are the following:
> > > > > > ```
> > > > > > [0] {Insert, A, B}
> > > > > > [1] {Delete, B, C}
> > > > > > [2] {Insert, B, D}
> > > > > > ```
> > > > > > 
> > > > > > And you have both a DT and PDT. Their Indicies are the following:
> > > > > > ```
> > > > > > DTU->IndexDT = 0;
> > > > > > DTU->IndexPDT = 0;
> > > > > > ```
> > > > > > 
> > > > > > And you flush just the DT:
> > > > > > ```
> > > > > > DTU->getDomTree(); // flush DT only
> > > > > > ```
> > > > > > 
> > > > > > Now your indicies are the following:
> > > > > > ```
> > > > > > DTU->IndexDT = 3;
> > > > > > DTU0>IndexPDT = 0;
> > > > > > ```
> > > > > > 
> > > > > > And now the following update comes along:
> > > > > > ```
> > > > > > DTU->deleteEdgeRelaxed(A,B);
> > > > > > ```
> > > > > > 
> > > > > > In the case of PDT you want nothing to happen: this update will remove the inverted update [0] from the queue and result in a no-op to PDT.  But in the case of the DT you need the update to happen: it's a change past the flush.  How do you reconcile this internally with only one queue?
> > > > > > In the case of PDT you want nothing to happen: this update will remove the inverted update [0] from the queue and result in a no-op to PDT. But in the case of the DT you need the update to happen: it's a change past the flush. How do you reconcile this internally with only one queue?
> > > > > In this case, DTU will only inspect updates after `std::max(IndexDT, IndexPDT)` and try to remove no-ops in that range.
> > > > > 
> > > > > Q: Why I choose this way, but not flush both trees when calling `getDT/getPDT/flush`?
> > > > > A: In `include/llvm/Support/GenericDomTree.h`, comments of `applyUpdates`:
> > > > >   /// Batch updates should be generally faster when performing longer sequences
> > > > >   /// of updates than calling insertEdge/deleteEdge manually multiple times, as
> > > > >   /// it can reorder the updates and remove redundant ones internally.
> > > > > It seems that there is performance gain. So I prefer to preserve a longer sequence of updates.
> > > > More info added to the above comment:
> > > > I think the only thing users of both `DDT` and `DTU` need to be careful is that they mustn't submit any updates that is already in the CFG + already known by the DT/PDT before submitting updates.
> > > > An example:
> > > > ```
> > > > // Lazy DTU/DDT
> > > > // Edge A->B already in the CFG + DT knows that
> > > > DTU.insertEdge(A, B); // Get queued
> > > > ... // delete A->B in the CFG
> > > > DTU.deleteEdge(A, B); // Cancel with the above one
> > > > ```
> > > > This behavior is not supposed to be a functionality at least in `DTU` now. I'm not sure whether there are people using the `DDT` APIs like this.
> > > Understood, you are pessimizing early removal of paired updates. I strongly recommend you testing your assumptions about performance of long updates using the `CTMark` option of `test-suite` to reduce the chance of reverts due to performance degradation:
> > > ```
> > > cd path/to/cloned/test-suite
> > > cmake ... \
> > >     ... \
> > >     -D TEST_SUITE_SUBDIRS=CTMark \
> > >     -D TEST_SUITE_RUN_BENCHMARKS=0" \
> > >     ... \
> > >     $PWD
> > > ```
> > > Note I recommend this in addition to regular `test-suite` runs designed to detect crashes and `assert()`s.
> > > This behavior is not supposed to be a functionality at least in DTU now. I'm not sure whether there are people using the DDT APIs like this.
> > I can guarantee you this happens all over JumpThreading. The pass contains no information about prior state or previous failed attempts. I have seen cases where the same edge is inserted and removed 3 or more times across one complete run of the pass.  There is a reason why I was forced to implement `DDT` with the default as `Relaxed`. :)
> > 
> > 
> > 
> > 
> I understand the case when trying to inspect the CFG to remove the `not really happened` ones in `jump-threading` because it won't be queued and affect following updates.
> And I'm thinking whether there is anyone submitting `already happened and DT/PDT also knew` ones.
> I'm confused with your reply. In the example I mentioned (DT already got notified before that A->B exists but we still submit one {Insert, A, B})
> ```
> // Lazy DTU/DDT
> // Edge A->B already in the CFG + DT knows that
> DTU.insertEdge(A, B); // Get queued {Insert, A, B}
> ... // delete A->B in the CFG
> DTU.deleteEdge(A, B); // Cancel with the above one {Delete, A, B} cancels {Insert, A, B}
> assert (DTU.getDT().verify()); // Asserts! (DT doesn't know A->B is gone!)
> ```
> It can cause the state of DT not matching the state of the CFG.
Thanks for your advice.
But I doubt whether this test can show the performance degradation in the *current* LLVM codebase because
first, this behavior only affects 
1. a DTU under Lazy UpdateStrategy with both DT and PDT 
2. trying to perform a long sequence of updates
second, in the current LLVM codebase, only `adce` incrementally updates `PDT` with an extremely small update vector.

https://reviews.llvm.org/D48967