[llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-determinism
Nikita Popov via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 27 09:37:39 PDT 2021
On Mon, Sep 27, 2021 at 5:47 PM Jakub (Kuba) Kuderski via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Björn,
>
> Thanks for the update, this is an excellent summary. On the highest level,
> I don't agree that the dominator tree order or update order is
> deterministic, and I think we would have to agree to change this assumption
> first if we want to move forward in the direction that you suggested in
> https://reviews.llvm.org/D110292. The domtree updater's documentation
> says that all submitted updates are treated as an unordered set, to reserve
> the right to reorder them internally:
> https://github.com/llvm/llvm-project/blob/74d622dea450da4b85383aa4b1758b902ef906a6/llvm/include/llvm/Support/GenericDomTree.h#L530.
> The construction/updater code also uses sets internally, which wasn't
> tested for determinism across different platforms from what I know.
>
I don't think there's a contradiction here: Just because no particular
order is guaranteed, doesn't mean that the order can be non-deterministic.
A lot of places in LLVM work with orders that are arbitrary and entirely
dependent on implementation details, but are still deterministic (e.g.
pretty much every worklist algorithm).
>From what I can see, the DT update code really goes out of the way to
produce a deterministic result (see
https://github.com/llvm/llvm-project/blob/74d622dea450da4b85383aa4b1758b902ef906a6/llvm/include/llvm/Support/CFGUpdate.h#L95-L111
).
At least as far as the status quo is concerned, this seems like a pretty
clear cut case: The problematic code used to be explicitly deterministic
(using SmallSetVector, aka deterministic SmallSet), and then someone
optimized it based on a premise that turned out to be incorrect.
Regards,
Nikita
Without changing the spec and implementation of the updater, trying to
> enforce deterministic update vector construction order seems to me more
> like hiding the underlying issue than addressing it (for example, by not
> relying on the domtreee order in the first place). Similarly, I think that
> changing the order in the printing code but not in the data structure could
> cause further confusion.
>
> Sincerely,
> Jakub
>
> On Mon, Sep 27, 2021 at 11:29 AM Björn Pettersson A <
> bjorn.a.pettersson at ericsson.com> wrote:
>
>> I created a patch that would make the DominatorTree updates (via
>> DomTreeUpdater) happen in a deterministic order (again):
>>
>> https://reviews.llvm.org/D110292
>>
>>
>>
>> I write again, because afaict those updates used to be done in a
>> deterministic order, but it was consciously broken during a series of NFCI
>> patches earlier this year including commits such as
>> https://reviews.llvm.org/rGe5692a564a73ef63b7baaf80c2b7a62ad74e9e66 and
>> https://reviews.llvm.org/rG1c55dcbca71d2df2fee4564ad53b62505fdbb819.
>>
>>
>>
>>
>>
>> It has been questioned in D110292 if that patch really is the wanted
>> solution. The alternative would be that transforms that depend on the node
>> order would need to do some kind of sorting before iterating over nodes in
>> the tree (to get a deterministic order for the traversal).
>>
>>
>>
>> Here are some of my quick reasoning about the alternatives:
>>
>> One assumed upside with D110292 is that if passes like EarlyCSE need to
>> make an ordering for the nodes to be iterated themselves, then that would
>> be an extra cost (at least potentially) compared to just doing things in a
>> deterministic order from the start.
>> With that being said, the accumulative cost of doing things like in this
>> patch up-front might end up being higher than doing a single sorting later.
>> And it will also just add a cost even if there are no passes later in the
>> pipeline that depends on the node order. So lots of uncertainty when
>> considering which is best for compile-time.
>>
>> One downside with D110292 is that if the analysis is invalidated (and
>> recalculated from scratch) then we probably get a different order of the
>> nodes. Still making things a bit shaky when debugging/bisecting/reducing
>> test cases.
>> With that being said, when debugging and printing the DomTree between
>> passes etc it could be nice if the printout is deterministic. This could of
>> course be solved by doing some kind of sorting also before printing the
>> tree, just like we would have to do in EarlyCSE. The performance of the
>> print-function itself should not be that important so that is a smaller
>> problem of course.
>>
>>
>>
>> So how do we decide what to do here and how to proceed? (The compiler is
>> currently broken, and I’ve prepared a patch based on one of the possible
>> solutions…)
>>
>>
>>
>> Regards,
>>
>> Björn
>>
>>
>>
>> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Björn
>> Pettersson A via llvm-dev
>> *Sent:* den 21 september 2021 22:30
>> *To:* Jakub (Kuba) Kuderski <kubakuderski at gmail.com>
>> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Johan Ringström <
>> johan.ringstrom at ericsson.com>
>> *Subject:* Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE
>> non-determinism
>>
>>
>>
>> >> Although, that still doesn’t solve the potential problem (fault?) that
>> the nodes aren’t inserted in those vectors with child nodes in the first
>> place.
>>
>> >> And neither does it help EarlyCSE to find the optimal order of
>> iterating through child nodes (to eliminate all possible PHI-nodes in my
>> example). I.e. the outcome from EarlyCSE would still depend on the sorting
>> order for the DomTree with such a solution. But maybe that just is how the
>> algorithm in EarlyCSE is supposed to work.
>>
>>
>>
>> > Can you explain these two? I'm not very familiar with either JT or
>> EarlyCSE and don't follow.
>>
>> With the first one I was thinking that there wouldn’t be any need to sort
>> child nodes if the DomTree updates made by JumpThreading were made in a
>> deterministic order. I.e. sorting post updating the dominator tree would
>> only hide the actual problem, at least if the idea is that the vector with
>> child nodes is sorted by the insertion order.
>>
>> After some more debugging I’ve found that the culprit here seem to be
>> that llvm::MergeBasicBlockIntoOnlyPred (in Local.cpp) is using a
>> SmallPtrSet for the PredsOfPredBB temporary list. And then the Updates sent
>> to DomTreeUpdater ends up in a non-deterministic order based memory
>> allocations. If changing from SmallPtrSet to SmallVector, then the updates
>> will be deterministic.
>>
>> (I’ll prepare a patch for that!)
>>
>>
>>
>> My point/question related to EarlyCSE is that the transformation done by
>> that pass seem to depend on the internal order of nodes inside the
>> DominatorTree. And I wonder if that is a known fact, or if it should be
>> considered as a fault. So at the moment, the order in which DominatorTree
>> is updated by a previous pass could impact if EarlyCSE is doing a certain
>> transform or not. And if for example modifying the pipeline slightly to
>> invalidate the DominatorTree analysis before EarlyCSE, then I might get
>> different IR in the output from EarlyCSE. Feels a bit strange (and unlucky)
>> IMO that not only the input IR to EarlyCSE will impact the result, but also
>> how the DominatorTree is calculated.
>>
>> So for EarlyCSE, it could be possible to make it less sensitive to the
>> DominatorTree internal structure/sorting by adding some kind of sorting by
>> basic block order when processing the dom tree nodes.
>>
>> But making EarlyCSE insensitive to how nodes are ordered in the
>> DominatorTree would only make sure that the EarlyCSE transformation are
>> applied in a given order. But apparently we get different results depending
>> on in which order we process the nodes. And that is what I meant by talking
>> about “optimal order”. I don’t know if EarlyCSE should run twice or
>> something (such as processing nodes in forward+reverse order) to make sure
>> we do not miss out on transformation that would have been applied if using
>> a different node processing order. Is that making sense?
>>
>>
>>
>> /Björn
>>
>>
>>
>> *From:* Jakub (Kuba) Kuderski <kubakuderski at gmail.com>
>> *Sent:* den 21 september 2021 17:14
>> *To:* Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
>> *Cc:* Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev <
>> llvm-dev at lists.llvm.org>; Johan Ringström <johan.ringstrom at ericsson.com>
>> *Subject:* Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE
>> non-determinism
>>
>>
>>
>> Although, that still doesn’t solve the potential problem (fault?) that
>> the nodes aren’t inserted in those vectors with child nodes in the first
>> place.
>>
>>
>>
>> And neither does it help EarlyCSE to find the optimal order of iterating
>> through child nodes (to eliminate all possible PHI-nodes in my example).
>> I.e. the outcome from EarlyCSE would still depend on the sorting order for
>> the DomTree with such a solution. But maybe that just is how the algorithm
>> in EarlyCSE is supposed to work.
>>
>>
>>
>> Can you explain these two? I'm not very familiar with either JT or
>> EarlyCSE and don't follow.
>>
>> I was thinking about something like this:
>> - Iterate over the whole function to create a mapping BB -> idx
>> - If a pass depends on the order of children, add a helper function like
>> reorder(tree_node_range, bb_to_idx_map) and wrap all calls to children with
>> that
>>
>> - If a pass depends on DFS numbers, use the BB -> idx map if possible,
>> or have a helper function that fixes up dfs numbers based on the map
>> - If it's inconvenient to implement the same helpers in multiple places
>> in the codebase, add a function to DT that will actually change the
>> children order based on a given comparison function
>>
>>
>>
>> On Tue, Sep 21, 2021 at 11:03 AM Björn Pettersson A <
>> bjorn.a.pettersson at ericsson.com> wrote:
>>
>> Hi Jakub,
>>
>>
>>
>> I imagine sorting the children would solve the problem with
>> non-determinism for the test case (just like it helps to
>> invalidate/recalculate the DomTree before EarlyCSE). I haven’t tested it
>> though (I’m not quite sure how to get the basic block order number quickly
>> for a BasicBlock).
>>
>>
>>
>> Although, that still doesn’t solve the potential problem (fault?) that
>> the nodes aren’t inserted in those vectors with child nodes in the first
>> place.
>>
>>
>>
>> And neither does it help EarlyCSE to find the optimal order of iterating
>> through child nodes (to eliminate all possible PHI-nodes in my example).
>> I.e. the outcome from EarlyCSE would still depend on the sorting order for
>> the DomTree with such a solution. But maybe that just is how the algorithm
>> in EarlyCSE is supposed to work.
>>
>>
>>
>> /Björn
>>
>>
>>
>> *From:* Jakub (Kuba) Kuderski <kubakuderski at gmail.com>
>> *Sent:* den 21 september 2021 16:27
>> *To:* Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
>> *Cc:* Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev <
>> llvm-dev at lists.llvm.org>; Johan Ringström <johan.ringstrom at ericsson.com>
>> *Subject:* Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE
>> non-determinism
>>
>>
>>
>> Hi Björn,
>>
>> From the perspective of DomTree, node children are generally unordered. I
>> think a pass can be sensitive to DT order by either directly relying on
>> tree children order or on DFS numbers.
>> If that's the case, one workaround would be to re-order children based on
>> the block order in the parent. If that's a common enough requirement, we
>> could add it to DT.
>>
>> Would that solve the problem?
>> -Jakub
>>
>>
>>
>> On Tue, Sep 21, 2021 at 9:33 AM Björn Pettersson A via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Yes, the IR looks the same out from jump-threading for this specific
>> input (at least when comparing output from -print-after-all).
>>
>> It is the order of the DominatorTree updates that I suspect differ a bit
>> (at least according to the -debug printouts), giving slightly different
>> node orders in the DominatorTree. And I have no idea if JumpThreading can
>> be blamed for that or if it is the Lazy strategy in the DomTreeUpdater.
>>
>> I guess I'll file a PR for this. But I'm still not quite sure what the
>> rules are here (if both passes are wrong, or which one to blame), and what
>> to expect more generally in situation like this one.
>>
>> /Björn
>>
>> > -----Original Message-----
>> > From: Roman Lebedev <lebedev.ri at gmail.com>
>> > Sent: den 21 september 2021 15:15
>> > To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Johan Ringström
>> > <johan.ringstrom at ericsson.com>
>> > Subject: Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-
>> > determinism
>> >
>> > Does JumpThreading produce exactly the same IR in all of the situations?
>> > But even if it does, this does seem like a bug.
>> >
>> > Roman
>> >
>> > On Tue, Sep 21, 2021 at 4:10 PM Björn Pettersson A via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> > >
>> > > Hello llvm-dev,
>> > >
>> > > Recently I've been debugging a non-determinism problem.
>> > >
>> > > I've managed to narrow it down to running:
>> > >
>> > > opt -passes='function(jump-threading,print<domtree>,early-cse)'
>> > >
>> > >
>> > > From debug printouts (also adding -debug to the cmd line) it looks
>> like
>> > jump-threading is doing doing dominator tree updates in a
>> non-deterministic
>> > order (when comparing different runs).
>> > >
>> > > I randomly get either of these two DominatorTrees in the
>> print<domtree>
>> > printouts:
>> > >
>> > > DominatorTree for function: g
>> > > =============================--------------------------------
>> > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries.
>> > > [1] %entry {4294967295,4294967295} [0]
>> > > [2] %for.cond1 {4294967295,4294967295} [1]
>> > > [3] %if.then {4294967295,4294967295} [2]
>> > > [4] %for.cond5.preheader {4294967295,4294967295} [3]
>> > > [5] %cleanup {4294967295,4294967295} [4]
>> > > [6] %cleanup16 {4294967295,4294967295} [5]
>> > > [7] %unreachable {4294967295,4294967295} [6]
>> > > [7] %for.end21 {4294967295,4294967295} [6]
>> > > [5] %for.body7 {4294967295,4294967295} [4]
>> > > [6] %for.inc {4294967295,4294967295} [5]
>> > > [5] %return {4294967295,4294967295} [4]
>> > > [3] %cleanup16.thread {4294967295,4294967295} [2]
>> > > [3] %for.inc19 {4294967295,4294967295} [2]
>> > > [2] %for.cond {4294967295,4294967295} [1]
>> > > Roots: %entry
>> > >
>> > > DominatorTree for function: g
>> > > =============================--------------------------------
>> > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries.
>> > > [1] %entry {4294967295,4294967295} [0]
>> > > [2] %for.cond1 {4294967295,4294967295} [1]
>> > > [3] %for.inc19 {4294967295,4294967295} [2]
>> > > [3] %if.then {4294967295,4294967295} [2]
>> > > [4] %for.cond5.preheader {4294967295,4294967295} [3]
>> > > [5] %cleanup {4294967295,4294967295} [4]
>> > > [6] %cleanup16 {4294967295,4294967295} [5]
>> > > [7] %unreachable {4294967295,4294967295} [6]
>> > > [7] %for.end21 {4294967295,4294967295} [6]
>> > > [5] %for.body7 {4294967295,4294967295} [4]
>> > > [6] %for.inc {4294967295,4294967295} [5]
>> > > [5] %return {4294967295,4294967295} [4]
>> > > [3] %cleanup16.thread {4294967295,4294967295} [2]
>> > > [2] %for.cond {4294967295,4294967295} [1]
>> > > Roots: %entry
>> > >
>> > >
>> > > I think both trees are correct, the nodes are just in a different
>> order.
>> > This can also be seen if I add invalidate<domtree> before
>> print<domtree> to
>> > force a recalculation. Then it will look like this instead (same
>> content,
>> > but the level 3 nodes printed in yet another order):
>> > >
>> > > DominatorTree for function: g
>> > > =============================--------------------------------
>> > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries.
>> > > [1] %entry {4294967295,4294967295} [0]
>> > > [2] %for.cond1 {4294967295,4294967295} [1]
>> > > [3] %cleanup16.thread {4294967295,4294967295} [2]
>> > > [3] %for.inc19 {4294967295,4294967295} [2]
>> > > [3] %if.then {4294967295,4294967295} [2]
>> > > [4] %for.cond5.preheader {4294967295,4294967295} [3]
>> > > [5] %cleanup {4294967295,4294967295} [4]
>> > > [6] %cleanup16 {4294967295,4294967295} [5]
>> > > [7] %unreachable {4294967295,4294967295} [6]
>> > > [7] %for.end21 {4294967295,4294967295} [6]
>> > > [5] %return {4294967295,4294967295} [4]
>> > > [5] %for.body7 {4294967295,4294967295} [4]
>> > > [6] %for.inc {4294967295,4294967295} [5]
>> > > [2] %for.cond {4294967295,4294967295} [1]
>> > > Roots: %entry
>> > >
>> > >
>> > > *** Question one ***
>> > > Maybe it is OK from a correctness point-of-view that JumpThreading
>> isn't
>> > producing the same node order every time (given same input), even
>> though it
>> > might be a bit confusing when debugging. Or should this be seen as a
>> bug?
>> > >
>> > >
>> > > Next problem/question is related to EarlyCSE. It looks like the output
>> > from EarlyCSE depends on the node order in the dominator tree. So
>> depending
>> > on if JumpThreading has produced the first or second of the
>> DominatorTree
>> > structures above we might end up eliminating one more/less PHI node
>> during
>> > EarlyCSE.
>> > >
>> > > *** Question two ***
>> > > Should be seen as a bug? Or are passes in general sensitive to how
>> > analysis information is produced (such as the node order in a dominator
>> > tree) and thus being allowed to produce better/worse code depending on
>> such
>> > things?
>> > >
>> > >
>> > > To summarize:
>> > > JumpThreading is non-deterministic but produces semantically
>> "equivalent"
>> > results.
>> > > EarlyCSE is deterministic, but the result depends on order of nodes in
>> > the DominatorTree.
>> > > Those two things together gives a non-deterministic IR result. And I
>> > figure that should be seen as a bug. Just not sure if the fault is in
>> > DominatorTreeUpdater, JumpThreading or EarlyCSE (or if it should be
>> seen as
>> > multiple faults).
>> > >
>> > > Regards,
>> > > Björn
>> > > _______________________________________________
>> > > LLVM Developers mailing list
>> > > llvm-dev at lists.llvm.org
>> > > https://protect2.fireeye.com/v1/url?k=dfb69fe9-802da6ec-dfb6df72-
>> > 86959e472243-b94d5bb9ef523712&q=1&e=3ff5c3d9-dda7-472c-aef6-
>> > a705903687cb&u=https%3A%2F%2Flists.llvm.org
>> <https://protect2.fireeye.com/v1/url?k=3721e336-68bad973-3721a3ad-867b36d1634c-355d724705f0f488&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=http%3A%2F%2F2flists.llvm.org%2F>
>> %2Fcgi-
>> > bin%2Fmailman%2Flistinfo%2Fllvm-dev
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> <https://protect2.fireeye.com/v1/url?k=d909ae33-86929476-d909eea8-867b36d1634c-f7988c4d918d2be0&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev>
>>
>>
>>
>>
>> --
>>
>> Jakub Kuderski
>>
>>
>>
>>
>> --
>>
>> Jakub Kuderski
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210927/bb623bd6/attachment-0001.html>
More information about the llvm-dev
mailing list