[LLVMdev] PBQP crash

Lang Hames lhames at gmail.com
Fri Jan 30 14:34:47 PST 2015


- Re-sending to include llvm-dev.

HI Jonas,

This is great - thank you very much for your analysis!

You're spot on about Bug 1 - the row/column checks are transposed there. I
have fixed this in r227628.

Regarding Bug 2, as discussed on the other thread I'm going to teach the
register allocator to prune single-option vregs so that they never make it
into the graph.

I haven't had a chance to dig in to Bug 3 yet, but I hope to soon.

Cheers,
Lang.


On Thu, Jan 29, 2015 at 9:19 AM, Jonas Paulsson <jonas.paulsson at ericsson.com
> wrote:

>  Hi,
>
>
>
> Sorry for the delay, it has taken some extra time as more than one bug
> showed up J
>
>
>
> I continued to look into this with your viewpoint that a node that is
> conservatively allocatable should never be spilled. The first thing I did
> was therefore to add some extra code with an assert for this.
>
>
>
> I believe I then found three bugs and fixed the two:
>
>
>
> Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge().
> For the heuristic of DeniedOpts, if N runs along the columns (Transpose ==
> true), the interesting dimension is along the rows, and vice versa. In
> other words, the question is what the neighbour could maximally deny N, and
> the answer is found in WorstRow, not WorstCol (as it was). This makes a
> difference if sub/super registers overlap.
>
>
>
> Bug 2: When it happens that there are no available physical registers for
> a node (e.g because of regmasks or calls), the number of rows / columns is
> just 1, for the spill option. This case must be detected in
> MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as
> std::max_element() is called with an empty vector.
>
>
>
> Bug 3: Again a conservatively allocatable node was spilled, and the assert
> triggered. This is a description of what happened in my out-of-tree test
> case:
>
>
>
> applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y)
> and (N,Z) are all-zeroes. Y and Z are overlapping and already have an
> interference edge. Z is just on the limit of not being conservatively
> allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in
> NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call
> stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into
> handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which
> decreases the DeniedOpts by one. After this, it looks like a bug to me that
> Z is moved to ConservativelyAllocatableNodes, because eventually
> handleSetEdgeCosts() will complete, and the edge between Y and Z will have
> been added again in handleAddEdge(), and Z:DeniedOpts is again 8!
>
>
>
> I think this also shows up in a test case for arm. It was found by using
> the assert mentioned above, and running 'bin/llvm-stress -size 200 -seed
> 17761'. The attached file (pbqp_reduced.ll) is the failing test case found
> reduced with bugpoint. Apply patch 2 (the assert), and then run 'llc
> pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57
> -mattr=+neon -optimize-regalloc -regalloc=pbqp'. The assert shows a node
> that is spilled that was conservatively allocatable. The test case itself
> will not fail without this assert. Add the Verbose-PBQP-dump patch and use
> '-debug-only=regalloc -pbqp-dump-graphs' to see the more verbose output
> which shows the progress of the algorithm leading to the assert:
>
>
>
> * Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 -> 9
>
> NId 9(%vreg15, GPR64common)  moved to conservatively-allocatables.
>
> handleDisconnectEdge(2, 9) : DeniedOpts 10 -> 9
>
> NId 2(%vreg4, GPR64common)  moved to conservatively-allocatables.
>
> ...
>
> Popped NId 2(%vreg4, GPR64common) , all edge costs added:
>
> 2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0
>
> llc: ../include/llvm/CodeGen/PBQP/ReductionRules.h:214:
> llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT,
> llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra
>
> ph<llvm::PBQP::RegAlloc::RegAllocSolverImpl>; StackT =
> std::vector<unsigned int>; llvm::PBQP::NodeSet = std::set<unsigned int>]:
> Assertion `PushedAsConservativelyAllocatabl
>
> eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && "Node
> in conservatively allocatable set spilled!"' failed.
>
>
>
> I am not that familiar with the arm architecture, but it looks like NId 2
> is incorrectly moved from not-provens to conservatively-allocables, after
> R2(NId 18). This looks to me like the same bug, but I could be wrong. Note
> that it **does not show up if the bugfix for the transpositions is
> applied** (for random reasons I beleive).
>
>
>
> I am not sure how to fix it myself, as it seems non-trivial relating to
> temporary edge removal. Does anyone have any idea?
>
>
>
> In addition to the reduced test case, I attach four patches, where 1 and 4
> are not ready to commit, but  I would be happy to modify and commit them if
> wanted:
>
>
>
> 1. 0001-Assert…  The assert (with some extra code related to it) that
> checks that a node that was conservatively allocatable never spills. My
> quick way to achieve this was to use an extra set of nodes as a means of
> remembering their origin. This is a very good idea, considering that this
> should never happen, and that there is some worry that the code might be
> incorrect.
>
>
>
> 2. 0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the
> transposition bug. (Bug 1). Small patch and also sent to llvm-commits for
> review.
>
>
>
> 3. 0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column
> case (Bug 2). Small patch and also sent to llvm-commits for review.
>
>
>
> 4. 0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude
> code, because I had a conflict in the ARM backend when I tried to #define
> DEBUG_TYPE "regalloc". I think this could help PBQP get a wider audience,
> perhaps, even if it is still far from perfect. Perhaps this could be
> cleaned up and a target-independent name for all pbqp-debug-dumps could be
> found and then commited.
>
>
>
> Looking forward to your reply,
>
>
>
> Jonas Paulsson
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com]
> *Sent:* den 27 januari 2015 08:43
> *To:* 'Lang Hames'; Jonas Paulsson
> *Cc:* llvmdev at cs.uiuc.edu
> *Subject:* RE: [LLVMdev] PBQP crash
>
>
>
> > A node should never be put into the conservatively allocatable list if
> there is a chance of it spilling.
>
>
>
> I can understand why the logic of
> NodeMetadata::isConservativelyAllocatable is necessary for the node to be
> allocatable, but I have not been able to convince myself this is
> sufficient, especially when the node degree > available registers.
>
>
>
> Cheers,
>
> Arnaud
>
>
>
> *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu
> <llvmdev-bounces at cs.uiuc.edu>] *On Behalf Of *Lang Hames
> *Sent:* 27 January 2015 06:06
> *To:* Jonas Paulsson
> *Cc:* llvmdev at cs.uiuc.edu
> *Subject:* Re: [LLVMdev] PBQP crash
>
>
>
> Hi Jonas,
>
>
>
> > * The problematic node that was spilled again, was in the
> ConservativelyAllocatableNodes set during reduce(). The comment in reduce()
> “Conservatively allocatable nodes will never spill…” indicates that perhaps
> this is an incorrect insertion, as the regs did in fact run out in this
> case.
>
>
>
> Arnaud is correct: A node should never be put into the conservatively
> allocatable list if there is a chance of it spilling. Off the top of my
> head I can imagine 2 things going wrong here:
>
>
>
> (1) Conservative allocability is mostly-precomputed, and updated with
> deltas as nodes are removed. It is possible that there is some subtle bug
> in this code.
>
>
>
> (2) I think the current conservative allocability test bakes in the
> assumption that all register options have non-infinite cost. If you assign
> infinite costs to any physical register I would expect this to blow up.
>
>
>
> Are you able to share a test case at all? If so that would be great. If
> not, I can add an option to the allocator to dump abstract PBQP graphs, and
> I could use these to test the problem on my end.
>
>
>
> Regards,
>
> Lang.
>
>
>
>
>
> On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson <
> jonas.paulsson at ericsson.com> wrote:
>
> Hi,
>
>
>
> I have run into a test case on an out-of-tree target where PBQP fails to
> complete register allocation after “Attempting to spill already spilled
> value” (the triggered assert in InlineSpiller::spill().
>
>
>
> First, the original LiveInterval is spilled. It is a load of a symbol into
> a narrow register class, i.e. a subset of the class of address registers.
> InlineSpiller decides to rematerialize the load of the symbol to lie right
> before its only user, which makes good sense. The original def is removed.
>
>
>
> The new LiveInterval pushed is thus much smaller in the next PBQP round.
> The spill cost is marked as ‘inf’ during graph building. This small
> interval has also a lot of overlapping intervals and thus edges in the PBQP
> graph. It gets pushed on the node stack to later be popped after 17 others.
>
> Those 17 nodes use up all registers of the narrow reg-class, and the cost
> vector has become all infinities. Spill option is selected again, and thus
> the error is a fact of spilling an already spilled value.
>
>
>
> I wonder what has gone wrong here, and have some initial thoughts:
>
>
>
> * The problematic node that was spilled again, was in the
> ConservativelyAllocatableNodes set during reduce(). The comment in reduce()
> “Conservatively allocatable nodes will never spill…” indicates that perhaps
> this is an incorrect insertion, as the regs did in fact run out in this
> case.
>
>    In setup(), the node is first put into not-provably-allocatables.
> However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it
> into conservatively-allocatables, because DeniedOpts had become less than
> NumOpts (in isConservativelyAllocatable().
>
> * There are lots of spillable nodes being popped before the one that can’t
> be spilled.  This seems intuitively wrong, as they are intervals that
> actually could be spilled.
>
>
>
> I would really appreciate some help and pointers on what might be going
> wrong here,
>
>
>
> Jonas Paulsson
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/db8a7706/attachment.html>


More information about the llvm-dev mailing list