[llvm-commits] PATCH: Fix for PR13378 (I think...), and a lot of questions about how 2addr pass works. =]

Jakob Stoklund Olesen stoklund at 2pi.dk
Wed Jul 18 08:41:30 PDT 2012


On Jul 18, 2012, at 3:38 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> 
> (In reply to comment #1)
> > RescheduleMIBelowKill is moving an INSERT_SUBREG instruction, and then it never
> > gets revisited, so we don't insert the required COPY in front of it.
> 
> So, you piqued my curiosity, and I dug. ;] I think I have a clear understanding of what happens, and what is wrong about the end result, and how to fix it. However, I'm new to this code, so I'll explain in (perhaps way more detail than you wanted) how I got here. Hopefully, you can stop me if I've missed anything or gotten stuff wrong.

It looks like you got it right ;)

> When we process this with the twoaddr pass, we eventually try to reschedule the INSERT_SUBREG below a kill. When we start, the MBB looks like this:
> 
> BB#1: derived from LLVM BB %3
>     Predecessors according to CFG: BB#0
>         %vreg2<def> = VMOVv4i32 0, pred:14, pred:%noreg; QPR:%vreg2
>         %vreg3<def> = VLDMQIA %vreg4<undef>, pred:14, pred:%noreg; mem:LD16[undef](align=8) QPR_VFP2:%vreg3 GPR:%vreg4
>         %vreg5<def> = FCONSTS 112, pred:14, pred:%noreg; SPR:%vreg5
>         VSTMQIA %vreg2<kill>, %vreg7<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR:%vreg2 GPR:%vreg7
>         %vreg8<def> = INSERT_SUBREG %vreg3, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8,%vreg3 SPR:%vreg5
>         VSTMQIA %vreg3<kill>, %vreg9<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR_VFP2:%vreg3 GPR:%vreg9
>         VSTMQIA %vreg8<kill>, %vreg10<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR_VFP2:%vreg8 GPR:%vreg10
> 
> 
> This succeeds at rescheduling the INSERT_SUBREG below vreg3's kill. Which seems a nice, reasonable thing to do. The result is:
> 
> BB#1: derived from LLVM BB %3
>     Predecessors according to CFG: BB#0
>         %vreg2<def> = VMOVv4i32 0, pred:14, pred:%noreg; QPR:%vreg2
>         %vreg3<def> = VLDMQIA %vreg4<undef>, pred:14, pred:%noreg; mem:LD16[undef](align=8) QPR_VFP2:%vreg3 GPR:%vreg4
>         %vreg5<def> = FCONSTS 112, pred:14, pred:%noreg; SPR:%vreg5
>         VSTMQIA %vreg2<kill>, %vreg7<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR:%vreg2 GPR:%vreg7
>         VSTMQIA %vreg3, %vreg9<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR_VFP2:%vreg3 GPR:%vreg9
>         %vreg8<def> = INSERT_SUBREG %vreg3<kill>, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8,%vreg3 SPR:%vreg5
>         VSTMQIA %vreg8<kill>, %vreg10<undef>, pred:14, pred:%noreg; mem:ST16[undef](align=8) QPR_VFP2:%vreg8 GPR:%vreg10
> 
> As far as I can tell, at this point, everything is correct, and happy.

Right on.

> We then carry on and *do* hit the section which rewrites INSERT_SUBREG as a COPY node around line 1607 of TwoAddressInstructionPass.cpp. So, when we start this, we have the following INSERT_SUBREG:
> 
>   %vreg8<def> = INSERT_SUBREG %vreg3<kill>, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8,%vreg3 SPR:%vreg5
> 
> and we rewrite it to:
> 
>   %vreg8:ssub_3<def> = COPY %vreg5<kill>; QPR_VFP2:%vreg8 SPR:%vreg5
> 
> But I don't see how this could be a correct transform. %vreg3 had some value coming into the INSERT_SUBREG, and we're supposed to be copying that value into %vreg8, and then coping %vreg5 over top of the subreg indexed by 'ssub_3' (18 in this case).
> 
> Note that we leave %vreg8 as <def>'ed by this operation! This in turn seems to quite reasonably lead to the assertion as we have an instruction that defs a vreg and also reads from it. Worse, it's *not* reading from the vreg with the actual values, %vreg3.

Right. The transformation of INSERT_SUBREG is supposed to happen in two steps: First, insert a copy in front and rewrite the tied use operand:

  %vreg8<def> = COPY %vreg3<kill>; QPR_VFP2:%vreg8,%vreg3
  %vreg8<def> = INSERT_SUBREG %vreg8<kill>, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8 SPR:%vreg5

This is just like any other 2-addr instruction. The second step changes the INSERT_SUBREG to a sub-register COPY, something that is only allowed after we left SSA form:

  %vreg8<def> = COPY %vreg3<kill>; QPR_VFP2:%vreg8,%vreg3
  %vreg8:ssub_3<def> = COPY %vreg5<kill>; QPR_VFP2:%vreg8 SPR:%vreg5

The problem, as you correctly identified, is that step 2 happens even when step 1 was skipped because the instruction is rescheduled.

There is a twist to the plot that is quite common with INSERT_SUBREG: The source register could be <undef>:

  %vreg8<def> = INSERT_SUBREG %vreg3<kill,undef>, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8,%vreg3 SPR:%vreg5

This is in fact the most common form of INSERT_SUBREG - it is used for anyext nodes in the selection DAG. ARMInstrNEON.td creates tons of these when we do f32 arithmetic in the low part of v2f32 registers, for example.

When the source register is <undef>, we bypass the TiedOperandMap data structure, and simply rewrite the instruction on the fly:

  %vreg8<def> = INSERT_SUBREG %vreg8<kill,undef>, %vreg5<kill>, ssub_3; QPR_VFP2:%vreg8 SPR:%vreg5

There is no COPY inserted for the <undef> source. Later, INSERT_SUBREG is rewritten to a sub-register COPY:

  %vreg8:ssub_3<def,read-undef> = COPY %vreg5<kill>; QPR_VFP2:%vreg8 SPR:%vreg5

Note the <def,read-undef> flags on the first operand. That indicates that the sub-register def doesn't read the previous value of %vreg8.

> Ok, so what should the fix be? 
> 
> You mentioned that we don't insert the required COPY in front of it. I assume you're referring to the COPY %vreg3 to %vreg8, giving a proper <def>, and then allow the subreg COPY to clobber the subreg in question? Yes, reading the tied operands processing section, that is the point of the meat of this loop -- to inject copies (dropping SSA form) and then allow operations to occur in-place. It all makes sense now!
> 
> Hmm, so *normally*, what happens when we reschedule this is that we leave the "next" iterator pointing at the next instruction, and since we rescheduled *down* the MBB, we will re-process that instruction! This second time around, we'll be able to insert the needed copy.
> 
> This in turn leads me to the bug. When we transform the instruction by rescheduling, we break out of the loop over tied operands on line 1473 claiming that they have been eliminated. That's a bit of a lie, they've been either eliminated, or pushed further down in the basic block where we can re-evaluate them. That break drops us to line 1606, which is in the same cycle of the loop!!! Here is the bug. Now 'mi' still points to the INSERT_SUBREG, but we haven't actually processed it into two-address form yet, and we have fixed up the loop so that we'll reprocess it again later. Because of this, the rewrite occurs when everything is not set up properly.

OK, I see what happened now. The INSERT_SUBREG rewriting code used to be inside the TiedOperandMap loop, but I moved it outside the loop in r159120 when I added support for <undef> sources. The TiedOperandMap loop is empty in that case. My bad.

> Amusingly, we *do* re-process this instruction. But by then, it's a COPY, and doesn't even refer to the source register. Doh!

The special handling of INSERT_SUBREG could be removed from the main 2-addr loop. It behaves like any other 2-addr instruction with respect to COPY insertion and operand rewriting. The INSERT_SUBREG -> COPY transformation could be done after the fact, just like we do for REG_SEQUENCE instructions.

Instructions like INSERT_SUBREG and REG_SEQUENCE only exist to make SSA form happy. The 2-addr pass serves as a more general de-SSA pass when it rewrites these instructions into sub-register copies. It is convenient for the later passes to deal with only COPY instructions.

> OK, so the fix is logically straightforward: the break on line 1473 really needs to set 'mi' to 'nmi', and continue the for loop one layer out. But that is a bit hard to actually write in the current formation. I've attached a patch that I think just does it. =] It's gross, but the gross is the same gross as the rest of this function. It just needs massive refactoring. If you like this fix, I'll add the test case into the test suite, and commit.
> 
> Let me know if you'd like me to break out a big refactoring axe and start slicing this thing into a separate functions. I could probably spare a few hours to just make no-functionality changed refactoring patches and get this into at least a somewhat cleaner state.

That would be awesome. The long, nested loops with break and continues are particularly gnarly.

If the logic becomes simpler, you could collect INSERT_SUBREG instructions in a vector, and give them their special treatment after you're done with each basic block.

> <pr13378.patch>

Thanks for fixing this! The patch looks good.

/jakob





More information about the llvm-commits mailing list