[LLVMdev] ScheduleDAGInstrs/R600 test potential issue with implicit defs

Tue Feb 25 07:38:46 PST 2014

Hello,

The ScheduleDAGInstrs::buildSchedGraph() function creates def/uses lists by iterating over all instruction operands and calls addPhysRegDeps() if used post-RA (line ~770 ff.). If an operand is a def, the uses of that registers are cleared (ScheduleDAGInstrs.cpp:333:  Uses.eraseAll(Reg); ).

As a consequence, if an instruction has an explicit use of a register and an implicit def of the same register, the implicit def causes the use to be removed. In our case this leads to missing scheduling dependencies for indirect calls where a return register is used as operand of the indirect call. Therefore I patched the buildSchedGraph() function to iterate over all defs first, then over all uses in a second iteration (maybe not the most efficient solution, but it solves the issue).
I am attaching the corresponding patch.

However, while upgrading our code to LLVM 3.4, I found that the test/CodeGen/R600/store.ll test fails for -mcpu=redwood due to this change. The difference is in the following line in @store_i8: With my patch applied I get:

    LSHL                  * T0.W, PV.W, literal.x,  
  3(4.203895e-45), 0(0.000000e+00)
    LSHL                  * T1.X, T1.W, PV.W,  
    LSHL                  * T1.W, literal.x, T0.W,  
  255(3.573311e-43), 0(0.000000e+00)

versus the orginal output of LLVM 3.4:

    LSHL                  * T0.W, PV.W, literal.x,  
  3(4.203895e-45), 0(0.000000e+00)
    LSHL                    T1.X, T1.W, PV.W,  
    LSHL                  * T1.W, literal.x, PV.W,  
  255(3.573311e-43), 0(0.000000e+00)

The difference is the '*' in the second LSHL and the T0.W versus PV.W in the third LSHL.

Looking at the output of -print-after-all, llc generates the following MC in both plain LLVM 3.4 and LLVM 3.4 with my patch applied right after the "Finalize machine instruction bundles" pass:

    %T0_W<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %T0_W<kill>, 0, 0, 0, -1, %ALU_LITERAL_X, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 3, 0
    %T1_X<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %T1_W<kill>, 0, 0, 0, -1, %T0_W, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 0, 0, %T1_XYZW<imp-def>
    %T1_W<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %ALU_LITERAL_X, 0, 0, 0, -1, %T0_W<kill>, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 255, 0, %T1_XYZW<imp-use,kill>, %T1_XYZW<imp-def>

Hence, my patch affects either the R600 Packetizer or the R600 Control Flow Finalizer pass. A closer look reveals that the 17th operand (the one before the 'pred:$PRED_SEL_OFF' operand) controls the '*' output and is actually set for all three LSHL operands before the Packetizer/Control Flow Finalizer pass, and from a quick glance at the AMD ISA docs I gather that 'PV' versus 'T0' seems to be related to this, but the packetizer (?) clears this flag for the second LSHR for redwood/VLIW5 GPUs.

Now, here is my question: Is the R600/store.ll test actually correct (I am not yet familiar enough with the R600 ISA to tell), or does the R600 backend rely on the fact that implicit defs kill the uses of registers, or is there a bug/.. in the R600 backend that counteracts the implicit-defs scheduling issue? 
And is removing the explicit use by an implicit def at an instruction a bug or a feature?

If this is an actual bug in the ScheduleDAGInstrs implementation, I would like to adapt my patch so that it can be applied (i.e., include a fix for the store.ll test), although there might be a better way of doing this (e.g., only erase uses that are not from the same SU, or iterate over defs, implicit-defs, uses, implicit-uses instead of iterating over all operands twice), and I would need someone more familiar with the R600 backend to check if the new output of LLC for store.ll is actually correct or not.

Kind regards,
 Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fixed-implicit-defs-removing-uses-of-registers-when-.patch
Type: text/x-patch
Size: 3621 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140225/ce81fa1a/attachment.bin>