[LLVMdev] ScheduleDAGInstrs/R600 test potential issue with implicit defs

Hal Finkel hfinkel at anl.gov
Tue Feb 25 10:55:06 PST 2014


----- Original Message -----
> From: "Stefan Hepp" <stefan at stefant.org>
> To: "Tom Stellard" <tom at stellard.net>
> Cc: "llvmdev at cs.uiuc.edu List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 25, 2014 12:22:54 PM
> Subject: Re: [LLVMdev] ScheduleDAGInstrs/R600 test potential issue with implicit defs
> 
> Hi Tom,
> 
> Thanks a lot for your explanations, now it makes a lot more sense ;)
> 
> I had a slightly closer look at the R600 packetizer, and the issue is
> that the third LSHL instruction has both an implicit use and
> *afterwards* an implicit def of T1_XYZW. The latter def causes the
> current ScheduleDAGInstrs implementation to ignore the implicit use,
> thus the ScheduleDAG only contains an anti-dependency from the second
> to the third LSHL and the packetizer can bundle the instructions.
> 
> If the order of the implicit-def and the implicit-use were different
> (e.g., like TableGen adds them), or if I apply my patch so that the
> implicit-use is not ignored, there is a true dependency edge between
> those two instructions, preventing bundling.
> 
> Removing only the implicit-defs has no effect in this example, as the
> implicit-use of T1_XYZW is then no longer removed and a true
> dependency to the def of T1_W is added. Removing all implicit
> operands, however, solves that (see the attached patch/hack).
> 
> I would argue that the behaviour of ScheduleDAGInstrs::buildSchedGraph
> for implicit operands is not correct, simply because changing the
> order of the operands would already prevent the R600 packetizer from
> bundling in this example.
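The order sensitivity described above can be seen in a toy model. The sketch below is Python, not LLVM's C++: it only imitates the bookkeeping in ScheduleDAGInstrs::buildSchedGraph (a bottom-up scan in which a def's Uses.eraseAll(Reg) discards the pending uses of that register); the register names and edge labels are illustrative.

```python
# Toy model of the physical-register bookkeeping in
# ScheduleDAGInstrs::buildSchedGraph -- illustrative Python, not LLVM's
# C++.  Assumptions: instructions are scanned bottom-up, operands in
# the order they appear on the instruction, and a def erases all
# pending uses of its register (the Uses.eraseAll(Reg) call from this
# thread).

def build_edges(instrs):
    """instrs: [(name, [(kind, reg), ...])] in program order.
    Returns a set of (src, dst, edge_kind) dependency edges."""
    uses = {}    # reg -> later instructions waiting for a def above
    defs = {}    # reg -> nearest later instruction defining reg
    edges = set()
    for name, ops in reversed(instrs):              # bottom-up scan
        for kind, reg in ops:
            if kind == "use":
                if defs.get(reg) not in (None, name):
                    edges.add((name, defs[reg], "anti"))
                uses.setdefault(reg, []).append(name)
            else:                                   # kind == "def"
                for user in uses.get(reg, []):
                    if user != name:
                        edges.add((name, user, "data"))
                if defs.get(reg) not in (None, name):
                    edges.add((name, defs[reg], "output"))
                uses[reg] = []                      # Uses.eraseAll(Reg)
                defs[reg] = name
    return edges

# Second and third LSHL from the example: the third carries its imp-use
# of T1_XYZW *before* its imp-def, so the def erases the instruction's
# own pending use and the true dependency from the second LSHL is lost.
prog = [("LSHL2", [("def", "T1_X"), ("use", "T1_W"), ("def", "T1_XYZW")]),
        ("LSHL3", [("def", "T1_W"), ("use", "T1_XYZW"), ("def", "T1_XYZW")])]
edges = build_edges(prog)
assert ("LSHL2", "LSHL3", "data") not in edges  # true dep missing
assert ("LSHL2", "LSHL3", "anti") in edges      # only the T1_W anti dep

# With the operand order flipped (imp-def before imp-use, as TableGen
# would emit them), the use survives and the true dependency appears.
prog[1] = ("LSHL3", [("def", "T1_W"), ("def", "T1_XYZW"), ("use", "T1_XYZW")])
assert ("LSHL2", "LSHL3", "data") in build_edges(prog)
```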

Unless there is supposed to be an ordering, I agree that this does not make sense.

 -Hal

> 
> Cheers,
>   Stefan
> 
> On 2014-02-25 17:35, Tom Stellard wrote:
> > On Tue, Feb 25, 2014 at 04:38:46PM +0100, Stefan Hepp wrote:
> >> Hello,
> >>
> >> The ScheduleDAGInstrs::buildSchedGraph() function creates def/use
> >> lists by iterating over all instruction operands and calls
> >> addPhysRegDeps() if used post-RA (line ~770 ff.). If an operand
> >> is a def, the uses of that register are cleared
> >> (ScheduleDAGInstrs.cpp:333:  Uses.eraseAll(Reg); ).
> >>
> >> As a consequence, if an instruction has an explicit use of a
> >> register and an implicit def of the same register, the implicit
> >> def causes the use to be removed. In our case this leads to
> >> missing scheduling dependencies for indirect calls where a return
> >> register is used as an operand of the indirect call. I therefore
> >> patched the buildSchedGraph() function to iterate over all defs
> >> first, then over all uses in a second pass (maybe not the most
> >> efficient solution, but it solves the issue). I am attaching the
> >> corresponding patch.
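The two-pass idea described above can be sketched as follows (illustrative Python, not the actual C++ patch): visiting all def operands of an instruction before its uses means a def's erase can no longer discard a use sitting on the same instruction.

```python
# Illustrative sketch (Python, not the actual C++ patch) of the fix:
# scan one instruction's def operands in a first pass and its use
# operands in a second pass, so Uses.eraseAll(Reg) on a def can no
# longer discard a use carried by the very same instruction.

def surviving_uses(ops, defs_first):
    """Which uses of a single instruction survive its own defs?"""
    if defs_first:  # the patched two-pass order
        ops = ([op for op in ops if op[0] == "def"] +
               [op for op in ops if op[0] == "use"])
    pending = set()
    for kind, reg in ops:
        if kind == "use":
            pending.add(reg)
        else:               # def: models Uses.eraseAll(Reg)
            pending.discard(reg)
    return pending

# An explicit use of a return register plus an implicit def of it, as
# on the indirect call described above (register name illustrative):
ops = [("use", "R9"), ("def", "R9")]
assert surviving_uses(ops, defs_first=False) == set()   # use lost
assert surviving_uses(ops, defs_first=True) == {"R9"}   # use kept
```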
> >>
> >> However, while upgrading our code to LLVM 3.4, I found that the
> >> test/CodeGen/R600/store.ll test fails for -mcpu=redwood due to
> >> this change. The difference is in the following line in
> >> @store_i8: With my patch applied I get:
> >>
> >>      LSHL                  * T0.W, PV.W, literal.x,
> >>    3(4.203895e-45), 0(0.000000e+00)
> >>      LSHL                  * T1.X, T1.W, PV.W,
> >>      LSHL                  * T1.W, literal.x, T0.W,
> >>    255(3.573311e-43), 0(0.000000e+00)
> >>
> >> versus the original output of LLVM 3.4:
> >>
> >>      LSHL                  * T0.W, PV.W, literal.x,
> >>    3(4.203895e-45), 0(0.000000e+00)
> >>      LSHL                    T1.X, T1.W, PV.W,
> >>      LSHL                  * T1.W, literal.x, PV.W,
> >>    255(3.573311e-43), 0(0.000000e+00)
> >>
> >> The difference is the '*' in the second LSHL and the T0.W versus
> >> PV.W in the third LSHL.
> >>
> >> Looking at the output of -print-after-all, llc generates the
> >> following MC in both plain LLVM 3.4 and LLVM 3.4 with my patch
> >> applied, right after the "Finalize machine instruction bundles"
> >> pass:
> >>
> >>      %T0_W<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %T0_W<kill>, 0, 0, 0,
> >>      -1, %ALU_LITERAL_X, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 3, 0
> >>      %T1_X<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %T1_W<kill>, 0, 0, 0,
> >>      -1, %T0_W, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 0, 0,
> >>      %T1_XYZW<imp-def>
> >>      %T1_W<def> = LSHL_eg 0, 0, 1, 0, 0, 0, %ALU_LITERAL_X, 0, 0,
> >>      0, -1, %T0_W<kill>, 0, 0, 0, -1, 1, pred:%PRED_SEL_OFF, 255,
> >>      0, %T1_XYZW<imp-use,kill>, %T1_XYZW<imp-def>
> >>
> >> Hence, my patch affects either the R600 Packetizer or the R600
> >> Control Flow Finalizer pass. A closer look reveals that the 17th
> >> operand (the one before the 'pred:%PRED_SEL_OFF' operand)
> >> controls the '*' output and is actually set for all three LSHL
> >> instructions before the Packetizer/Control Flow Finalizer pass.
> >> From a quick glance at the AMD ISA docs I gather that 'PV' versus
> >> 'T0' seems to be related to this, but the packetizer (?) clears
> >> this flag for the second LSHL on redwood/VLIW5 GPUs.
> >>
> >>
> >> Now, here is my question: Is the R600/store.ll test actually
> >> correct (I am not yet familiar enough with the R600 ISA to tell)?
> >> Does the R600 backend rely on the fact that implicit defs kill
> >> the uses of registers, or is there a bug in the R600 backend
> >> that counteracts the implicit-defs scheduling issue? And is
> >> removing an explicit use via an implicit def on the same
> >> instruction a bug or a feature?
> >>
> >> If this is an actual bug in the ScheduleDAGInstrs implementation,
> >> I would like to adapt my patch so that it can be applied (i.e.,
> >> include a fix for the store.ll test), although there might be a
> >> better way of doing this (e.g., only erase uses that are not from
> >> the same SU, or iterate over defs, implicit-defs, uses, and
> >> implicit-uses instead of iterating over all operands twice). I
> >> would also need someone more familiar with the R600 backend to
> >> check whether the new llc output for store.ll is actually correct.
> >>
> >
> > Hi Stefan,
> >
> > Thanks for the detailed analysis.  The '*' marks the end of a
> > packet, so with your patch the sequence uses 3 packets, and it
> > uses only 2 with LLVM 3.4.  So, this is a performance regression.
> >
> > The root of this problem is that LLVM assumes it is impossible to
> > concurrently access two sub-registers of the same super-register,
> > so it creates a scheduling dependency between the two sub-register
> > defs.
> >
> > This is a problem for R600's VLIW architecture, because we want to
> > be able to access all four sub-registers in the same packet, but
> > the scheduling dependencies prevent this.
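The super-register assumption can be pictured with a toy alias table. The table below is hand-written for illustration only; in LLVM the aliasing relationships come from the TableGen register descriptions, not from such a map.

```python
# Toy alias table, hand-written for illustration; in LLVM the aliasing
# comes from the TableGen register descriptions.
ALIASES = {
    "T1_X": {"T1_X", "T1_XYZW"},
    "T1_Y": {"T1_Y", "T1_XYZW"},
    "T1_Z": {"T1_Z", "T1_XYZW"},
    "T1_W": {"T1_W", "T1_XYZW"},
    "T1_XYZW": {"T1_X", "T1_Y", "T1_Z", "T1_W", "T1_XYZW"},
}

def overlap(a, b):
    """Two registers conflict if their alias sets intersect."""
    return bool(ALIASES.get(a, {a}) & ALIASES.get(b, {b}))

# Defs of two different sub-registers still overlap via T1_XYZW, so a
# dependency is created and the VLIW packetizer cannot bundle them.
assert overlap("T1_X", "T1_W")
assert not overlap("T1_X", "T0_W")  # different super-register: no edge
```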
> >
> > This is why there are implicit uses and defs of the T*_XYZW
> > registers added to these instructions.  Some of these are added by
> > LLVM passes, and I think some are added by the R600MachineScheduler
> > to work around the issue described (I'm not that familiar with the
> > scheduler, so I may be wrong about this).
> >
> > I think there was an effort recently to try to allow concurrent
> > sub-register accesses, but I don't remember who was working on
> > this.
> >
> > In this case, it looks like the problem is that the last two
> > instructions both define the same super-register, so the packetizer
> > isn't putting them in the same packet.
> >
> > So, I think the conclusion is that your patch increases the R600
> > backend's exposure to an already present problem in the register
> > allocator / scheduler.  I wonder if we could work around this by
> > stripping all the implicit defs and uses off the instructions once
> > scheduling and register allocation have completed.  They are
> > technically incorrect, and I'm not sure if any other passes use
> > them.  Do you have any suggestions?
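The stripping workaround floated above could be sketched roughly as follows. This is plain Python over tuples, not the LLVM MachineInstr API; a real pass would walk each instruction's operand list after register allocation and remove the implicit operands there.

```python
# Sketch of the workaround discussed above: after scheduling and
# register allocation, drop every implicit register operand.  Plain
# Python over (reg, is_implicit) tuples, not LLVM's MachineInstr API.

def strip_implicit(instr):
    """instr: (opcode, [(reg, is_implicit), ...]).
    Returns the same instruction minus its implicit operands."""
    opcode, operands = instr
    return (opcode, [(reg, imp) for reg, imp in operands if not imp])

# The third LSHL from the example, reduced to its register operands:
lshl = ("LSHL_eg", [("T1_W", False), ("T1_XYZW", True), ("T1_XYZW", True)])
assert strip_implicit(lshl) == ("LSHL_eg", [("T1_W", False)])
```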
> >
> > Thanks,
> > Tom
> >
> >
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


