[LLVMdev] Scheduling question (memory dependency)

Fri Sep 21 11:57:17 PDT 2012

OK, finally found it.  The AliasChain in
ScheduleDAGInstrs::buildSchedGraph is not acting as a chain for loads
and stores (the head of the chain is not being updated as they are
encountered, so dependencies aren't being added solely on the basis of
may-aliasing in some cases).  Will test a patch.
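
To make the failure mode concrete, here is a toy model of it.  This is
only a sketch and not the LLVM code: MemOp, mayAlias and buildEdges are
hypothetical names, the walk is in program order for simplicity, and
mayAlias conservatively answers "yes" for every pair.  Each memory
operation gets an edge from the previous operation on the same underlying
object; otherwise it is checked against the head of the alias chain.  If
the head is never updated, two operations with different underlying
objects are never compared.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy memory operation: an opcode name plus the "underlying object"
// that the access was attributed to (hypothetical model, not LLVM's types).
struct MemOp {
    std::string name;
    std::string object;
};

// Hypothetical stand-in for an alias query: conservatively "may alias".
bool mayAlias(const MemOp &, const MemOp &) { return true; }

// An edge (pred, succ) between indices into the operation list.
using Edge = std::pair<std::size_t, std::size_t>;

std::vector<Edge> buildEdges(const std::vector<MemOp> &ops,
                             bool updateChainHead) {
    std::vector<Edge> edges;
    std::map<std::string, std::size_t> lastByObject;  // per-object tracking
    bool haveHead = false;   // is there an alias-chain head yet?
    std::size_t head = 0;    // index of the alias-chain head

    for (std::size_t i = 0; i < ops.size(); ++i) {
        assert(!haveHead || head < i);  // the head is always an earlier op

        auto it = lastByObject.find(ops[i].object);
        if (it != lastByObject.end()) {
            // Same underlying object: ordered by the per-object map.
            edges.emplace_back(it->second, i);
        } else if (haveHead && mayAlias(ops[head], ops[i])) {
            // Different object: ordered only through the alias chain.
            edges.emplace_back(head, i);
        }

        lastByObject[ops[i].object] = i;
        if (updateChainHead) {  // the step that was effectively missing
            haveHead = true;
            head = i;
        }
    }
    return edges;
}
```

With the two operations from the dump, {"STH8", "FixedStack-1"} followed
by {"LHA", "%0"}, buildEdges(ops, false) produces no edge at all, while
buildEdges(ops, true) produces the single store-to-load edge (0, 1).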

On Fri, 2012-09-21 at 13:04 -0500, William J. Schmidt wrote:
> On Fri, 2012-09-21 at 11:34 -0500, William J. Schmidt wrote:
> > Hi Sergei,
> > 
> > Thanks for the response!  We just discovered there is likely a bug
> > happening during post-RA list scheduling.  There's an invalid successor
> > index in the scheduling graph that is probably supposed to be the
> > missing arc.  Starting to investigate further now.  This is recorded in
> > http://llvm.org/bugs/show_bug.cgi?id=13891.
> 
> That appears to have been a red herring; I believe the value of -1 is an
> artificial dependency indicating the scheduling barrier at the end of
> the group, or something along those lines.  The problem appears to be
> that the load and store both return a value from
> getUnderlyingObjectForInstr, but they are two different objects...
> 
> Thanks,
> Bill
> 
> > 
> > Thanks,
> > Bill
> > 
> > On Fri, 2012-09-21 at 11:15 -0500, Sergei Larin wrote:
> > > Hi Bill,
> > > 
> > > >    Which scheduler are you using, the MI one or the SDNode one? In either
> > > > case the problem is likely the same, but the cause might be in a different
> > > > place...
> > > 
> > > > The way I see it, you have an issue with the alias analyzer, not the
> > > > scheduler. When the scheduling DAG is constructed, AA is queried for pairs
> > > > of memory-accessing instructions, and if AA flags no potential
> > > > interference, the chain edge is _not_ inserted. If that decision is wrong,
> > > > you end up with well-hidden bugs that pop up at random.
> > > 
> > > >   So the question is much more likely: why does AA see these two objects
> > > > as not aliasing, and are they properly described and presented to it?
> > > 
> > > >   Do the ld/bitcast have proper memory operands? Any flags on them? Does
> > > > the underlying memory object make sense?
> > > 
> > >   You can look at getUnderlyingObjectForInstr and MIsNeedChainEdge in the MI
> > > scheduling framework to see what I mean.
> > > 
> > > >   If you are still using the SDNode scheduling framework, it has very
> > > > similar functionality in slightly different code.
> > > 
> > >   Hope this helps.
> > > 
> > > Sergei
> > > 
> > > ---
> > > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> > > The Linux Foundation
> > > 
> > > > -----Original Message-----
> > > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > > On Behalf Of William J. Schmidt
> > > > Sent: Friday, September 21, 2012 9:07 AM
> > > > To: llvmdev at cs.uiuc.edu
> > > > Subject: Re: [LLVMdev] Scheduling question (memory dependency)
> > > > 
> > > > Here's another data point that may be useful.  [Scheduling experts,
> > > > please help! :) ]
> > > > 
> > > > If the two-byte bitfield is replaced by a two-byte struct (replace
> > > > "short i:8" with "short i", etc.), the scheduler properly generates a
> > > > dependency between the store and the load.  For this case, a GEP is
> > > > used instead of a bitcast:
> > > > 
> > > > ------------------------------------------------------------------
> > > > > define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i) noinline {
> > > > entry:
> > > >   %i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
> > > >   %0 = load i16* %i1, align 2, !tbaa !0
> > > > ------------------------------------------------------------------
> > > > 
> > > > One notable difference is the "!tbaa !0" decoration on the load.  I
> > > > don't know whether this helps or not.  Later the lowered instructions
> > > > look like:
> > > > 
> > > > ------------------------------------------------------------------
> > > > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > > > > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > > 64B		%vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
> > > >                 ...
> > > > ------------------------------------------------------------------
> > > > 
> > > > Note the %i11 instead of %0 on the LHZ as another difference.  The
> > > > scheduler then generates a dependency between the store and the load,
> > > > and everything works properly.
> > > > 
> > > > Does this help tickle any memories?
> > > > 
> > > > Thanks,
> > > > Bill
> > > > 
> > > > 
> > > > On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> > > > > Greetings,
> > > > >
> > > > > > I'm investigating a bug in the PowerPC back end in which a load from a
> > > > > > storage address is being reordered prior to a store to the same
> > > > > > storage address.  I'm quite new to LLVM, so I would appreciate some
> > > > > > help understanding what I'm seeing from the dumps.  I assume that some
> > > > > > information is missing that would represent the memory dependency, but
> > > > > > I don't know what form that should take.
> > > > >
> > > > > Example source code is as follows:
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > > extern "C" { int printf(const char *, ...); void exit(int); }
> > > > > > struct foo {
> > > > >   short i:8;
> > > > > };
> > > > >
> > > > > void check(struct foo f, short i) __attribute__((noinline)) {
> > > > >   if (f.i != i) {
> > > > >     short fi = f.i;
> > > > >     printf("problem with %u != %u\n", fi, i);
> > > > >     exit(0);
> > > > >   }
> > > > > }
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The initial portion of the Clang output is:
> > > > >
> > > > > > ---------------------------------------------------------------
> > > > > > define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16 signext %i) noinline {
> > > > > entry:
> > > > >   %0 = bitcast %struct.foo* %f to i16*
> > > > >   %1 = load i16* %0, align 2
> > > > >   ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > > The code works OK at -O0.  At -O1, the first part of the generated
> > > > > > code is:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > .L._Z5check3foos:
> > > > > 	.cfi_startproc
> > > > > # BB#0:                                 # %entry
> > > > > 	mflr 0
> > > > > 	std 0, 16(1)
> > > > > 	stdu 1, -112(1)
> > > > > .Ltmp1:
> > > > > 	.cfi_def_cfa_offset 112
> > > > > .Ltmp2:
> > > > > 	.cfi_offset lr, 16
> > > > > 	lha 5, 162(1)
> > > > > 	sth 3, 162(1)
> > > > >         ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The problem here is that the incoming parameter in register 3 is
> > > > > stored too late, after an attempt to load the value into register 5.
> > > > >
> > > > > Looking at dumps with -debug, I see the following:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > ********** MACHINEINSTRS **********
> > > > > > # Machine code for function _Z5check3foos: Post SSA
> > > > > > Frame Objects:
> > > > > >   fi#-1: size=2, align=2, fixed, at location [SP+50]
> > > > > > Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
> > > > >
> > > > > 0B	BB#0: derived from LLVM BB %entry
> > > > > 	    Live Ins: %X3 %X4
> > > > > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > > > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > > > > > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > > > 64B		%vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> > > > >                 ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > So far, so good.  When we get to list scheduling, not quite so good:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > ********** List Scheduling **********
> > > > > SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > > >   # preds left       : 0
> > > > >   # succs left       : 4
> > > > >   # rdefs left       : 0
> > > > >   Latency            : 3
> > > > >   Depth              : 0
> > > > >   Height             : 0
> > > > >   Successors:
> > > > >    antiSU(2): Latency=0
> > > > >    antiSU(2): Latency=0
> > > > >    ch  SU(5): Latency=0
> > > > >    ch  SU(4294967295) *: Latency=0
> > > > >
> > > > > SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > >   # preds left       : 0
> > > > >   # succs left       : 3
> > > > >   # rdefs left       : 0
> > > > >   Latency            : 5
> > > > >   Depth              : 0
> > > > >   Height             : 0
> > > > >   Successors:
> > > > >    out SU(3): Latency=1
> > > > >    val SU(2): Latency=5
> > > > >    ch  SU(5): Latency=0
> > > > > ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > There is no dependency expressed between these two memory operations,
> > > > > although they both access the stack address 162(X1).  The scheduler
> > > > > then sees both instructions as ready, and chooses the load based on
> > > > > critical path height:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > *** Examining Available
> > > > > Height 9: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > > Height 4: SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > > > *** Scheduling [0]: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The obvious questions are:  Why is there no dependence between these
> > > > > two instructions?  And what needs to be done to ensure there is one?
> > > > > My guess is that we somehow need to unify FixedStack-1 with %0, but
> > > > > it's not clear to me how this would be accomplished.
> > > > >
> > > > > (The store is generated as part of SelectionDAGISel::LowerArguments
> > > > > from lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the
> > > > > PowerPC-specific code in lib/Target/PowerPC/PPCISelLowering.cpp.  The
> > > > > load is generated directly from the "load" in the LLVM IR at some
> > > > > other time.)
> > > > >
> > > > > Thanks very much for any help!
> > > > >
> > > > > Bill
> > > > >
> > > > 
> > > > 
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > 
> > 
>