[LLVMdev] Scheduling question (memory dependency)

William J. Schmidt wschmidt at linux.vnet.ibm.com
Fri Sep 21 09:34:23 PDT 2012


Hi Sergei,

Thanks for the response!  We just discovered there is likely a bug
happening during post-RA list scheduling.  There's an invalid successor
index in the scheduling graph that is probably supposed to be the
missing arc.  Starting to investigate further now.  This is recorded in
http://llvm.org/bugs/show_bug.cgi?id=13891.
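
For what it's worth, the odd node number in the dump quoted below,
SU(4294967295), is exactly what a 32-bit unsigned index prints as when it
holds -1 (all bits set).  So it may be a sentinel or an uninitialized
index rather than a real scheduling unit; that reading is an assumption
on my part.  A minimal C++ sketch of the arithmetic, with made-up names
that are not taken from the LLVM sources:

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "no real node"; the name is illustrative
// only and is not an LLVM identifier.
constexpr std::uint32_t kNoNode = static_cast<std::uint32_t>(-1);

// Returns true if `index` can refer to one of `numNodes` real nodes,
// assuming nodes are numbered 0 .. numNodes-1.
bool isValidNodeIndex(std::uint32_t index, std::uint32_t numNodes) {
  return index < numNodes;
}
```

With six scheduling units, isValidNodeIndex(5, 6) holds, while
isValidNodeIndex(kNoNode, 6) does not.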

Thanks,
Bill

On Fri, 2012-09-21 at 11:15 -0500, Sergei Larin wrote:
> Hi Bill,
> 
>    Which scheduler are you using, the MI one or the SDNode one? In either
> case the problem is likely the same, but the cause might be in a different
> place...
> 
> The way I see it, you have an issue with the alias analyzer, not the
> scheduler. When the scheduling DAG is constructed, AA is queried for pairs
> of memory-accessing instructions, and if AA flags no potential interference
> the chain edge is _not_ inserted. If that decision is wrong, you will end up
> with well-hidden bugs that pop up at random.
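
To make the mechanism described above concrete, here is a toy C++ model
of it.  This is only a sketch under stated assumptions: MemAccess,
AliasOracle, buildChainEdges and the two oracles are hypothetical names
invented for illustration and are not the LLVM implementation.  It shows
how an oracle that treats differently named objects as distinct drops
the store-to-load edge, while a conservative oracle keeps it.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a machine memory operand; not an LLVM type.
struct MemAccess {
  bool isStore;
  std::string underlyingObject; // e.g. "FixedStack-1" or "%0"
};

using AliasOracle = std::function<bool(const MemAccess &, const MemAccess &)>;

// Adds a chain edge (earlier index -> later index) for every pair that the
// oracle says may alias, provided at least one of the two is a store.
// Load/load pairs never need ordering, so they are skipped.
std::vector<std::pair<std::size_t, std::size_t>>
buildChainEdges(const std::vector<MemAccess> &accesses,
                const AliasOracle &mayAlias) {
  std::vector<std::pair<std::size_t, std::size_t>> edges;
  for (std::size_t i = 0; i < accesses.size(); ++i) {
    for (std::size_t j = i + 1; j < accesses.size(); ++j) {
      if (!accesses[i].isStore && !accesses[j].isStore)
        continue;
      if (mayAlias(accesses[i], accesses[j]))
        edges.emplace_back(i, j);
    }
  }
  return edges;
}

// Oracle that trusts object names: distinct names are assumed not to alias.
bool nameBasedMayAlias(const MemAccess &a, const MemAccess &b) {
  return a.underlyingObject == b.underlyingObject;
}

// Conservative oracle: any two accesses may alias.
bool conservativeMayAlias(const MemAccess &, const MemAccess &) {
  return true;
}
```

Feeding it a store to "FixedStack-1" followed by a load from "%0" gives
no edge with nameBasedMayAlias and one edge with conservativeMayAlias.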
> 
>   So the more likely question is: Why does AA see these two objects as not
> aliasing, and are they properly described and presented to it?
> 
>   Does the ld/bitcast have proper memory operands? Any flags on them? Does
> the underlying memory object make sense?
> 
>   You can look at getUnderlyingObjectForInstr and MIsNeedChainEdge in the MI
> scheduling framework to see what I mean.
> 
>   If you are still using the SDNode scheduling framework, it has very
> similar functionality in slightly different code.
> 
>   Hope this helps.
> 
> Sergei
> 
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> The Linux Foundation
> 
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > On Behalf Of William J. Schmidt
> > Sent: Friday, September 21, 2012 9:07 AM
> > To: llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Scheduling question (memory dependency)
> > 
> > Here's another data point that may be useful.  [Scheduling experts,
> > please help! :) ]
> > 
> > If the two-byte bitfield is replaced by a two-byte struct (replace
> > "short i:8" with "short i", etc.), the scheduler properly generates a
> > dependency between the store and the load.  For this case, a GEP is
> > used instead of a bitcast:
> > 
> > ------------------------------------------------------------------
> > define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i) noinline {
> > entry:
> >   %i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
> >   %0 = load i16* %i1, align 2, !tbaa !0
> > ------------------------------------------------------------------
> > 
> > One notable difference is the "!tbaa !0" decoration on the load.  I
> > don't know whether this helps or not.  Later the lowered instructions
> > look like:
> > 
> > ------------------------------------------------------------------
> > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > 64B		%vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
> >                 ...
> > ------------------------------------------------------------------
> > 
> > Note the %i11 instead of %0 on the LHZ as another difference.  The
> > scheduler then generates a dependency between the store and the load,
> > and everything works properly.
> > 
> > Does this help tickle any memories?
> > 
> > Thanks,
> > Bill
> > 
> > 
> > On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> > > Greetings,
> > >
> > > I'm investigating a bug in the PowerPC back end in which a load from a
> > > storage address is being reordered prior to a store to the same
> > > storage address.  I'm quite new to LLVM, so I would appreciate some
> > > help understanding what I'm seeing from the dumps.  I assume that some
> > > information is missing that would represent the memory dependency, but
> > > I don't know what form that should take.
> > >
> > > Example source code is as follows:
> > >
> > > ----------------------------------------------------------------
> > > extern "C" { int printf(const char *, ...); void exit(int); }
> > > struct foo {
> > >   short i:8;
> > > };
> > >
> > > void check(struct foo f, short i) __attribute__((noinline)) {
> > >   if (f.i != i) {
> > >     short fi = f.i;
> > >     printf("problem with %u != %u\n", fi, i);
> > >     exit(0);
> > >   }
> > > }
> > > ---------------------------------------------------------------
> > >
> > > The initial portion of the Clang output is:
> > >
> > > ---------------------------------------------------------------
> > > define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16 signext %i) noinline {
> > > entry:
> > >   %0 = bitcast %struct.foo* %f to i16*
> > >   %1 = load i16* %0, align 2
> > >   ...
> > > ---------------------------------------------------------------
> > >
> > > The code works OK at -O0.  At -O1, the first part of the generated
> > > code is:
> > >
> > > ---------------------------------------------------------------
> > > .L._Z5check3foos:
> > > 	.cfi_startproc
> > > # BB#0:                                 # %entry
> > > 	mflr 0
> > > 	std 0, 16(1)
> > > 	stdu 1, -112(1)
> > > .Ltmp1:
> > > 	.cfi_def_cfa_offset 112
> > > .Ltmp2:
> > > 	.cfi_offset lr, 16
> > > 	lha 5, 162(1)
> > > 	sth 3, 162(1)
> > >         ...
> > > ---------------------------------------------------------------
> > >
> > > The problem here is that the incoming parameter in register 3 is
> > > stored too late, after an attempt to load the value into register 5.
> > >
> > > Looking at dumps with -debug, I see the following:
> > >
> > > ---------------------------------------------------------------
> > > ********** MACHINEINSTRS **********
> > > # Machine code for function _Z5check3foos: Post SSA
> > > Frame Objects:
> > >   fi#-1: size=2, align=2, fixed, at location [SP+50]
> > > Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
> > >
> > > 0B	BB#0: derived from LLVM BB %entry
> > > 	    Live Ins: %X3 %X4
> > > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > 64B		%vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> > >                 ...
> > > ---------------------------------------------------------------
> > >
> > > So far, so good.  When we get to list scheduling, not quite so good:
> > >
> > > ---------------------------------------------------------------
> > > ********** List Scheduling **********
> > > SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > >   # preds left       : 0
> > >   # succs left       : 4
> > >   # rdefs left       : 0
> > >   Latency            : 3
> > >   Depth              : 0
> > >   Height             : 0
> > >   Successors:
> > >    antiSU(2): Latency=0
> > >    antiSU(2): Latency=0
> > >    ch  SU(5): Latency=0
> > >    ch  SU(4294967295) *: Latency=0
> > >
> > > SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > >   # preds left       : 0
> > >   # succs left       : 3
> > >   # rdefs left       : 0
> > >   Latency            : 5
> > >   Depth              : 0
> > >   Height             : 0
> > >   Successors:
> > >    out SU(3): Latency=1
> > >    val SU(2): Latency=5
> > >    ch  SU(5): Latency=0
> > > ...
> > > ---------------------------------------------------------------
> > >
> > > There is no dependency expressed between these two memory operations,
> > > although they both access the stack address 162(X1).  The scheduler
> > > then sees both instructions as ready, and chooses the load based on
> > > critical path height:
> > >
> > > ---------------------------------------------------------------
> > > *** Examining Available
> > > Height 9: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > Height 4: SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > *** Scheduling [0]: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > ---------------------------------------------------------------
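
The choice shown in the dump above can be modelled as "take the ready
node with the greatest height".  The C++ below is a sketch of that
selection rule only; ReadyNode and pickByHeight are hypothetical names,
and the real scheduler's priority function may weigh more than height.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy description of a ready scheduling unit; not an LLVM type.
struct ReadyNode {
  unsigned id;     // SU number as printed in the dump
  unsigned height; // critical-path height as printed in the dump
};

// Returns the position in `available` of the node with the largest height.
// Ties go to the earlier entry.  The list must not be empty.
std::size_t pickByHeight(const std::vector<ReadyNode> &available) {
  assert(!available.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < available.size(); ++i) {
    if (available[i].height > available[best].height)
      best = i;
  }
  return best;
}
```

With the two entries from the dump, SU(1) at height 9 and SU(0) at
height 4, this rule selects SU(1), the load.  Nothing in a rule like
this holds the load back unless a dependence edge keeps it out of the
available list in the first place.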
> > >
> > > The obvious questions are:  Why is there no dependence between these
> > > two instructions?  And what needs to be done to ensure there is one?
> > > My guess is that we somehow need to unify FixedStack-1 with %0, but
> > > it's not clear to me how this would be accomplished.
> > >
> > > (The store is generated as part of SelectionDAGISel::LowerArguments
> > > from lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the
> > > PowerPC-specific code in lib/Target/PowerPC/PPCISelLowering.cpp.  The
> > > load is generated directly from the "load" in the LLVM IR at some
> > > other time.)
> > >
> > > Thanks very much for any help!
> > >
> > > Bill
> > >
> > 
> > 
> 




