[LLVMdev] Scheduling question (memory dependency)
William J. Schmidt
wschmidt at linux.vnet.ibm.com
Fri Sep 21 11:57:17 PDT 2012
OK, finally found it. The AliasChain in
ScheduleDAGInstrs::buildSchedGraph is not acting as a chain for loads
and stores (the head of the chain is not being updated as they are
encountered, so dependencies aren't being added solely on the basis of
may-aliasing in some cases). Will test a patch.
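
To make the chain-head problem above concrete, here is a minimal C++ sketch of a toy model. It is not the code in ScheduleDAGInstrs::buildSchedGraph: buildChain, hasEdge, and the updateHead flag are hypothetical names invented for illustration, and the model assumes that every operation may alias every other one and that operations are visited in program order.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// (earlier op, later op): the later op must not be scheduled before the earlier one.
using Edge = std::pair<std::size_t, std::size_t>;

// Toy model only. Ops 0..numOps-1 are memory operations in program order,
// all assumed to may-alias each other. Each op gets an edge from the current
// chain head. When updateHead is true the op then becomes the new head,
// which yields a total order. When it is false the head stays at op 0,
// which mimics the failure described above.
std::vector<Edge> buildChain(std::size_t numOps, bool updateHead) {
  std::vector<Edge> edges;
  if (numOps == 0)
    return edges;
  std::size_t head = 0;
  for (std::size_t i = 1; i < numOps; ++i) {
    edges.emplace_back(head, i);
    if (updateHead)
      head = i;
  }
  return edges;
}

// True if there is a direct edge from `from` to `to`.
bool hasEdge(const std::vector<Edge> &edges, std::size_t from, std::size_t to) {
  for (const Edge &e : edges)
    if (e.first == from && e.second == to)
      return true;
  return false;
}
```

With three operations, buildChain(3, true) yields the edges 0->1 and 1->2, while buildChain(3, false) yields 0->1 and 0->2, which leaves operations 1 and 2 unordered with respect to each other.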
On Fri, 2012-09-21 at 13:04 -0500, William J. Schmidt wrote:
> On Fri, 2012-09-21 at 11:34 -0500, William J. Schmidt wrote:
> > Hi Sergei,
> >
> > Thanks for the response! We just discovered there is likely a bug
> > happening during post-RA list scheduling. There's an invalid successor
> > index in the scheduling graph that is probably supposed to be the
> > missing arc. Starting to investigate further now. This is recorded in
> > http://llvm.org/bugs/show_bug.cgi?id=13891.
>
> That appears to have been a red herring; I believe the value of -1 is an
> artificial dependency indicating the scheduling barrier at the end of
> the group, or something along those lines. The problem appears to be
> that the load and store both return a value from
> getUnderlyingObjectForInstr, but they are two different objects...
>
> Thanks,
> Bill
>
> >
> > Thanks,
> > Bill
> >
> > On Fri, 2012-09-21 at 11:15 -0500, Sergei Larin wrote:
> > > Hi Bill,
> > >
> > > Which scheduler do you use, the MI one or the SDNode one? In either case the
> > > problem is likely the same, but the cause might be in a different place...
> > >
> > > The way I see it, you have an issue with the alias analyzer, not the scheduler.
> > > When the scheduling DAG is constructed, AA is checked for pairs of
> > > memory-accessing objects, and if AA flags no potential interference, the chain
> > > edge is _not_ inserted. If that decision is wrong, you will end up with
> > > well-hidden bugs that pop up at random.
> > >
> > > So the more likely question is: why does AA see these two objects as not
> > > aliasing, and are they properly described and presented to it?
> > >
> > > Does the ld/bitcast have proper memory operands? Are there any flags on them?
> > > Does the underlying memory object make sense?
> > >
> > > You can look at getUnderlyingObjectForInstr and MIsNeedChainEdge in the MI
> > > scheduling framework to see what I mean.
> > >
> > > If you are still using the SDNode scheduling framework, it has very similar
> > > functionality in slightly different code.
> > >
> > > Hope this helps.
> > >
> > > Sergei
> > >
> > > ---
> > > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> > > The Linux Foundation
> > >
> > > > -----Original Message-----
> > > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > > On Behalf Of William J. Schmidt
> > > > Sent: Friday, September 21, 2012 9:07 AM
> > > > To: llvmdev at cs.uiuc.edu
> > > > Subject: Re: [LLVMdev] Scheduling question (memory dependency)
> > > >
> > > > Here's another data point that may be useful. [Scheduling experts,
> > > > please help! :) ]
> > > >
> > > > If the two-byte bitfield is replaced by a two-byte struct (replace
> > > > "short i:8" with "short i", etc.), the scheduler properly generates a
> > > > dependency between the store and the load. For this case, a GEP is
> > > > used instead of a bitcast:
> > > >
> > > > ------------------------------------------------------------------
> > > > define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i)
> > > > noinline {
> > > > entry:
> > > > %i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
> > > > %0 = load i16* %i1, align 2, !tbaa !0
> > > > ------------------------------------------------------------------
> > > >
> > > > One notable difference is the "!tbaa !0" decoration on the load. I
> > > > don't know whether this helps or not. Later the lowered instructions
> > > > look like:
> > > >
> > > > ------------------------------------------------------------------
> > > > 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > > 32B %vreg1<def> = COPY %X3; G8RC:%vreg1
> > > > 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > > 64B %vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
> > > > ...
> > > > ------------------------------------------------------------------
> > > >
> > > > Note the %i11 instead of %0 on the LHZ as another difference. The
> > > > scheduler then generates a dependency between the store and the load,
> > > > and everything works properly.
> > > >
> > > > Does this help tickle any memories?
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > > >
> > > > On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> > > > > Greetings,
> > > > >
> > > > > I'm investigating a bug in the PowerPC back end in which a load from a
> > > > > storage address is being reordered prior to a store to the same
> > > > > storage address. I'm quite new to LLVM, so I would appreciate some
> > > > > help understanding what I'm seeing from the dumps. I assume that some
> > > > > information is missing that would represent the memory dependency, but
> > > > > I don't know what form that should take.
> > > > >
> > > > > Example source code is as follows:
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > extern "C" { int printf(const char *, ...); void exit(int); }
> > > > > struct foo {
> > > > >   short i:8;
> > > > > };
> > > > >
> > > > > void check(struct foo f, short i) __attribute__((noinline)) {
> > > > >   if (f.i != i) {
> > > > >     short fi = f.i;
> > > > >     printf("problem with %u != %u\n", fi, i);
> > > > >     exit(0);
> > > > >   }
> > > > > }
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The initial portion of the Clang output is:
> > > > >
> > > > > define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16
> > > > > signext %i) noinline {
> > > > > entry:
> > > > > %0 = bitcast %struct.foo* %f to i16*
> > > > > %1 = load i16* %0, align 2
> > > > > ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The code works OK at -O0. At -O1, the first part of the generated
> > > > > code is:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > .L._Z5check3foos:
> > > > > .cfi_startproc
> > > > > # BB#0: # %entry
> > > > > mflr 0
> > > > > std 0, 16(1)
> > > > > stdu 1, -112(1)
> > > > > .Ltmp1:
> > > > > .cfi_def_cfa_offset 112
> > > > > .Ltmp2:
> > > > > .cfi_offset lr, 16
> > > > > lha 5, 162(1)
> > > > > sth 3, 162(1)
> > > > > ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The problem here is that the incoming parameter in register 3 is
> > > > > stored too late, after an attempt to load the value into register 5.
> > > > >
> > > > > Looking at dumps with -debug, I see the following:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > ********** MACHINEINSTRS **********
> > > > > # Machine code for function _Z5check3foos: Post SSA
> > > > > Frame Objects:
> > > > > fi#-1: size=2, align=2, fixed, at location [SP+50]
> > > > > Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
> > > > >
> > > > > 0B BB#0: derived from LLVM BB %entry
> > > > > Live Ins: %X3 %X4
> > > > > 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > > > 32B %vreg1<def> = COPY %X3; G8RC:%vreg1
> > > > > 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > > > 64B %vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> > > > > ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > So far, so good. When we get to list scheduling, not quite so good:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > ********** List Scheduling **********
> > > > > SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > > > # preds left : 0
> > > > > # succs left : 4
> > > > > # rdefs left : 0
> > > > > Latency : 3
> > > > > Depth : 0
> > > > > Height : 0
> > > > > Successors:
> > > > > antiSU(2): Latency=0
> > > > > antiSU(2): Latency=0
> > > > > ch SU(5): Latency=0
> > > > > ch SU(4294967295) *: Latency=0
> > > > >
> > > > > SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > > # preds left : 0
> > > > > # succs left : 3
> > > > > # rdefs left : 0
> > > > > Latency : 5
> > > > > Depth : 0
> > > > > Height : 0
> > > > > Successors:
> > > > > out SU(3): Latency=1
> > > > > val SU(2): Latency=5
> > > > > ch SU(5): Latency=0
> > > > > ...
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > There is no dependency expressed between these two memory operations,
> > > > > although they both access the stack address 162(X1). The scheduler
> > > > > then sees both instructions as ready, and chooses the load based on
> > > > > critical path height:
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > *** Examining Available
> > > > > Height 9: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > > Height 4: SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > > > *** Scheduling [0]: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > > ---------------------------------------------------------------
> > > > >
> > > > > The obvious questions are: Why is there no dependence between these
> > > > > two instructions? And what needs to be done to ensure there is one?
> > > > > My guess is that we somehow need to unify FixedStack-1 with %0, but
> > > > > it's not clear to me how this would be accomplished.
> > > > >
> > > > > (The store is generated as part of SelectionDAGISel::LowerArguments
> > > > > from lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the
> > > > > PowerPC-specific code in lib/Target/PowerPC/PPCISelLowering.cpp. The
> > > > > load is generated directly from the "load" in the LLVM IR at some
> > > > > other time.)
> > > > >
> > > > > Thanks very much for any help!
> > > > >
> > > > > Bill
> > > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > >
> >
>
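
The selection step shown in the original report's "Examining Available" dump can be sketched as follows. This is a simplified illustration, not LLVM's scheduler: ReadyUnit and pickByHeight are hypothetical names, and breaking ties in favor of the earlier entry is an assumption. The heights used in the usage note (9 for SU(1), 4 for SU(0)) are the ones printed in that dump.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a ready scheduling unit.
struct ReadyUnit {
  unsigned id;     // SU number as printed in the dump
  unsigned height; // critical-path height as printed in the dump
};

// Returns the index of the ready unit with the greatest height.
// Ties go to the earlier entry (an assumption of this sketch).
// For an empty list the result is 0, which equals ready.size().
std::size_t pickByHeight(const std::vector<ReadyUnit> &ready) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); ++i)
    if (ready[i].height > ready[best].height)
      best = i;
  return best;
}
```

Given a ready list holding SU(1) with height 9 and SU(0) with height 4, pickByHeight selects SU(1), the load. Once the chain edge between the store and the load is missing, both units are ready at the same time, and a height-based choice moves the load ahead of the store.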