[LLVMdev] Scheduling question (memory dependency)
William J. Schmidt
wschmidt at linux.vnet.ibm.com
Fri Sep 21 09:34:23 PDT 2012
Hi Sergei,
Thanks for the response! We just discovered there is likely a bug
happening during post-RA list scheduling. There's an invalid successor
index in the scheduling graph that is probably supposed to be the
missing arc. Starting to investigate further now. This is recorded in
http://llvm.org/bugs/show_bug.cgi?id=13891.
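
For reference, the odd successor in the dump quoted below, SU(4294967295), is 2^32 - 1, which is the value -1 takes when it is stored in a 32-bit unsigned field. The sketch below only demonstrates that arithmetic. The names `kNoNode` and `formatNodeNum` are hypothetical and are not LLVM identifiers, and whether the scheduler means this value as a marker or it is a corrupted index is not established here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical sentinel (not an LLVM name): all 32 bits set.
constexpr std::uint32_t kNoNode = ~std::uint32_t{0};

// All-ones equals UINT32_MAX and equals -1 converted to a 32-bit unsigned.
static_assert(kNoNode == std::numeric_limits<std::uint32_t>::max(),
              "all ones is the largest 32-bit unsigned value");
static_assert(kNoNode == static_cast<std::uint32_t>(-1),
              "same bit pattern as -1 converted to unsigned");

// Hypothetical helper: prints a node number as a plain unsigned decimal,
// in the same SU(n) shape the scheduler dump uses.
std::string formatNodeNum(std::uint32_t nodeNum) {
    return "SU(" + std::to_string(nodeNum) + ")";
}

// Small self-check that can be called from main().
void checkSentinel() {
    assert(formatNodeNum(kNoNode) == "SU(4294967295)");
}
```
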
Thanks,
Bill
On Fri, 2012-09-21 at 11:15 -0500, Sergei Larin wrote:
> Hi Bill,
>
> Which scheduler do you use, the MI one or the SDNode one? In either case
> the problem is likely the same, but the cause might be in a different place...
>
> The way I see it, you have an issue with the alias analyzer, not the scheduler.
> When the scheduling DAG is constructed, AA is checked for pairs of
> memory-accessing objects, and if the AA flags no potential interference, the
> chain edge is _not_ inserted. If that decision is wrong, you will end up with
> well-hidden, randomly appearing bugs.
>
> So the more likely question is: why does AA see these two objects as not
> aliasing, and are they properly described and presented to it?
>
> Does the ld/bitcast have proper memory operands? Any flags on them? Does
> the underlying memory object make sense?
>
> You can look at getUnderlyingObjectForInstr and MIsNeedChainEdge in the MI
> scheduling framework to see what I mean.
>
> If you are still using the SDNode scheduling framework, it has very similar
> functionality in slightly different code.
>
> Hope this helps.
>
> Sergei
>
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> The Linux Foundation
>
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > On Behalf Of William J. Schmidt
> > Sent: Friday, September 21, 2012 9:07 AM
> > To: llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Scheduling question (memory dependency)
> >
> > Here's another data point that may be useful. [Scheduling experts,
> > please help! :) ]
> >
> > If the two-byte bitfield is replaced by a two-byte struct (replace
> > "short i:8" with "short i", etc.), the scheduler properly generates a
> > dependency between the store and the load. For this case, a GEP is
> > used instead of a bitcast:
> >
> > ------------------------------------------------------------------
> > define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i) noinline {
> > entry:
> >   %i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
> >   %0 = load i16* %i1, align 2, !tbaa !0
> > ------------------------------------------------------------------
> >
> > One notable difference is the "!tbaa !0" decoration on the load. I
> > don't know whether this helps or not. Later the lowered instructions
> > look like:
> >
> > ------------------------------------------------------------------
> > 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > 32B %vreg1<def> = COPY %X3; G8RC:%vreg1
> > 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > 64B %vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
> > ...
> > ------------------------------------------------------------------
> >
> > Note the %i11 instead of %0 on the LHZ as another difference. The
> > scheduler then generates a dependency between the store and the load,
> > and everything works properly.
> >
> > Does this help tickle any memories?
> >
> > Thanks,
> > Bill
> >
> >
> > On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> > > Greetings,
> > >
> > > I'm investigating a bug in the PowerPC back end in which a load from
> > > a storage address is being reordered prior to a store to the same
> > > storage address. I'm quite new to LLVM, so I would appreciate some
> > > help understanding what I'm seeing from the dumps. I assume that some
> > > information is missing that would represent the memory dependency,
> > > but I don't know what form that should take.
> > >
> > > Example source code is as follows:
> > >
> > > ----------------------------------------------------------------
> > > extern "C" { int printf(const char *, ...); void exit(int); }
> > >
> > > struct foo {
> > >   short i:8;
> > > };
> > >
> > > void check(struct foo f, short i) __attribute__((noinline)) {
> > >   if (f.i != i) {
> > >     short fi = f.i;
> > >     printf("problem with %u != %u\n", fi, i);
> > >     exit(0);
> > >   }
> > > }
> > > ---------------------------------------------------------------
> > >
> > > The initial portion of the Clang output is:
> > >
> > > ---------------------------------------------------------------
> > > define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16 signext %i) noinline {
> > > entry:
> > >   %0 = bitcast %struct.foo* %f to i16*
> > >   %1 = load i16* %0, align 2
> > >   ...
> > > ---------------------------------------------------------------
> > >
> > > The code works OK at -O0. At -O1, the first part of the generated
> > > code is:
> > >
> > > ---------------------------------------------------------------
> > > .L._Z5check3foos:
> > >         .cfi_startproc
> > > # BB#0:                                 # %entry
> > >         mflr 0
> > >         std 0, 16(1)
> > >         stdu 1, -112(1)
> > > .Ltmp1:
> > >         .cfi_def_cfa_offset 112
> > > .Ltmp2:
> > >         .cfi_offset lr, 16
> > >         lha 5, 162(1)
> > >         sth 3, 162(1)
> > >         ...
> > > ---------------------------------------------------------------
> > >
> > > The problem here is that the incoming parameter in register 3 is
> > > stored too late, after an attempt to load the value into register 5.
> > >
> > > Looking at dumps with -debug, I see the following:
> > >
> > > ---------------------------------------------------------------
> > > ********** MACHINEINSTRS **********
> > > # Machine code for function _Z5check3foos: Post SSA
> > > Frame Objects:
> > >   fi#-1: size=2, align=2, fixed, at location [SP+50]
> > > Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
> > >
> > > 0B BB#0: derived from LLVM BB %entry
> > > Live Ins: %X3 %X4
> > > 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > 32B %vreg1<def> = COPY %X3; G8RC:%vreg1
> > > 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > 64B %vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> > > ...
> > > ---------------------------------------------------------------
> > >
> > > So far, so good. When we get to list scheduling, not quite so good:
> > >
> > > ---------------------------------------------------------------
> > > ********** List Scheduling **********
> > > SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > # preds left : 0
> > > # succs left : 4
> > > # rdefs left : 0
> > > Latency : 3
> > > Depth : 0
> > > Height : 0
> > > Successors:
> > > antiSU(2): Latency=0
> > > antiSU(2): Latency=0
> > > ch SU(5): Latency=0
> > > ch SU(4294967295) *: Latency=0
> > >
> > > SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > # preds left : 0
> > > # succs left : 3
> > > # rdefs left : 0
> > > Latency : 5
> > > Depth : 0
> > > Height : 0
> > > Successors:
> > > out SU(3): Latency=1
> > > val SU(2): Latency=5
> > > ch SU(5): Latency=0
> > > ...
> > > ---------------------------------------------------------------
> > >
> > > There is no dependency expressed between these two memory operations,
> > > although they both access the stack address 162(X1). The scheduler
> > > then sees both instructions as ready, and chooses the load based on
> > > critical path height:
> > >
> > > ---------------------------------------------------------------
> > > *** Examining Available
> > > Height 9: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > Height 4: SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > *** Scheduling [0]: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > ---------------------------------------------------------------
> > >
> > > The obvious questions are: Why is there no dependence between these
> > > two instructions? And what needs to be done to ensure there is one?
> > > My guess is that we somehow need to unify FixedStack-1 with %0, but
> > > it's not clear to me how this would be accomplished.
> > >
> > > (The store is generated as part of SelectionDAGISel::LowerArguments
> > > from lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the
> > > PowerPC-specific code in lib/Target/PowerPC/PPCISelLowering.cpp. The
> > > load is generated directly from the "load" in the LLVM IR at some
> > > other time.)
> > >
> > > Thanks very much for any help!
> > >
> > > Bill
> > >
> >
> >
>
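
The chain-edge decision Sergei describes above can be illustrated with a toy model. This is only a sketch under stated assumptions: `MemAccess`, `mayAlias`, and `needsChainEdge` are hypothetical names invented for illustration, and the naive name-based alias query is not LLVM's alias analysis or the actual logic of getUnderlyingObjectForInstr and MIsNeedChainEdge. It shows how a store tagged FixedStack-1 and a load tagged %0 can end up without an ordering edge even though both touch the same two bytes.

```cpp
#include <cstdint>
#include <string>

// Toy stand-in for a machine memory operand (not the LLVM API).
struct MemAccess {
    std::string underlyingObject; // e.g. "FixedStack-1" or "%0"
    std::int64_t offset;          // byte offset from the underlying object
    std::int64_t size;            // number of bytes accessed
    bool isStore;                 // true for a store, false for a load
};

// Naive alias query (an assumption of this sketch): accesses based on
// differently named objects are treated as never overlapping. That is the
// step that goes wrong when two names describe the same stack slot.
bool mayAlias(const MemAccess &a, const MemAccess &b) {
    if (a.underlyingObject != b.underlyingObject)
        return false;
    // Same object: check whether the byte ranges overlap.
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// An ordering (chain) edge is needed only when at least one access is a
// store and the alias query cannot rule out overlap.
bool needsChainEdge(const MemAccess &a, const MemAccess &b) {
    if (!a.isStore && !b.isStore)
        return false;
    return mayAlias(a, b);
}
```

In this model, a store `{"FixedStack-1", 0, 2, true}` and a load `{"%0", 0, 2, false}` get no edge, which matches the symptom in the list-scheduling dump. If both accesses are described by the same underlying object, the edge appears.
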