[LLVMdev] Scheduling question (memory dependency)
William J. Schmidt
wschmidt at linux.vnet.ibm.com
Fri Sep 21 07:07:21 PDT 2012
Here's another data point that may be useful. [Scheduling experts,
please help! :) ]
If the two-byte bitfield is replaced by a two-byte struct (replace
"short i:8" with "short i", etc.), the scheduler properly generates a
dependency between the store and the load. For this case, a GEP is used
instead of a bitcast:
------------------------------------------------------------------
define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i) noinline {
entry:
%i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
%0 = load i16* %i1, align 2, !tbaa !0
------------------------------------------------------------------
One notable difference is the "!tbaa !0" decoration on the load. I
don't know whether this helps or not. Later the lowered instructions
look like:
------------------------------------------------------------------
16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
32B %vreg1<def> = COPY %X3; G8RC:%vreg1
48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
64B %vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
...
------------------------------------------------------------------
Another difference is that the memory operand on the LHZ is LD2[%i11]
rather than LD2[%0]. The scheduler then generates a dependency between
the store and the load, and everything works properly.
Does this help tickle any memories?
Thanks,
Bill
On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> Greetings,
>
> I'm investigating a bug in the PowerPC back end in which a load from a
> storage address is being reordered prior to a store to the same storage
> address. I'm quite new to LLVM, so I would appreciate some help
> understanding what I'm seeing from the dumps. I assume that some
> information is missing that would represent the memory dependency, but I
> don't know what form that should take.
>
> Example source code is as follows:
>
> ----------------------------------------------------------------
> extern "C" { int printf(const char *, ...); void exit(int);}
> struct foo {
> short i:8;
> };
>
> void check(struct foo f, short i) __attribute__((noinline)) {
> if (f.i != i) {
> short fi = f.i;
> printf("problem with %u != %u\n", fi, i);
> exit(0);
> }
> }
> ---------------------------------------------------------------
>
> The initial portion of the Clang output is:
>
> ---------------------------------------------------------------
> define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16 signext %i) noinline {
> entry:
> %0 = bitcast %struct.foo* %f to i16*
> %1 = load i16* %0, align 2
> ...
> ---------------------------------------------------------------
>
> The code works OK at -O0. At -O1, the first part of the generated code
> is:
>
> ---------------------------------------------------------------
> .L._Z5check3foos:
> .cfi_startproc
> # BB#0: # %entry
> mflr 0
> std 0, 16(1)
> stdu 1, -112(1)
> .Ltmp1:
> .cfi_def_cfa_offset 112
> .Ltmp2:
> .cfi_offset lr, 16
> lha 5, 162(1)
> sth 3, 162(1)
> ...
> ---------------------------------------------------------------
>
> The problem here is that the incoming parameter in register 3 is stored
> too late, after an attempt to load the value into register 5.
>
> Looking at dumps with -debug, I see the following:
>
> ---------------------------------------------------------------
> ********** MACHINEINSTRS **********
> # Machine code for function _Z5check3foos: Post SSA
> Frame Objects:
> fi#-1: size=2, align=2, fixed, at location [SP+50]
> Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
>
> 0B BB#0: derived from LLVM BB %entry
> Live Ins: %X3 %X4
> 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> 32B %vreg1<def> = COPY %X3; G8RC:%vreg1
> 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> 64B %vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> ...
> ---------------------------------------------------------------
>
> So far, so good. When we get to list scheduling, not quite so good:
>
> ---------------------------------------------------------------
> ********** List Scheduling **********
> SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> # preds left : 0
> # succs left : 4
> # rdefs left : 0
> Latency : 3
> Depth : 0
> Height : 0
> Successors:
> antiSU(2): Latency=0
> antiSU(2): Latency=0
> ch SU(5): Latency=0
> ch SU(4294967295) *: Latency=0
>
> SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> # preds left : 0
> # succs left : 3
> # rdefs left : 0
> Latency : 5
> Depth : 0
> Height : 0
> Successors:
> out SU(3): Latency=1
> val SU(2): Latency=5
> ch SU(5): Latency=0
> ...
> ---------------------------------------------------------------
>
> There is no dependency expressed between these two memory operations,
> although they both access the stack address 162(X1). The scheduler then
> sees both instructions as ready, and chooses the load based on critical
> path height:
>
> ---------------------------------------------------------------
> *** Examining Available
> Height 9: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> Height 4: SU(0): STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> *** Scheduling [0]: SU(1): %R5<def> = LHA 162, %X1; mem:LD2[%0]
> ---------------------------------------------------------------
>
> The obvious questions are: Why is there no dependence between these two
> instructions? And what needs to be done to ensure there is one? My
> guess is that we somehow need to unify FixedStack-1 with %0, but it's
> not clear to me how this would be accomplished.
>
> (The store is generated as part of SelectionDAGISel::LowerArguments from
> lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the PowerPC-specific
> code in lib/Target/PowerPC/PPCISelLowering.cpp. The load is generated
> directly from the "load" in the LLVM IR at some other time.)
>
> Thanks very much for any help!
>
> Bill
>
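The scheduling choice in the quoted message (the load at height 9 picked ahead of the store at height 4) can be illustrated with a small toy model. This is only a sketch and not LLVM's scheduler: ToyUnit, toySchedule and makeExample are hypothetical names invented for the illustration. The heights are taken from the "Examining Available" dump and held fixed, whereas a real scheduler would recompute them once an edge is added. The sketch shows only that a greedy, height-driven pick issues the load first unless a store-to-load dependency edge is present.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy scheduling unit for illustration; this is not LLVM's SUnit.
struct ToyUnit {
    std::string name;                // mnemonic, used only for reporting
    unsigned height;                 // critical-path height, used as the priority
    std::vector<std::size_t> preds;  // indices of units that must be issued first
};

// Greedy list scheduling: repeatedly issue the ready unit with the
// greatest height. A unit is ready once all of its predecessors are issued.
std::vector<std::string> toySchedule(const std::vector<ToyUnit>& units) {
    std::vector<bool> done(units.size(), false);
    std::vector<std::string> order;
    while (order.size() < units.size()) {
        std::size_t best = units.size();  // sentinel: nothing chosen yet
        for (std::size_t i = 0; i < units.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t p : units[i].preds) {
                assert(p < units.size());
                if (!done[p]) ready = false;
            }
            if (ready && (best == units.size() ||
                          units[i].height > units[best].height))
                best = i;
        }
        assert(best != units.size() && "dependency cycle in toy graph");
        done[best] = true;
        order.push_back(units[best].name);
    }
    return order;
}

// The two memory operations from the dump: SU(0) is the store (height 4)
// and SU(1) is the load (height 9). The optional edge models the missing
// store-to-load memory dependency.
std::vector<ToyUnit> makeExample(bool withMemoryEdge) {
    std::vector<ToyUnit> units = {
        {"STH8", 4, {}},
        {"LHA", 9, {}},
    };
    if (withMemoryEdge) units[1].preds.push_back(0);
    return units;
}
```

Calling toySchedule(makeExample(false)) issues LHA before STH8, which matches the misordering in the -O1 output; toySchedule(makeExample(true)) issues STH8 first, because the load is not ready until the store has been issued.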