[llvm-commits] [RFC/PATCH, PowerPC] Fix oggenc miscompile - invalid CSE of DYNALLOC nodes

Ulrich Weigand Ulrich.Weigand at de.ibm.com
Wed Oct 10 10:18:48 PDT 2012


Hello,

I've started to work on LLVM to help improve the PowerPC back-end,
and started out by looking at the currently failing test cases.

The MultiSource/Applications/oggenc/oggenc test is one of those;
building the oggenc binary with clang at any optimization
level greater than -O0 results in corrupted .ogg output files.
(When built with GCC, or with clang at -O0, oggenc works fine.)

Debugging the problem showed the root cause to be a miscompilation
of the bark_noise_hybridmp function.  The routine allocates a couple
of temporary arrays of the same size on the stack:

static void bark_noise_hybridmp(int n,const long *b,
                                const float *f,
                                float *noise,
                                const float offset,
                                const int fixed){

  float *N=alloca(n*sizeof(*N));
  float *X=alloca(n*sizeof(*N));
  float *XX=alloca(n*sizeof(*N));
  float *Y=alloca(n*sizeof(*N));
  float *XY=alloca(n*sizeof(*N));

However, looking at the generated code, we see only a single array
being allocated, and a pointer to that same array used for *all*
of those local variables N, X, XX, Y, and XY.

A reduced test case showing the problem is:

target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define void @test(i64 %n) nounwind {
entry:
  %0 = alloca i8, i64 %n, align 1
  %1 = alloca i8, i64 %n, align 1
  call void @use(i8* %0, i8* %1) nounwind
  ret void
}

declare void @use(i8*, i8*)


Looking at the intermediate compilation steps, it seems that
the problem occurs during the MachineCSE pass.  Before, we have:

BB#0: derived from LLVM BB %entry
    Live Ins: %X3
	%vreg0<def> = COPY %X3; G8RC:%vreg0
	%vreg1<def> = ADDI8 %vreg0, 15; G8RC:%vreg1,%vreg0
	%vreg2<def> = LI8 65520; G8RC:%vreg2
	%vreg3<def> = AND8 %vreg1<kill>, %vreg2<kill>; G8RC:%vreg3,%vreg1,%vreg2
	%vreg4<def> = NEG8 %vreg3<kill>; G8RC:%vreg4,%vreg3
	%vreg5<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg5,%vreg4
	%vreg6<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg6,%vreg4
	ADJCALLSTACKDOWN 112, %R1<imp-def,dead>, %R1<imp-use>
	%X3<def> = COPY %vreg5; G8RC:%vreg5
	%X4<def> = COPY %vreg6; G8RC:%vreg6
	BL8_NOP_ELF <ga:@use>, <regmask>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %R1<imp-def>, ...
	ADJCALLSTACKUP 112, 0, %R1<imp-def,dead>, %R1<imp-use>
	BLR pred:20, pred:%noreg, %LR<imp-use>, %RM<imp-use>

But after MachineCSE, we have the incorrect:

BB#0: derived from LLVM BB %entry
    Live Ins: %X3
	%vreg0<def> = COPY %X3; G8RC:%vreg0
	%vreg1<def> = ADDI8 %vreg0, 15; G8RC:%vreg1,%vreg0
	%vreg2<def> = LI8 65520; G8RC:%vreg2
	%vreg3<def> = AND8 %vreg1<kill>, %vreg2<kill>; G8RC:%vreg3,%vreg1,%vreg2
	%vreg4<def> = NEG8 %vreg3<kill>; G8RC:%vreg4,%vreg3
	%vreg5<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg5,%vreg4
	ADJCALLSTACKDOWN 112, %R1<imp-def,dead>, %R1<imp-use>
	%X3<def> = COPY %vreg5; G8RC:%vreg5
	%X4<def> = COPY %vreg5; G8RC:%vreg5
	BL8_NOP_ELF <ga:@use>, <regmask>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %R1<imp-def>, ...
	ADJCALLSTACKUP 112, 0, %R1<imp-def,dead>, %R1<imp-use>
	BLR pred:20, pred:%noreg, %LR<imp-use>, %RM<imp-use>

It would appear that CSE has concluded that two instances of DYNALLOC8
with the same size argument must have the same result, which is of course
not correct.

I'm not sure I fully understand why this is the case.  The PowerPC back-end
states that DYNALLOC8 implicitly defines and uses the stack pointer (%X1);
I would have expected this to cause CSE to not merge them.  However, for
some reason, the implicit definitions are considered "dead" and thus
ignored for CSE purposes.  (Interestingly enough, when live analysis
runs later on, it recognizes that those implicit definitions are in
fact live and not dead.  But that's too late ...)

I found that the place where those implicit definitions are marked dead is
MachineInstr::setPhysRegsDeadExcept as called from InstrEmitter::EmitMachineNode,
in the block of code headed by this comment:

  // The MachineInstr may also define physregs instead of virtregs.  These
  // physreg values can reach other instructions in different ways:
  //
  // 1. When there is a use of a Node value beyond the explicitly defined
  //    virtual registers, we emit a CopyFromReg for one of the implicitly
  //    defined physregs.  This only happens when HasPhysRegOuts is true.
  //
  // 2. A CopyFromReg reading a physreg may be glued to this instruction.
  //
  // 3. A glued instruction may implicitly use a physreg.
  //
  // 4. A glued instruction may use a RegisterSDNode operand.

So I guess because the next use of the stack pointer cannot be determined
by following SelectionDAG edges, it is considered dead?


Looking some more, I found a solution that in fact prevents MachineCSE from
combining the DYNALLOC8 noded, but I'm not sure if this is indeed the
"correct" fix or just a work-around:  When I mark the DYNALLOC8 nodes as
"mayStore", they are not combined.  Since the code a DYNALLOC8 is expanded
to indeed *will* contain a store (to re-establish the back-chain link after
the stack pointer is decremented), this arguably ought to be done anyway.
I'm still wondering whether other problems may be lurking with the way the
back-end uses implit def/uses of the stack pointer in other instructions ...

The attached patch implements this solution, which does in fact fix the
oggenc testcase (and causes no test regressions).

Would this be OK to commit?

Any other suggestions how to fix the underlying problem?

Thanks,
Ulrich

(See attached file: diff-llvm-dynalloc-maystore)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-dynalloc-maystore
Type: application/octet-stream
Size: 674 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121010/9bc1f81a/attachment.obj>


More information about the llvm-commits mailing list