[llvm-commits] [RFC/PATCH, PowerPC] Fix oggenc miscompile - invalid CSE of DYNALLOC nodes

Rafael EspĂ­ndola rafael.espindola at gmail.com
Thu Oct 11 08:53:23 PDT 2012


Can you include the testcase in the patch?

On 10 October 2012 13:18, Ulrich Weigand <Ulrich.Weigand at de.ibm.com> wrote:
>
> Hello,
>
> I've started to work on LLVM to help improve the PowerPC back-end,
> and started out by looking at the currently failing test cases.
>
> The MultiSource/Applications/oggenc/oggenc test is one of those;
> building the oggenc binary with clang at any optimization
> level greater than -O0 results in corrupted .ogg output files.
> (When built with GCC, or with clang at -O0, oggenc works fine.)
>
> Debugging the problem showed the root cause to be a miscompilation
> of the bark_noise_hybridmp function.  The routine allocates a couple
> of temporary arrays of the same size on the stack:
>
> static void bark_noise_hybridmp(int n,const long *b,
>                                 const float *f,
>                                 float *noise,
>                                 const float offset,
>                                 const int fixed){
>
>   float *N=alloca(n*sizeof(*N));
>   float *X=alloca(n*sizeof(*N));
>   float *XX=alloca(n*sizeof(*N));
>   float *Y=alloca(n*sizeof(*N));
>   float *XY=alloca(n*sizeof(*N));
>
> However, looking at the generated code, we see only a single array
> being allocated, and a pointer to that same array used for *all*
> of those local variables N, X, XX, Y, and XY.
>
> A reduced test case showing the problem is:
>
> target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
> target triple = "powerpc64-unknown-linux-gnu"
>
> define void @test(i64 %n) nounwind {
> entry:
>   %0 = alloca i8, i64 %n, align 1
>   %1 = alloca i8, i64 %n, align 1
>   call void @use(i8* %0, i8* %1) nounwind
>   ret void
> }
>
> declare void @use(i8*, i8*)
>
>
> Looking at the intermediate compilation steps, it seems that
> the problem occurs during the MachineCSE pass.  Before, we have:
>
> BB#0: derived from LLVM BB %entry
>     Live Ins: %X3
>         %vreg0<def> = COPY %X3; G8RC:%vreg0
>         %vreg1<def> = ADDI8 %vreg0, 15; G8RC:%vreg1,%vreg0
>         %vreg2<def> = LI8 65520; G8RC:%vreg2
>         %vreg3<def> = AND8 %vreg1<kill>, %vreg2<kill>; G8RC:%vreg3,%vreg1,%vreg2
>         %vreg4<def> = NEG8 %vreg3<kill>; G8RC:%vreg4,%vreg3
>         %vreg5<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg5,%vreg4
>         %vreg6<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg6,%vreg4
>         ADJCALLSTACKDOWN 112, %R1<imp-def,dead>, %R1<imp-use>
>         %X3<def> = COPY %vreg5; G8RC:%vreg5
>         %X4<def> = COPY %vreg6; G8RC:%vreg6
>         BL8_NOP_ELF <ga:@use>, <regmask>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %R1<imp-def>, ...
>         ADJCALLSTACKUP 112, 0, %R1<imp-def,dead>, %R1<imp-use>
>         BLR pred:20, pred:%noreg, %LR<imp-use>, %RM<imp-use>
>
> But after MachineCSE, we have the incorrect:
>
> BB#0: derived from LLVM BB %entry
>     Live Ins: %X3
>         %vreg0<def> = COPY %X3; G8RC:%vreg0
>         %vreg1<def> = ADDI8 %vreg0, 15; G8RC:%vreg1,%vreg0
>         %vreg2<def> = LI8 65520; G8RC:%vreg2
>         %vreg3<def> = AND8 %vreg1<kill>, %vreg2<kill>; G8RC:%vreg3,%vreg1,%vreg2
>         %vreg4<def> = NEG8 %vreg3<kill>; G8RC:%vreg4,%vreg3
>         %vreg5<def> = DYNALLOC8 %vreg4, 0, <fi#-1>, %X1<imp-def,dead>, %X1<imp-use>; G8RC:%vreg5,%vreg4
>         ADJCALLSTACKDOWN 112, %R1<imp-def,dead>, %R1<imp-use>
>         %X3<def> = COPY %vreg5; G8RC:%vreg5
>         %X4<def> = COPY %vreg5; G8RC:%vreg5
>         BL8_NOP_ELF <ga:@use>, <regmask>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %R1<imp-def>, ...
>         ADJCALLSTACKUP 112, 0, %R1<imp-def,dead>, %R1<imp-use>
>         BLR pred:20, pred:%noreg, %LR<imp-use>, %RM<imp-use>
>
> It would appear that CSE has concluded that two instances of DYNALLOC8
> with the same size argument must have the same result, which is of course
> not correct.
>
> I'm not sure I fully understand why this is the case.  The PowerPC back-end
> states that DYNALLOC8 implicitly defines and uses the stack pointer (%X1);
> I would have expected this to cause CSE to not merge them.  However, for
> some reason, the implicit definitions are considered "dead" and thus
> ignored for CSE purposes.  (Interestingly enough, when live analysis
> runs later on, it recognizes that those implicit definitions are in
> fact live and not dead.  But that's too late ...)
>
> I found that the place where those implicit definitions are marked dead is
> MachineInstr::setPhysRegsDeadExcept as called from InstrEmitter::EmitMachineNode,
> in the block of code headed by this comment:
>
>   // The MachineInstr may also define physregs instead of virtregs.  These
>   // physreg values can reach other instructions in different ways:
>   //
>   // 1. When there is a use of a Node value beyond the explicitly defined
>   //    virtual registers, we emit a CopyFromReg for one of the implicitly
>   //    defined physregs.  This only happens when HasPhysRegOuts is true.
>   //
>   // 2. A CopyFromReg reading a physreg may be glued to this instruction.
>   //
>   // 3. A glued instruction may implicitly use a physreg.
>   //
>   // 4. A glued instruction may use a RegisterSDNode operand.
>
> So I guess because the next use of the stack pointer cannot be determined
> by following SelectionDAG edges, it is considered dead?
>
>
> Looking some more, I found a solution that in fact prevents MachineCSE from
> combining the DYNALLOC8 noded, but I'm not sure if this is indeed the
> "correct" fix or just a work-around:  When I mark the DYNALLOC8 nodes as
> "mayStore", they are not combined.  Since the code a DYNALLOC8 is expanded
> to indeed *will* contain a store (to re-establish the back-chain link after
> the stack pointer is decremented), this arguably ought to be done anyway.
> I'm still wondering whether other problems may be lurking with the way the
> back-end uses implit def/uses of the stack pointer in other instructions ...
>
> The attached patch implements this solution, which does in fact fix the
> oggenc testcase (and causes no test regressions).
>
> Would this be OK to commit?
>
> Any other suggestions how to fix the underlying problem?
>
> Thanks,
> Ulrich
>
> (See attached file: diff-llvm-dynalloc-maystore)
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>



More information about the llvm-commits mailing list