[cfe-dev] [LLVMdev] Odd PPC inline asm constraint

Hal Finkel hfinkel at anl.gov
Sat May 19 14:35:45 PDT 2012


On Tue, 01 May 2012 21:25:29 -0500
Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Tue, 2012-05-01 at 19:58 -0500, Peter Bergner wrote:
> > On Tue, 2012-05-01 at 17:47 -0500, Hal Finkel wrote:
> > >  By default it should build for
> > >  whatever the current host is (no special flags required). To
> > >  specifically build for something else, use:
> > >  -ccc-host-triple powerpc64-unknown-linux-gnu
> > >  or
> > >  -ccc-host-triple powerpc-unknown-linux-gnu
> > 
> > So LLVM isn't biarch capable?  Meaning one LLVM compiler cannot
> > generate both 32-bit and 64-bit binaries?
> 
> Sorry for replying to my own message, but...
> 
> Oh, -ccc-host-triple is a compiler option and not a configure option.
> That does work, though it seems I have to link with gcc, since llvm
> still wants to link against the 64-bit crt*.o and libs.  Maybe it is
> easier to just have two separate builds.
> 
> That said, my simple dynamically linked hello world executed fine
> (i.e., it was able to call into libc.so just fine), as well as an
> old C version of the SPEC97 tomcatv benchmark I have lying around.
> So it seems both 32-bit and 64-bit can call into shared libs.
> 
> Not to say I haven't seen some code gen warts (using -O3). :)
> 
> From hello.s:
> 
>     main:
>         mflr 0
>         stw 31, -4(1)
>         stw 0, 4(1)
>         stwu 1, -16(1)
>         lis 3, .Lstr@ha
>         mr 31, 1
>         la 3, .Lstr@l(3)
>         bl puts
>         li 3, 0
>         addi 1, 1, 16
>         lwz 0, 4(1)
>         lwz 31, -4(1)
>         mtlr 0
>         blr 
> 
> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten.  The ABI states that
> the stack space below the stack pointer is volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing.  We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg and the BGL compute node kernel
> trashed the stack below the stack pointer.
> 
> The second wart is the dead copy to r31...which leads to the
> unnecessary save and restore of r31.
> 
> For tomcatv, we have to basically save/restore the entire set
> of non-volatile integer and fp registers.  Looking at how
> llvm does that shows:
> 
>         ...
>         lis 3, 56
>         ori 3, 3, 57680
>         stwx 16, 31, 3
>         lis 3, 56
>         ori 3, 3, 57684
>         stwx 17, 31, 3
>         lis 3, 56
>         ori 3, 3, 57688
>         stwx 18, 31, 3
>         lis 3, 56
>         ori 3, 3, 57692
>         stwx 19, 31, 3
>         lis 3, 56
>         ori 3, 3, 57696
>         stwx 20, 31, 3
>         lis 3, 56
>         ori 3, 3, 57700
>         stwx 21, 31, 3
>         [repeated over and over and ...]
> 
> Kind of ugly! :)  GCC on the other hand stashes away the old value of
> the stack pointer and then uses small negative offsets (legal at this
> point since we've already decremented the stack pointer) from that for
> all of its saves/restores:
> 
>         ...
>         lis 0,0xffc7
>         mr 12,1
>         ori 0,0,7728
>         stwux 1,1,0
>         mflr 0
>         stw 0,4(12)
>         stfd 14,-144(12)
>         stfd 15,-136(12)
>         stfd 16,-128(12)
>         stfd 17,-120(12)
>         stfd 18,-112(12)
>         ...
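
For reference, the arithmetic behind the two quoted listings can be
checked with a short sketch. It is not taken from LLVM or GCC; the helper
names (lis_ori, as_signed32, fits_d_field) are made up for illustration,
and it assumes the 32-bit view of lis/ori plus the signed 16-bit
displacement field of D-form loads and stores such as stw and stfd.

```python
def lis_ori(hi16, lo16):
    """Value left in a 32-bit register by `lis rX, hi16; ori rX, rX, lo16`."""
    return ((hi16 & 0xFFFF) << 16) | (lo16 & 0xFFFF)


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_d_field(offset):
    """True if offset fits the signed 16-bit displacement of stw/stfd/lwz."""
    return -(1 << 15) <= offset < (1 << 15)


# First listing: offset used for the r16 save (lis 3, 56; ori 3, 3, 57680).
llvm_r16_offset = lis_ori(56, 57680)

# GCC listing: amount added to r1 by stwux (lis 0,0xffc7; ori 0,0,7728).
gcc_frame_delta = as_signed32(lis_ori(0xFFC7, 7728))

print(llvm_r16_offset)                # 3727696
print(gcc_frame_delta)                # -3727824
print(fits_d_field(llvm_r16_offset))  # False: needs lis/ori plus stwx
print(fits_d_field(-144))             # True: a single stfd off r12 suffices
```

The offsets from the new stack pointer are far outside the 16-bit range,
which is why each save in the first listing needs its own lis/ori pair,
while offsets from a saved copy of the old stack pointer stay small.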

Peter,

There is a FIXME comment in the current code which reads:
> FIXME This disables some code that aligns the stack to a boundary
> bigger than the default (16 bytes on Darwin) when there is a stack
> local of greater alignment.  This does not currently work, because
> the delta between old and new stack pointers is added to offsets that
> reference incoming parameters after the prolog is generated, and the
> code that does that doesn't handle a variable delta.  You don't want
> to do that anyway; a better approach is to reserve another register
> that retains to the incoming stack pointer, and reference parameters
> relative to that.
> #define ALIGN_STACK 0

So given that this should also be fixed, presumably also by making an
extra copy of the stack pointer, should we always do this on
PPC32? Is there any difference for PPC64?
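
One relevant difference, given here as background and not as a
description of what LLVM's code does: the 64-bit PowerPC ELF ABI protects
288 bytes below the stack pointer, while the 32-bit SVR4 ABI protects
nothing there, as noted in the quoted message. A minimal sketch of that
rule follows; the table keys and the function name are hypothetical, and
the check is simplified to look only at the lowest address touched.

```python
# Bytes below the stack pointer that signal handlers and interrupts must
# not touch. The keys are labels chosen for this sketch.
RED_ZONE_BYTES = {
    "ppc32-svr4": 0,    # nothing below r1 is preserved
    "ppc64-elf": 288,   # protected area below r1
}


def save_below_sp_is_safe(abi, offset_from_sp):
    """True if a register save at r1 + offset_from_sp cannot be clobbered
    asynchronously. Non-negative offsets are treated as safe (simplified)."""
    if offset_from_sp >= 0:
        return True
    return -offset_from_sp <= RED_ZONE_BYTES[abi]


print(save_below_sp_is_safe("ppc32-svr4", -4))   # False: the stw 31, -4(1) case
print(save_below_sp_is_safe("ppc64-elf", -4))    # True
print(save_below_sp_is_safe("ppc64-elf", -296))  # False: past the 288 bytes
```

Under this rule, a save at a small negative offset from r1 is only a
problem for the 32-bit SVR4 target.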

Thanks again,
Hal

> For things that don't work, do you have a small example program
> that shows what's wrong?
> 
> Peter



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory


