[PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
hfinkel at anl.gov
Wed Jul 9 13:37:57 PDT 2014
----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "cfe commits" <cfe-commits at cs.uiuc.edu>
> Sent: Wednesday, July 9, 2014 2:54:20 PM
> Subject: Re: [PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
> Hal Finkel <hfinkel at anl.gov> wrote on 09.07.2014 21:19:56:
> > I don't understand how this works. If the type has a larger
> > alignment requirement, then you can either:
> > 1. Pass a pointer to it, or
> > 2. Make sure it is aligned in the parameter save area as
> > requested.
> > 3. Pass it underaligned, and then memcpy it to an aligned place in
> > the callee's stack frame.
> > When you have a greater-than-16-byte alignment specified, the
> > caller
> > and/or callee might need to do a dynamic stack realignment to make
> > things work, but that's already implemented. The existing ABI (
> > http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html
> > ), as far as I can tell says nothing about aggregates being
> > restricted to 16-byte alignment.
> Yes, the ABI doc is wrong here. We'll fix this in the new version
> we're currently working on. What was intended (and what is
> implemented in GCC today) is somewhat of a mixture of 2 and 3
> above: if the type's alignment requirement is >= 16 byte, it
> will be passed in the save area at 16 byte alignment.
> The rationale is this: alignment in the parameter save area
> matters for two reasons:
> - if you take the address of the parameter, it has to be properly
> aligned, since code is allowed to assume that property;
> - when you access a member of the parameter in place in the save
> area, the ISA may have alignment requirements; on PowerPC, the
> only such requirement is 16 byte for VMX loads/stores.
> The reason why an argument in the save area is 16-byte aligned
> is that this will suffice for the second reason above, i.e. it
> is possible to access the argument using VMX instructions.
> If the argument type actually requires > 16 byte alignment,
> and code indeed takes the address of the argument, then the
> compiler will have to copy the incoming argument to another
> place, properly aligned, on the stack.
Unfortunately, this logic is flawed. When a user specifies an enhanced alignment for a structure, it is normally for performance reasons (often so that it will start on a cache-line boundary). This difference is potentially observable (in practice) regardless of whether the address is taken.
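To make the use case concrete, here is a minimal sketch of the kind of type I have in mind. The struct name, the helper, and the 128-byte cache-line size are assumptions for illustration only; nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache-line size, for illustration only. */
#define CACHE_LINE 128

/* Hypothetical user type: the alignment is requested for performance
   (one object per cache line), not because any instruction needs it. */
struct hot_data {
    _Alignas(CACHE_LINE) double values[16];
};

/* Returns nonzero if p sits on a cache-line boundary. */
static int is_cache_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

If such an object is passed by value and the callee reads it in place at only 16-byte alignment, it may straddle two cache lines even though no code ever takes its address.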
Currently, in LLVM, when a local alloca (and this should include byval parameters) requires enhanced alignment, this triggers a dynamic stack realignment (in the caller and/or callee). The offset in the parameter save area is then also adjusted to ensure the proper alignment (*).
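The pointer arithmetic involved is roughly the following. This is only a sketch of the idea, not the code in PPCISelLowering.cpp; the function names and the treatment of the save area base as a plain integer address are my assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: given the address of the parameter save area and
   the next free offset in it, return the offset at which an argument
   with the given alignment should be placed so that its address is
   properly aligned. Slots are doublewords, so never go below 8. */
static uint64_t aligned_save_area_offset(uint64_t save_area_base,
                                         uint64_t next_free_offset,
                                         uint64_t type_align)
{
    uint64_t slot_align = type_align < 8 ? 8 : type_align;
    return align_up(save_area_base + next_free_offset, slot_align)
           - save_area_base;
}
```

For example, with the save area at 0x1030 and 8 bytes already used, a 128-byte-aligned argument would land at offset 80 (address 0x1080), while a 16-byte-aligned one would land at offset 16.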
So, yes, this requires some additional pointer arithmetic to work correctly. However, that's already implemented, and I believe this currently works correctly. Please don't break it (and if the new, as-yet-unpublished ABI would require breaking it, please change the ABI).
(*) Looking at the current code in PPCISelLowering.cpp, I'm not sure that we correctly skip additional GPRs when we increase the offset because of alignment by more than 8 bytes. However, that's a separate issue (and, really, is a flaw in the old ABI, and that's yet another separate issue).
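To spell out what "skipping GPRs" would mean: under the existing ABI, as I read it, the first eight doublewords of the parameter save area are shadowed by r3 through r10, so padding inserted for alignment should also consume the registers that shadow it. A sketch, with hypothetical names and not taken from the LLVM sources:

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_ARG_GPRS = 8, DOUBLEWORD = 8 };

/* Index of the GPR (0 = r3, ..., 7 = r10) that shadows the given save
   area offset, or NUM_ARG_GPRS if the offset lies past the part of the
   save area that is shadowed by registers. */
static unsigned first_gpr_for_offset(uint64_t offset)
{
    assert(offset % DOUBLEWORD == 0);
    uint64_t idx = offset / DOUBLEWORD;
    return idx < NUM_ARG_GPRS ? (unsigned)idx : NUM_ARG_GPRS;
}

/* Number of GPRs that must be skipped when alignment padding moves the
   next argument from old_offset to new_offset. */
static unsigned gprs_skipped_by_padding(uint64_t old_offset,
                                        uint64_t new_offset)
{
    assert(new_offset >= old_offset);
    return first_gpr_for_offset(new_offset)
           - first_gpr_for_offset(old_offset);
}
```

So bumping the offset from 8 to 32 for a 32-byte-aligned argument would have to skip three registers (r4, r5 and r6), not just one.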
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory