[PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux

Wed Jul 9 15:08:57 PDT 2014

----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "cfe commits" <cfe-commits at cs.uiuc.edu>
> Sent: Wednesday, July 9, 2014 4:13:28 PM
> Subject: Re: [PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
> 
> Hal Finkel <hfinkel at anl.gov> wrote on 09.07.2014 22:37:57:
> 
> > Unfortunately, this logic is flawed. When a user specifies an
> > enhanced alignment for a structure, it is normally for performance
> > reasons (often so that it will start on a cache-line boundary).
> > This
> > difference is potentially observable (in practice) regardless of
> > whether the address is taken.
> 
> Well, those cases where performance count rarely involve structs
> passed by value :-)

Having structures where over-alignment is specified is rare, but within that class I disagree, but see below...

> 
> > Currently, in LLVM, when a local alloca (and this should include
> > byval parameters) require enhanced alignment, this triggers a
> > dynamic stack realignment (in the caller and/or callee). The offset
> > in the parameter save area is then also adjusted to ensure the
> > proper alignment (*)
> >
> > So, yes, this requires some additional pointer arithmetic to work
> > correctly. However, that's already implemented, and I believe this
> > currently works correctly. Don't break it (and if the new, yet
> > unpublished, ABI will require breaking it, please change the ABI).
> 
> It is not that the new ABI requires anything different, we simply
> brought the written ABI in line with what compilers actually did
> even with the old ABI.  No compiler ever aligned a parameter in
> the save area to more than 16 bytes; changing this at this point
> doesn't seem feasible.

For one thing, the new ABI is new, and changing a case like this is perfectly feasible. Even for the old ABI, this is clearly a bug, and can likely be fixed (in the past, this affected only a gcc language extension, and the amount of affected code is likely small).

Secondly, realigning the parameter save area *is* LLVM's current intended behavior; what should happen is:

 - If a function calls a function with an over-aligned byval argument, that should trigger (MFI->getMaxAlignment() > MF.getTarget().getFrameLowering()->getStackAlignment()) to be true in PPCRegisterInfo::needsStackRealignment.
 - Of source, this probably means that the function already had an over-aligned local variable (although this is not certain, it could be passing a global or some pointee), and already needed stack realignment.
 - The overaligned byval should force the parameter save area to be overaligned (by padding in the local variable space)
 - The overaligned byval should be placed at an appropriately aligned offset within the parameter save area (PPCISelLowering already does this).

Also, with C11's _Alignas and C++11's alignas specifier, how we deal with over-aligned types moves from something that affects only a language extension, into something that touches a standard part of the C/C++ languages. Quality is important here, especially in the ABI, because "improving" the ABI later is obviously difficult. Forcing a double-copy of all over-aligned structures at ABI boundaries is not something to aspire to up front ;)

Seriously, however, the double-copy problem is a real problem. If a user puts alignas(128) on some structure/class to keep them all on separate cache lines, this is being done for performance. In C++, it is perfectly reasonable for these to be put in a container, where they might be passed by value to the container manipulation functions, for example. Forcing a double-copy, or other performance degradation, because of the overalignment would really be quite unfortunate. Now, I agree that these will normally be passed by const& instead of by value if the structure is actually large, but if the size of the structure is small, passing by value is reasonable (and should have the desired effect, no two will be on the same cache line (either because they're at different aligned offsets or because some are in registers)).

> 
> For example:
> 
> struct test { long x; } __attribute__((aligned (128)));
> 
> long callee (long x, struct test y)
> {
>   return y.x;
> }
> 
> This expects y.x to be in r5 with GCC, and with LLVM with my patch;
> LLVM without the patch currently expects it in r4.

As I noted in my last e-mail, the current LLVM implementation does not look like it skips registers when it adjusts the alignment offset. I'm happy to see a compatibility fix for that (although I'm unhappy that the current ABI requires it), but that's a separate issue.

Thanks again,
Hal

> 
> ELFv2 and ELFv1 are not intended to differ w.r.t. alignment in the
> parameter area here.
> 
> 
> Bye,
> Ulrich
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory