[PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
Hal Finkel
hfinkel at anl.gov
Sat Jul 19 21:15:45 PDT 2014
----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "cfe commits" <cfe-commits at cs.uiuc.edu>
> Sent: Thursday, July 10, 2014 10:18:45 AM
> Subject: Re: [PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
>
> Hal Finkel <hfinkel at anl.gov> wrote on 10.07.2014 00:08:57:
>
> > Secondly, realigning the parameter save area *is* LLVM's current
> > intended behavior; what should happen is:
> >
> > - If a function calls a function with an over-aligned byval
> > argument, that should trigger (MFI->getMaxAlignment() >
> > MF.getTarget().getFrameLowering()->getStackAlignment()) to be true in
> > PPCRegisterInfo::needsStackRealignment.
> > - Of course, this probably means that the function already had an
> > over-aligned local variable (although this is not certain, it could
> > be passing a global or some pointee), and already needed stack
> > realignment.
> > - The overaligned byval should force the parameter save area to be
> > overaligned (by padding in the local variable space)
> > - The overaligned byval should be placed at an appropriately
> > aligned offset within the parameter save area (PPCISelLowering
> > already does this).
>
> OK, thanks for the explanation. I certainly agree that LLVM ought
> to continue to support this at the IR level, if nothing else for
> the benefit of the JIT or variant-ABI frontends ...
>
> > Seriously, however, the double-copy problem is a real problem. If a
> > user puts alignas(128) on some structure/class to keep them all on
> > separate cache lines, this is being done for performance. In C++, it
> > is perfectly reasonable for these to be put in a container, where
> > they might be passed by value to the container manipulation
> > functions, for example. Forcing a double-copy, or other performance
> > degradation, because of the overalignment would really be quite
> > unfortunate. Now, I agree that these will normally be passed by
> > const& instead of by value if the structure is actually large, but
> > if the size of the structure is small, passing by value is
> > reasonable (and should have the desired effect, no two will be on
> > the same cache line (either because they're at different aligned
> > offsets or because some are in registers)).
>
> I still don't quite see the case for overaligned byval parameters.
>
> Certainly, some use cases want to overalign structures to keep
> instances in separate cache lines. This is intended to prevent
> cache-line ping-pong when instances are accessed from different
> threads. However, that is unlikely to be an issue for by-value
> parameters; in fact it is *impossible* for another thread to
> access a byval parameter unless its address is taken.
>
> This brings us back to the one case I mentioned earlier, where we
> do indeed have to realign-by-copy byval parameters, namely when
> the address is taken. In this case, the ABI as defined does
> indeed have the drawback of requiring another copy. However,
> in defining an ABI you always have to balance pros and cons ...
> and there would also be disadvantages of requiring large
> alignments of byval parameters at the ABI level; starting with
> the fact that this requires large alignment of the stack pointer
> (which may not be easy to implement in all compilers), it will
> waste stack space if the argument doesn't have its address taken
> (reducing stack consumption is also often an important goal, also
> to reduce cache pressure), and it will waste GPRs (unless we break
> the 1:1 correspondence between GPRs and the first 8 stack slots --
> but that would then make va_list handling more complex).
>
> So even if we still had complete freedom in defining the ABI from
> scratch, it's not clear to me that requiring large byval alignment
> would be the best choice. As another data point, I'm not aware of
> any ABI on other platforms, even recently defined ones, that have
> that feature ...
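To make the GPR cost mentioned above concrete, here is a rough sketch of the slot arithmetic. It is a simplified model, not the ABI text or LLVM's lowering code: it assumes 8-byte parameter slots, that the first 8 slots correspond 1:1 to GPRs, and (hypothetically) that the start of the parameter save area is itself aligned to the requested alignment. The names alignUp and wastedGPRSlots are made up for illustration.

```cpp
#include <cstddef>

// Assumption: one doubleword (8 bytes) per parameter slot.
const std::size_t kSlotSize = 8;
// Assumption: the first 8 slots shadow the 8 argument GPRs.
const std::size_t kNumGPRSlots = 8;

// Round an offset up to the next multiple of align (align > 0).
std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Number of GPR-backed slots skipped when an argument with the given
// alignment is placed at or after byte offset nextFree within the
// (hypothetically aligned) parameter save area.
std::size_t wastedGPRSlots(std::size_t nextFree, std::size_t align) {
    std::size_t placed = alignUp(nextFree, align);
    std::size_t firstSlot = nextFree / kSlotSize;
    std::size_t lastSlot = placed / kSlotSize;
    std::size_t gprEnd = lastSlot < kNumGPRSlots ? lastSlot : kNumGPRSlots;
    return gprEnd > firstSlot ? gprEnd - firstSlot : 0;
}
```

Under these assumptions, a 128-byte-aligned byval that follows a single 8-byte argument lands at offset 128, so all seven remaining GPR-backed slots are skipped.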
A few thoughts:
- Over-aligned locals have benefits even in single-threaded code (by avoiding load-after-store hazards, among other things). Moreover, this is really an interface penalty issue: If the user has a structure that normally wants enhanced alignment (because it is used in a concurrent data structure), but it sometimes is used elsewhere, there is no need to make that 'elsewhere' more expensive than necessary.
- Yes, stack realignment eats an extra register, and that can have a performance penalty. In my experience, however, this penalty is fairly small (as is the additional cache pressure from the larger stack) compared to whatever benefits the programmer had in mind (even if such benefits are not exactly realized in every routine).
- I would have preferred making va_list handling more complex. ;)
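For reference, the kind of user code under discussion looks roughly like the sketch below. The type Counter and the functions bump and stride are hypothetical names chosen for illustration; nothing here is taken from the patch. The over-aligned local array is the sort of object that would require stack realignment in the enclosing function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: over-aligned so that each instance sits on its
// own 128-byte cache line.
struct alignas(128) Counter {
    std::uint64_t value;
};

// A small over-aligned object passed by value. Whether the callee can
// use the incoming copy directly, or must realign it by copying once
// its address is taken, is what the ABI choice determines.
std::uint64_t bump(Counter c) {
    return c.value + 1;
}

// Byte distance between adjacent elements of an over-aligned local array.
std::size_t stride() {
    Counter cs[2] = {};
    // The local itself must honor the requested alignment.
    assert(reinterpret_cast<std::uintptr_t>(&cs[0]) % alignof(Counter) == 0);
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&cs[1]) -
        reinterpret_cast<const char*>(&cs[0]));
}
```

Since sizeof is always a multiple of alignof, Counter occupies a full 128 bytes even though it holds a single 8-byte field, which is also the stack-space cost being debated.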
Thanks again,
Hal
>
> Bye,
> Ulrich
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory