[PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
Hal Finkel
hfinkel at anl.gov
Tue Jul 8 13:01:10 PDT 2014
----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "cfe commits" <cfe-commits at cs.uiuc.edu>
> Cc: "Hal Finkel" <hfinkel at anl.gov>
> Sent: Tuesday, July 8, 2014 12:11:16 PM
> Subject: [PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
>
>
>
> Hello,
>
> I've been working to verify ABI compatibility between GCC and LLVM on
> powerpc64-linux using the GCC ABI compatibility test suite. While
> for the
> most part, the compilers are of course compatible in the "usual"
> scenarios,
> this testing uncovered a number of bugs in corner-case situations. [
> Some
> of these are actually GCC bugs which I'll be addressing on that side.
> ]
Great! Thanks for working on this.
>
> In LLVM, the main issues I found are fixed by the patches attached
> here.
> There are three issues (which are somewhat independent, but the
> patches
> build on top of each other so I'll send them all in one mail):
>
> - "synthetic" (non-Altivec) vector types
>
> LLVM currently does not correctly implement the ABI for passing
> non-Altivec
> vector types (defined via attribute((vector_size))). While there is
> no
> "official" ABI for these in any case, it still would make sense to
> implement the ad-hoc ABI as implemented by GCC for those types. This
> is
> done by the following patch:
> (See attached file: diff-clang-synthvector)
+// Return true if Ty represents an Altivec (or VSX) vector type.
+bool
+PPC64_SVR4_ABIInfo::isAltivecVectorType(const Type *Ty) const {
+ if (const VectorType *VT = Ty->getAs<VectorType>()) {
+ // We accept all vector types that are 16 bytes in size
+ // and have a power-of-two number of elements.
+ unsigned NumElements = VT->getNumElements();
+ uint64_t Size = getContext().getTypeSize(VT);
+
+ if ((NumElements & (NumElements - 1)) == 0 && Size == 128)
+ return true;
+ }
+
+ return false;
+}
Why are we restricting to power-of-two element numbers? <3 x float> will become a <4 x float> and end up in vector registers anyhow.
@@ -2951,6 +2972,22 @@ PPC64_SVR4_ABIInfo::classifyArgumentType
if (Ty->isAnyComplexType())
return ABIArgInfo::getDirect();
+ // Non-Altivec vector types are passed in GPRs (up to 16 bytes)
+ // or via reference (larger than 16 bytes).
+ if (Ty->isVectorType() && !isAltivecVectorType(Ty)) {
+ uint64_t Size = getContext().getTypeSize(Ty);
+ if (Size > 128)
+ return ABIArgInfo::getIndirect(0, /*ByVal=*/false);
+ else if (Size > 64) {
+ llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), 64);
+ CoerceTy = llvm::StructType::get(CoerceTy, CoerceTy, NULL);
+ return ABIArgInfo::getDirect(CoerceTy);
+ } else {
+ llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), Size);
+ return ABIArgInfo::getDirect(CoerceTy);
+ }
+ }
+
if (isAggregateTypeForABI(Ty)) {
if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
return ABIArgInfo::getIndirect(0, RAA == CGCXXABI::RAA_DirectInMemory);
@@ -2970,6 +3007,22 @@ PPC64_SVR4_ABIInfo::classifyReturnType(Q
if (RetTy->isAnyComplexType())
return ABIArgInfo::getDirect();
+ // Non-Altivec vector types are returned in GPRs (up to 16 bytes)
+ // or via reference (larger than 16 bytes).
+ if (RetTy->isVectorType() && !isAltivecVectorType(RetTy)) {
+ uint64_t Size = getContext().getTypeSize(RetTy);
+ if (Size > 128)
+ return ABIArgInfo::getIndirect(0);
+ else if (Size > 64) {
+ llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), 64);
+ CoerceTy = llvm::StructType::get(CoerceTy, CoerceTy, NULL);
+ return ABIArgInfo::getDirect(CoerceTy);
+ } else {
+ llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), Size);
+ return ABIArgInfo::getDirect(CoerceTy);
+ }
+ }
+
if (isAggregateTypeForABI(RetTy))
return ABIArgInfo::getIndirect(0);
These two hunks are very similar, can they be refactored into a single function?
Please add some tests for non-power-of-two-number-of-elements types.
>
> - Alignment of arguments in the parameter save area
>
> Certain arguments are supposed to be aligned to a 16-byte boundary in
> the
> argument save area. This affects vector types as well as aggregates
> that
> have a 16-byte (or higher) alignment requirement. LLVM currently
> implements this alignment for vector types, but not for aggregates,
> and it
> does not respect the alignment at all (even for vector types) when
> accessing a variable argument list via va_arg.
>
> Aggregates are passed using the "byval" mechanism, so in theory
> either
> Clang or LLVM could compute the correct alignment required for an
> aggregate
> parameter. However, it seems preferable to do this in Clang, since
> it has
> the more complete type information, and Clang needs the information
> anyway
> in order to expand va_arg correctly. So this is what the following
> patch
> does. As a result, every byval parameter will now carry an explicit
> "align" attribute in the IR created by Clang (which I understand is
> recommended anyway?).
Yes.
LGTM. Please add some tests for non-power-of-two-number-of-elements types.
>
> (See attached file: diff-clang-structalign)
>
> - Avoiding "byval" for aggregates passed in registers
>
> This is not actually a bug fix, but more of a code-gen enhancement.
> (However, it will become a required feature for the ELFv2
> little-endian
> ABI, which I hope to post patches for shortly.)
>
> Aggregates are currently always passed using the "byval" method.
> However,
> the first 64 bytes will end up in registers anyway, so it would
> improve
> code generation if the IR made use of registers explicit (instead of
> using
> byval). The following patch implements this. The main issue here is
> that
> we need to track how many registers will end up being used by the
> previous
> arguments. However, due to the linear layout of the parameter area
> in the
> ppc64 ABI, this is reasonably straight-forward (and there is
> precedent of
> many other architectures likewise tracking register usage in Clang).
>
> One extra trick is that aggregates that need 16-byte alignment now
> have to
> get an explicit padding type to skip a register if needed. For this
> to
> work we need to both track registers used by previous arguments, and
> need
> to compute alignment in Clang as introduced in the prior patch (so
> that's
> another reason to do it in the front-end).
>
> (See attached file: diff-clang-structpass)
>
+ // We may need to skip a slot to ensure alignment.
+ if (isAlignedParamType(Ty) && (AllocatedSlots & 1))
+ AllocatedSlots++;
How do you know it will be only one? What happens if the type has an enhanced alignment (given by __attribute__((aligned(128))) for example).
@@ -3029,6 +3065,32 @@ PPC64_SVR4_ABIInfo::classifyArgumentType
return ABIArgInfo::getIndirect(0, RAA == CGCXXABI::RAA_DirectInMemory);
uint64_t ABIAlign = isAlignedParamType(Ty)? 16 : 8;
+ bool SkipSlot = (ABIAlign > 8 && (AllocatedSlots & 1));
Same question here.
In short, I'd like some tests with various artificial alignments specified to make sure that everything still works. ;)
Thanks again,
Hal
>
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
> Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> STSM, GNU/Linux compilers and toolchain
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz |
> Geschäftsführung: Dirk
> Wittkopp
> Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the cfe-commits
mailing list