[PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux

Tue Jul 8 13:01:10 PDT 2014

----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "cfe commits" <cfe-commits at cs.uiuc.edu>
> Cc: "Hal Finkel" <hfinkel at anl.gov>
> Sent: Tuesday, July 8, 2014 12:11:16 PM
> Subject: [PATCH, PowerPC] ABI fixes / improvements for powerpc64-linux
> 
> 
> 
> Hello,
> 
> I've been working to verify ABI compatibility between GCC and LLVM on
> powerpc64-linux using the GCC ABI compatibility test suite.  While
> for the
> most part, the compilers are of course compatible in the "usual"
> scenarios,
> this testing uncovered a number of bugs in corner-case situations.  [
> Some
> of these are actually GCC bugs which I'll be addressing on that side.
>  ]

Great! Thanks for working on this.

> 
> In LLVM, the main issues I found are fixed by the patches attached
> here.
> There are three issues (which are somewhat independent, but the
> patches
> build on top of each other so I'll send them all in one mail):
> 
> - "synthetic" (non-Altivec) vector types
> 
> LLVM currently does not correctly implement the ABI for passing
> non-Altivec
> vector types (defined via attribute((vector_size))).  While there is
> no
> "official" ABI for these in any case, it still would make sense to
> implement the ad-hoc ABI as implemented by GCC for those types.  This
> is
> done by the following patch:
> (See attached file: diff-clang-synthvector)

+// Return true if Ty represents an Altivec (or VSX) vector type.
+bool
+PPC64_SVR4_ABIInfo::isAltivecVectorType(const Type *Ty) const {
+  if (const VectorType *VT = Ty->getAs<VectorType>()) {
+    // We accept all vector types that are 16 bytes in size
+    // and have a power-of-two number of elements.
+    unsigned NumElements = VT->getNumElements();
+    uint64_t Size = getContext().getTypeSize(VT);
+
+    if ((NumElements & (NumElements - 1)) == 0 && Size == 128)
+      return true;
+  }
+
+  return false;
+}

Why are we restricting to power-of-two element numbers? <3 x float> will become a <4 x float> and end up in vector registers anyhow.

@@ -2951,6 +2972,22 @@ PPC64_SVR4_ABIInfo::classifyArgumentType
   if (Ty->isAnyComplexType())
     return ABIArgInfo::getDirect();
 
+  // Non-Altivec vector types are passed in GPRs (up to 16 bytes)
+  // or via reference (larger than 16 bytes).
+  if (Ty->isVectorType() && !isAltivecVectorType(Ty)) {
+    uint64_t Size = getContext().getTypeSize(Ty);
+    if (Size > 128)
+      return ABIArgInfo::getIndirect(0, /*ByVal=*/false);
+    else if (Size > 64) {
+      llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), 64);
+      CoerceTy = llvm::StructType::get(CoerceTy, CoerceTy, NULL);
+      return ABIArgInfo::getDirect(CoerceTy);
+    } else {
+      llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), Size);
+      return ABIArgInfo::getDirect(CoerceTy);
+    }
+  }
+
   if (isAggregateTypeForABI(Ty)) {
     if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
       return ABIArgInfo::getIndirect(0, RAA == CGCXXABI::RAA_DirectInMemory);
@@ -2970,6 +3007,22 @@ PPC64_SVR4_ABIInfo::classifyReturnType(Q
   if (RetTy->isAnyComplexType())
     return ABIArgInfo::getDirect();
 
+  // Non-Altivec vector types are returned in GPRs (up to 16 bytes)
+  // or via reference (larger than 16 bytes).
+  if (RetTy->isVectorType() && !isAltivecVectorType(RetTy)) {
+    uint64_t Size = getContext().getTypeSize(RetTy);
+    if (Size > 128)
+      return ABIArgInfo::getIndirect(0);
+    else if (Size > 64) {
+      llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), 64);
+      CoerceTy = llvm::StructType::get(CoerceTy, CoerceTy, NULL);
+      return ABIArgInfo::getDirect(CoerceTy);
+    } else {
+      llvm::Type *CoerceTy = llvm::IntegerType::get(getVMContext(), Size);
+      return ABIArgInfo::getDirect(CoerceTy);
+    }
+  }
+
   if (isAggregateTypeForABI(RetTy))
     return ABIArgInfo::getIndirect(0);

These two hunks are very similar, can they be refactored into a single function?

Please add some tests for non-power-of-two-number-of-elements types.

> 
> - Alignment of arguments in the parameter save area
> 
> Certain arguments are supposed to be aligned to a 16-byte boundary in
> the
> argument save area.  This affects vector types as well as aggregates
> that
> have a 16-byte (or higher) alignment requirement.  LLVM currently
> implements this alignment for vector types, but not for aggregates,
> and it
> does not respect the alignment at all (even for vector types) when
> accessing a variable argument list via va_arg.
> 
> Aggregates are passed using the "byval" mechanism, so in theory
> either
> Clang or LLVM could compute the correct alignment required for an
> aggregate
> parameter.  However, it seems preferable to do this in Clang, since
> it has
> the more complete type information, and Clang needs the information
> anyway
> in order to expand va_arg correctly.  So this is what the following
> patch
> does.  As a result, every byval parameter will now carry an explicit
> "align" attribute in the IR created by Clang (which I understand is
> recommended anyway?).

Yes.

LGTM. Please add some tests for non-power-of-two-number-of-elements types.

> 
> (See attached file: diff-clang-structalign)
> 
> - Avoiding "byval" for aggregates passed in registers
> 
> This is not actually a bug fix, but more of a code-gen enhancement.
> (However, it will become a required feature for the ELFv2
> little-endian
> ABI, which I hope to post patches for shortly.)
> 
> Aggregates are currently always passed using the "byval" method.
>  However,
> the first 64 bytes will end up in registers anyway, so it would
> improve
> code generation if the IR made use of registers explicit (instead of
> using
> byval).  The following patch implements this.  The main issue here is
> that
> we need to track how many registers will end up being used by the
> previous
> arguments.  However, due to the linear layout of the parameter area
> in the
> ppc64 ABI, this is reasonably straight-forward (and there is
> precedent of
> many other architectures likewise tracking register usage in Clang).
> 
> One extra trick is that aggregates that need 16-byte alignment now
> have to
> get an explicit padding type to skip a register if needed.  For this
> to
> work we need to both track registers used by previous arguments, and
> need
> to compute alignment in Clang as introduced in the prior patch (so
> that's
> another reason to do it in the front-end).
> 
> (See attached file: diff-clang-structpass)
> 

+    // We may need to skip a slot to ensure alignment.
+    if (isAlignedParamType(Ty) && (AllocatedSlots & 1))
+      AllocatedSlots++;

How do you know it will be only one? What happens if the type has an enhanced alignment (given by __attribute__((aligned(128))) for example).

@@ -3029,6 +3065,32 @@ PPC64_SVR4_ABIInfo::classifyArgumentType
       return ABIArgInfo::getIndirect(0, RAA == CGCXXABI::RAA_DirectInMemory);
 
     uint64_t ABIAlign = isAlignedParamType(Ty)? 16 : 8;
+    bool SkipSlot = (ABIAlign > 8 && (AllocatedSlots & 1));

Same question here.

In short, I'd like some tests with various artificial alignments specified to make sure that everything still works. ;)

Thanks again,
Hal

> 
> 
> Mit freundlichen Gruessen / Best Regards
> 
> Ulrich Weigand
> 
> --
>   Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>   STSM, GNU/Linux compilers and toolchain
>   IBM Deutschland Research & Development GmbH
>   Vorsitzende des Aufsichtsrats: Martina Koederitz |
>   Geschäftsführung: Dirk
> Wittkopp
>   Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory