[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)

Fri Jul 18 14:32:01 PDT 2014

----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "cfe-commits" <cfe-commits at cs.uiuc.edu>
> Cc: "Hal Finkel" <hfinkel at anl.gov>
> Sent: Monday, July 14, 2014 10:17:35 AM
> Subject: [PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
> 
> 
> 
> Hello,
> 
> this patch implements clang support for the PowerPC ELFv2 ABI.
>  Together
> with a series of companion patches in LLVM (posted at:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140714/225647.html
> ), this makes clang/LLVM fully usable on powerpc64le-linux.
> 
> Most of the ELFv2 ABI changes are fully implemented on the LLVM side,
> please read the description in the mail cited above.  On the clang
> side,
> however, we need to implement some changes in how aggregate types are
> passed by value.   Specifically, we need to:
> - pass (and return) "homogeneous" floating-point or vector aggregates
> in
> FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
> - return aggregates of up to 16 bytes in one or two GPRs
> - pass aggregates that fit fully in registers without using the
> "byval"
> mechanism (see discussion in the LLVM mail)
> 
> The LLVM side implements the infrastructure required to do all this.
>  In
> particular, LLVM implements support for passing array types directly,
> where
> the array element type encodes properties needed to handle all those
> ELFv2
> features correctly. Specifically, the array element type encodes:
> - whether the parameter should be passed in FPRs, VRs, or just
> GPRs/stack
> slots  (for float / vector / integer element types, respectively)
> - what the alignment requirements of the parameter are when passed in
> GPRs/stack slots  (8 for float / 16 for vector / the element type
> size for
> integer element types) -- this corresponds to the "byval align" field
> 
> 
> With this support in place, the clang changes to implement the three
> features described above is straight-forward:
> - Implement code to detect float / vector homogeneous aggregates and
> pass/return them as array types using the appropriate float / vector
> element type.
> - Return small aggregates in registers (this doesn't even need the
> array
> support)
> - Pass [other] aggregates that fit fully in registers as direct
> integer
> arrays, using an element type to encode the alignment requirement
> (that
> would otherwise go to the "byval align" field).
> 
> The last piece is the most interesting one.  In general, clang does
> not
> know (on PowerPC) whether a type fits fully in registers.  We could
> add
> code to track expected register usage, as is done on several other
> platforms.  However, I've decided against that, because it not only
> duplicates a lot of the logic already implemented in the back-end,
> but it
> isn't even fully reliable, since IR transformation layers may modify
> function signatures.

Thank you for pursuing such a route. I'm strongly in favor of keeping this decision making in the backend whenever possible.

> 
> Instead, I'm now simply passing *any* aggregate passed by value using
> the
> array mechanism if its size is up to 64 bytes.   This means that some
> of
> those will end up being passed in stack slots anyway, but the
> generated
> code shouldn't be any worse either.  (*Large* aggregates remain
> passed
> using "byval" to enable optimized copying via memcpy etc.)
> 
> In fact, I've enabled use of direct array types for small aggregates
> even
> on ELFv1 -- it is not strictly necessary there, but should avoid
> forcing
> small aggregates to memory unnecessarily if they end up fully in
> registers.
> [ If prefered, I can revert that part and keep using byval on ELFv1
> ... ]
> 
> 
> Overall, it feels that this approach strikes a good balance between
> parts
> of the ABI implemented in clang (parsing the original type structure
> to
> detect homogeneous aggregates, and alignment requirements) vs. those
> implemented in LLVM (choosing registers / stack slots to implement
> those
> requirements), with a reasonably straight-forward IR representation
> in
> between, and no significant duplication of logic.
> 
> 
> A final note: as with the LLVM series, I have not (yet) implemented
> the
> -mabi=elfv1 / -mabi=elfv2 command line arguments.  Instead, we
> hard-code
> ppc64 to use ELFv1 and ppc64le to use ELFv2.  If desired, it should
> be
> straight-forward to implement the option machinery and hook it up to
> the
> ELFv2 ABI flag passed to the PPC64_SVR4_TargetCodeGenInfo
> constructor.

I do think this will be desirable, please submit a follow-up patch for that.

Regarding the patch:

+private:
+  ABIKind Kind;
+  ABIKind getABIKind() const { return Kind; }

You don't need to introduce a accessor function here if it is trivial and also private.

+    } else if (const VectorType *VT = Ty->getAs<VectorType>()) {
+      if (getContext().getTypeSize(VT) != 128)
+        return false;

Why not getContext().getTypeSize(VT) <= 128? If I have <3 x float> then this will be expanded to a <3 x float> in the backend, and I don't see why it should not also be passed in the vector registers?

+  uint32_t NumRegs = Base->isVectorType()? 1 :

Put a space before the ?

+    // the argument to memory.
+    uint64_t Size = getContext().getTypeSize(Ty);
+    if (Size > 0 && Size <= 8*64) {
+      llvm::Type *CoerceTy;
+
+      // Types up to 8 bytes are passed as integer type (which will be
+      // properly aligned in the argument save area doubleword).
+      if (Size <= 64)
+        CoerceTy = llvm::IntegerType::get(getVMContext(),
+                                          llvm::RoundUpToAlignment(Size, 8));
+      // Larger types are passed as arrays, with the base type selected
+      // according to the required alignment in the save area.
+      else {
+        uint64_t RegSize = ABIAlign * 8;
+        uint64_t NumRegs = llvm::RoundUpToAlignment(Size, RegSize) / RegSize;
+        llvm::Type *RegTy = llvm::IntegerType::get(getVMContext(), RegSize);
+        CoerceTy = llvm::ArrayType::get(RegTy, NumRegs);
+      }
+
+      return ABIArgInfo::getDirect(CoerceTy);
+    }

I'd really prefer that we use SizeInBits, RegSizeInBits (or just Bits, etc) instead of Size, RegSize, etc. Having sizes both in bits and bytes is a bit confusing ;)

+    // ELFv2 small aggregates are returned in up to two registers.
+    uint64_t Size = getContext().getTypeSize(RetTy);
+    if (getABIKind() == ELFv2 && Size <= 128) {
+      if (Size == 0)
+        return ABIArgInfo::getIgnore();
+
+      llvm::Type *CoerceTy;
+      if (Size > 64) {

Same here (SizeInBits), and you might as well write 128 as 2*64. In fact, it might be better just to introduce some constant named GPRBits == 64 (or similar) so that it is easier to understand.

-// CHECK: define void @test_struct_v16i16(%struct.v16i16* noalias sret %agg.result, %struct.v16i16* byval align 16)
+// CHECK: define void @test_struct_v16i16(%struct.v16i16* noalias sret %agg.result, [2 x i128] %x.coerce)
 struct v16i16 test_struct_v16i16(struct v16i16 x)
 {
   return x;

We have a bunch of tests like this, but I don't see any on the caller side. We need to make sure that the coersion works on the caller side. Please add some tests for that.

Also, please add a test or two for non-power-of-two vectors (<3 x float> is a good one):
  typedef float float3 __attribute__((ext_vector_type(3)));

Thanks again,
Hal

> 
> 
> (See attached file: diff-clang-elfv2)
> 
> Any review appreciated, thanks!
> 
> 
> Mit freundlichen Gruessen / Best Regards
> 
> Ulrich Weigand
> 
> --
>   Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>   STSM, GNU/Linux compilers and toolchain
>   IBM Deutschland Research & Development GmbH
>   Vorsitzende des Aufsichtsrats: Martina Koederitz |
>   Geschäftsführung: Dirk
> Wittkopp
>   Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory