[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
Hal Finkel
hfinkel at anl.gov
Fri Jul 18 14:32:01 PDT 2014
----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "cfe-commits" <cfe-commits at cs.uiuc.edu>
> Cc: "Hal Finkel" <hfinkel at anl.gov>
> Sent: Monday, July 14, 2014 10:17:35 AM
> Subject: [PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
>
>
>
> Hello,
>
> this patch implements clang support for the PowerPC ELFv2 ABI.
> Together
> with a series of companion patches in LLVM (posted at:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140714/225647.html
> ), this makes clang/LLVM fully usable on powerpc64le-linux.
>
> Most of the ELFv2 ABI changes are fully implemented on the LLVM side,
> please read the description in the mail cited above. On the clang
> side,
> however, we need to implement some changes in how aggregate types are
> passed by value. Specifically, we need to:
> - pass (and return) "homogeneous" floating-point or vector aggregates
> in
> FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
> - return aggregates of up to 16 bytes in one or two GPRs
> - pass aggregates that fit fully in registers without using the
> "byval"
> mechanism (see discussion in the LLVM mail)
>
> The LLVM side implements the infrastructure required to do all this.
> In
> particular, LLVM implements support for passing array types directly,
> where
> the array element type encodes properties needed to handle all those
> ELFv2
> features correctly. Specifically, the array element type encodes:
> - whether the parameter should be passed in FPRs, VRs, or just
> GPRs/stack
> slots (for float / vector / integer element types, respectively)
> - what the alignment requirements of the parameter are when passed in
> GPRs/stack slots (8 for float / 16 for vector / the element type
> size for
> integer element types) -- this corresponds to the "byval align" field
>
>
> With this support in place, the clang changes to implement the three
> features described above is straight-forward:
> - Implement code to detect float / vector homogeneous aggregates and
> pass/return them as array types using the appropriate float / vector
> element type.
> - Return small aggregates in registers (this doesn't even need the
> array
> support)
> - Pass [other] aggregates that fit fully in registers as direct
> integer
> arrays, using an element type to encode the alignment requirement
> (that
> would otherwise go to the "byval align" field).
>
> The last piece is the most interesting one. In general, clang does
> not
> know (on PowerPC) whether a type fits fully in registers. We could
> add
> code to track expected register usage, as is done on several other
> platforms. However, I've decided against that, because it not only
> duplicates a lot of the logic already implemented in the back-end,
> but it
> isn't even fully reliable, since IR transformation layers may modify
> function signatures.
Thank you for pursuing such a route. I'm strongly in favor of keeping this decision making in the backend whenever possible.
>
> Instead, I'm now simply passing *any* aggregate passed by value using
> the
> array mechanism if its size is up to 64 bytes. This means that some
> of
> those will end up being passed in stack slots anyway, but the
> generated
> code shouldn't be any worse either. (*Large* aggregates remain
> passed
> using "byval" to enable optimized copying via memcpy etc.)
>
> In fact, I've enabled use of direct array types for small aggregates
> even
> on ELFv1 -- it is not strictly necessary there, but should avoid
> forcing
> small aggregates to memory unnecessarily if they end up fully in
> registers.
> [ If prefered, I can revert that part and keep using byval on ELFv1
> ... ]
>
>
> Overall, it feels that this approach strikes a good balance between
> parts
> of the ABI implemented in clang (parsing the original type structure
> to
> detect homogeneous aggregates, and alignment requirements) vs. those
> implemented in LLVM (choosing registers / stack slots to implement
> those
> requirements), with a reasonably straight-forward IR representation
> in
> between, and no significant duplication of logic.
>
>
> A final note: as with the LLVM series, I have not (yet) implemented
> the
> -mabi=elfv1 / -mabi=elfv2 command line arguments. Instead, we
> hard-code
> ppc64 to use ELFv1 and ppc64le to use ELFv2. If desired, it should
> be
> straight-forward to implement the option machinery and hook it up to
> the
> ELFv2 ABI flag passed to the PPC64_SVR4_TargetCodeGenInfo
> constructor.
I do think this will be desirable, please submit a follow-up patch for that.
Regarding the patch:
+private:
+ ABIKind Kind;
+ ABIKind getABIKind() const { return Kind; }
You don't need to introduce a accessor function here if it is trivial and also private.
+ } else if (const VectorType *VT = Ty->getAs<VectorType>()) {
+ if (getContext().getTypeSize(VT) != 128)
+ return false;
Why not getContext().getTypeSize(VT) <= 128? If I have <3 x float> then this will be expanded to a <3 x float> in the backend, and I don't see why it should not also be passed in the vector registers?
+ uint32_t NumRegs = Base->isVectorType()? 1 :
Put a space before the ?
+ // the argument to memory.
+ uint64_t Size = getContext().getTypeSize(Ty);
+ if (Size > 0 && Size <= 8*64) {
+ llvm::Type *CoerceTy;
+
+ // Types up to 8 bytes are passed as integer type (which will be
+ // properly aligned in the argument save area doubleword).
+ if (Size <= 64)
+ CoerceTy = llvm::IntegerType::get(getVMContext(),
+ llvm::RoundUpToAlignment(Size, 8));
+ // Larger types are passed as arrays, with the base type selected
+ // according to the required alignment in the save area.
+ else {
+ uint64_t RegSize = ABIAlign * 8;
+ uint64_t NumRegs = llvm::RoundUpToAlignment(Size, RegSize) / RegSize;
+ llvm::Type *RegTy = llvm::IntegerType::get(getVMContext(), RegSize);
+ CoerceTy = llvm::ArrayType::get(RegTy, NumRegs);
+ }
+
+ return ABIArgInfo::getDirect(CoerceTy);
+ }
I'd really prefer that we use SizeInBits, RegSizeInBits (or just Bits, etc) instead of Size, RegSize, etc. Having sizes both in bits and bytes is a bit confusing ;)
+ // ELFv2 small aggregates are returned in up to two registers.
+ uint64_t Size = getContext().getTypeSize(RetTy);
+ if (getABIKind() == ELFv2 && Size <= 128) {
+ if (Size == 0)
+ return ABIArgInfo::getIgnore();
+
+ llvm::Type *CoerceTy;
+ if (Size > 64) {
Same here (SizeInBits), and you might as well write 128 as 2*64. In fact, it might be better just to introduce some constant named GPRBits == 64 (or similar) so that it is easier to understand.
-// CHECK: define void @test_struct_v16i16(%struct.v16i16* noalias sret %agg.result, %struct.v16i16* byval align 16)
+// CHECK: define void @test_struct_v16i16(%struct.v16i16* noalias sret %agg.result, [2 x i128] %x.coerce)
struct v16i16 test_struct_v16i16(struct v16i16 x)
{
return x;
We have a bunch of tests like this, but I don't see any on the caller side. We need to make sure that the coersion works on the caller side. Please add some tests for that.
Also, please add a test or two for non-power-of-two vectors (<3 x float> is a good one):
typedef float float3 __attribute__((ext_vector_type(3)));
Thanks again,
Hal
>
>
> (See attached file: diff-clang-elfv2)
>
> Any review appreciated, thanks!
>
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
> Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> STSM, GNU/Linux compilers and toolchain
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz |
> Geschäftsführung: Dirk
> Wittkopp
> Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the cfe-commits
mailing list