[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)

Wed Jul 16 05:27:04 PDT 2014

FWIW, I reviewed Ulrich's patches prior to posting, and they look good
to me.  Reviews from other folks are welcome!

Thanks,
Bill

On Mon, 2014-07-14 at 17:16 +0200, Ulrich Weigand wrote:
> 
> Hello,
> 
> this patch series implements support in LLVM for the PowerPC ELFv2 ABI.
> Together with a companion patch to clang (posted on cfe-commits), this
> makes clang/LLVM fully usable on powerpc64le-linux.  Overall the patch
> series passed the following testing (both on powerpc64-linux (ELFv1) and
> powerpc64le-linux (ELFv2)):
> - building LLVM & clang, running the regression test suite
> - running projects/test-suite
> - full 3-stage bootstrap of clang
> - GCC ABI compatibility test suite GCC vs. clang  [*]
> 
> [*] There are some failures due to GCC features clang does not implement
> (or implements slightly differently than GCC, like attribute((aligned)) on
> bit field base types), but those seem platform-independent, and are the
> same on ELFv1 and ELFv2.
> 
> I've broken up ELFv2 support into the following pieces:
> 
> 
> - MC support for .abiversion directive
> 
> ELFv2 binaries are marked by a bit in the ELF header e_flags field.  A new
> assembler directive .abiversion can be used to set that flag.  This patch
> implements support in the PowerPC MC streamers to emit the .abiversion
> directive (both into assembler and ELF binary output), as well as support
> in the assembler parser to parse the .abiversion directive.
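
As a rough illustration of the header flag described above, here is a minimal sketch of reading and setting the ABI version in e_flags. It assumes the version occupies the low two bits of e_flags (the EF_PPC64_ABI mask in the GNU elf.h, as far as I know); the helper names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the ABI version lives in the low two bits of e_flags
// (EF_PPC64_ABI in the GNU <elf.h>); redefined here to stay self-contained.
constexpr std::uint32_t kPPC64AbiMask = 0x3;

// Hypothetical helper: returns the ABI version recorded in e_flags.
// 0 means "unspecified", 1 means ELFv1, 2 means ELFv2.
constexpr unsigned abiVersionFromFlags(std::uint32_t e_flags) {
  return e_flags & kPPC64AbiMask;
}

// Hypothetical helper: returns e_flags with the ABI version field replaced,
// which is the effect an ".abiversion N" directive asks for.
inline std::uint32_t setAbiVersion(std::uint32_t e_flags, unsigned version) {
  assert(version <= kPPC64AbiMask && "ABI version must fit in two bits");
  return (e_flags & ~kPPC64AbiMask) | version;
}
```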
> 
> (See attached file: diff-llvm-elfv2-abiversion)
> 
> 
> - MC support for .localentry directive
> 
> A second binutils feature needed to support ELFv2 is the .localentry
> directive.  In the ELFv2 ABI, functions may have two entry points: one for
> calling the routine locally via "bl", and one for calling the function via
> function pointer (either at the source level, or implicitly via a PLT stub
> for global calls).  The two entry points share a single ELF symbol, where
> the ELF symbol address identifies the global entry point address, while the
> local entry point is found by adding a delta offset to the symbol address.
> That offset is encoded into three platform-specific bits of the ELF symbol
> st_other field.
> 
> The .localentry directive instructs the assembler to set those fields to
> encode a particular offset.  This is typically used by a function prologue
> sequence like this:
> 
> func:
>         addis r2, r12, (.TOC.-func)@ha
>         addi r2, r2, (.TOC.-func)@l
>         .localentry func, .-func
> 
> Note that according to the ABI, when calling the global entry point, r12
> must be set to point to the global entry point address itself; while when
> calling the local entry point, r2 must be set to point to the TOC base.
> The two instructions between the global and local entry point in the above
> example translate the first requirement into the second.
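
To make the st_other encoding more concrete, the following sketch decodes the local entry offset from the three platform-specific bits. The bit position (5), the mask (0xe0), and the decoding formula follow the PPC64_LOCAL_ENTRY_OFFSET macro in the GNU elf.h as I understand it; treat them as assumptions, since the mail itself does not spell them out. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values of STO_PPC64_LOCAL_BIT / STO_PPC64_LOCAL_MASK.
constexpr unsigned kLocalBit = 5;
constexpr std::uint8_t kLocalMask = 0xe0;  // bits 5..7 of st_other

// Decodes the distance in bytes from the global to the local entry point.
// Field values 0 and 1 both decode to 0 (the two entry points coincide);
// values 2, 3, 4, ... decode to 4, 8, 16, ... bytes.
constexpr std::uint64_t localEntryOffset(std::uint8_t st_other) {
  return ((std::uint64_t{1} << ((st_other & kLocalMask) >> kLocalBit)) >> 2) << 2;
}

// The local entry point is the symbol (global entry) address plus the offset.
constexpr std::uint64_t localEntryAddress(std::uint64_t symbolAddr,
                                          std::uint8_t st_other) {
  return symbolAddr + localEntryOffset(st_other);
}
```

For the two-instruction prologue shown above, ".-func" is 8 bytes, which under these assumptions is stored as field value 3 (st_other bits 0x60).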
> 
> The following patch implements support in the PowerPC MC streamers to emit
> the .localentry directive (both into assembler and ELF object output), as
> well as support in the assembler parser to parse the .localentry directive.
> 
> In addition, there is another change required in MC fixup/relocation
> handling to properly deal with relocations targeting function symbols with
> two entry points: When the target function is known local, the MC layer
> would immediately handle the fixup by inserting the target address -- this
> is wrong, since the call may need to go to the local entry point instead.
> The GNU assembler handles this case by *not* directly resolving fixups
> targeting functions with two entry points, but always emits the relocation
> and relies on the linker to handle this case correctly.  This patch changes
> LLVM MC to do the same (this is done via the processFixupValue routine).
> 
> Similarly, there are cases where the assembler would normally emit a
> relocation, but "simplify" it to a relocation targeting a *section* instead
> of the actual symbol.  For the same reason as above, this may be wrong when
> the target symbol has two entry points.  The GNU assembler again handles
> this case by not performing this simplification in that case, but leaving
> the relocation targeting the full symbol, which is then resolved by the
> linker.  This patch changes LLVM MC to do the same (the
> needsRelocateWithSymbol routine).   NOTE: the LLVM code is actually overly
> pessimistic, since the needsRelocateWithSymbol routine currently does not
> have access to the actual target symbol, and thus must always assume that
> it might have two entry points.  This can be improved upon by modifying
> common code to pass the target symbol when calling needsRelocateWithSymbol
> (probably best done as a follow-on patch).
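
The difference between the pessimistic rule and the possible follow-on refinement can be sketched as two predicates. This is only a simplified model and not the LLVM interface: both function names are hypothetical, and the 0xe0 mask for the local-entry bits in st_other is an assumption taken from the GNU elf.h.

```cpp
#include <cassert>
#include <cstdint>

// Assumed mask of the local-entry bits in st_other (STO_PPC64_LOCAL_MASK).
constexpr std::uint8_t kLocalEntryMask = 0xe0;

// Pessimistic rule: the target symbol is not visible to the routine, so it
// must assume two entry points and always keep the relocation on the symbol.
constexpr bool keepSymbolPessimistic() { return true; }

// Possible refinement once the target symbol is passed in: keep the symbol
// only if its st_other bits actually encode a local entry point.
constexpr bool keepSymbolRefined(std::uint8_t st_other) {
  return (st_other & kLocalEntryMask) != 0;
}
```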
> 
> (See attached file: diff-llvm-elfv2-localentry)
> 
> 
> - ELFv2 function call changes: two entry points instead of function
> descriptors
> 
> This patch builds upon the two preceding MC changes to implement the basic
> ELFv2 function call convention.  In the ELFv1 ABI, a "function descriptor"
> was associated with every function, pointing to both the entry address and
> the related TOC base (and a static chain pointer for nested functions).
> Function pointers would actually refer to that descriptor, and the indirect
> call sequence needed to load up both entry address and TOC base.
> 
> In the ELFv2 ABI, there are no more function descriptors, and function
> pointers simply refer to the (global) entry point of the function code.
> Indirect function calls simply branch to that address, after loading it up
> into r12 (as required by the ABI rules for a global entry point).  Direct
> function calls continue to just do a "bl" to the target symbol; this will
> be resolved by the linker to the local entry point of the target function
> if it is local, and to a PLT stub if it is global.  That PLT stub would
> then load the (global) entry point address of the final target into r12 and
> branch to it.  Note that when performing a direct function call, r2 must be
> set up to point to the current TOC base: if the target ends up local, the
> ABI requires that its local entry point is called with r2 set up; if the
> target ends up global, the PLT stub requires that r2 is set up.
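
The contrast between the two indirect call conventions can be modeled roughly as follows. This is a simplified sketch of the register setup only, not generated code; the struct and function names are hypothetical, and the use of CTR for the branch and r11 for the static chain are assumptions about the usual call sequence rather than statements from this mail.

```cpp
#include <cassert>
#include <cstdint>

// ELFv1: a function pointer refers to a three-doubleword descriptor.
struct FunctionDescriptor {
  std::uint64_t entry;        // address of the function code
  std::uint64_t tocBase;      // TOC base the callee expects in r2
  std::uint64_t environment;  // static chain pointer for nested functions
};

// Registers relevant to an indirect call (simplified model).
struct Regs {
  std::uint64_t ctr;  // branch target (assumed: bctrl is used)
  std::uint64_t r2;   // TOC base
  std::uint64_t r11;  // static chain (assumed)
  std::uint64_t r12;  // global entry point address (ELFv2)
};

// ELFv1: load entry address and TOC base (and environment) from the descriptor.
inline Regs prepareIndirectCallELFv1(const FunctionDescriptor &fd, Regs regs) {
  regs.ctr = fd.entry;
  regs.r2 = fd.tocBase;
  regs.r11 = fd.environment;
  return regs;
}

// ELFv2: the pointer is the global entry point itself; it goes into r12 and
// the callee's global entry prologue derives r2 from it.
inline Regs prepareIndirectCallELFv2(std::uint64_t globalEntry, Regs regs) {
  regs.r12 = globalEntry;
  regs.ctr = globalEntry;
  return regs;
}
```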
> 
> This patch implements all LLVM changes to implement that scheme:
> - No longer create a function descriptor when emitting a function
> definition (in EmitFunctionEntryLabel)
> - Emit two entry points *if* the function needs the TOC base (r2) anywhere
> (this is done in EmitFunctionBodyStart; note that this cannot be done in
> EmitFunctionEntryLabel because the global entry point prologue code must be
> *part* of the function as covered by debug info).
> - In order to make use tracking of r2 (as needed above) work correctly,
> mark direct function calls as implicitly using r2.
> - Implement the ELFv2 indirect function call sequence (no function
> descriptors; load target address into r12).
> - When creating an ELFv2 object file, emit the .abiversion 2 directive to
> tell the linker to create the appropriate version of PLT stubs.
> 
> Note that all this is triggered by a predicate isELFv2ABI.  This is
> currently hard-coded to be true iff the "little-endian 64-bit
> SVR4" (ppc64le) triple is selected.  To be fully compatible with GCC, we
> should really implement the -mabi=elfv1 / -mabi=elfv2 option pair and
> support both ELFv1 and ELFv2 on both powerpc64-linux and powerpc64le-linux
> targets, with big-endian defaulting to ELFv1 and little-endian defaulting
> to ELFv2.  However, since the BE ELFv2 and LE ELFv1 cases are only
> theoretical options at this point (there's no library support for those in
> any current or planned Linux distribution), I haven't implemented this yet.
> It should be straightforward to add this support as a follow-on patch by
> just implementing the option machinery and hooking it up to the isELFv2ABI
> predicate.
> 
> (See attached file: diff-llvm-elfv2-funcdesc)
> 
> 
> - ELFv2 stack space reductions
> 
> The ELFv2 ABI reduces the amount of stack required to implement an
> ABI-compliant function call in two ways:
> * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating
> two unused doublewords
> * the 64-byte "parameter save area" is now optional and need not be present
> in certain cases
>    (it remains mandatory in functions with variable arguments, and
> functions that have any parameter that is passed on the stack)
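
The size rules above can be written down as a small sketch. The linkage area sizes and the parameter save area conditions come from the description above; the TOC save slot offsets (40 for ELFv1, 24 for ELFv2) are taken from the ABI documents as I remember them, not from this mail, and the function names only loosely mirror getLinkageSize / getTOCSaveOffset and do not reproduce their real signatures.

```cpp
#include <cassert>

constexpr unsigned kDoubleword = 8;
constexpr unsigned kParamSaveAreaSize = 64;  // shadows the eight argument GPRs

// Linkage area: six doublewords in ELFv1, four in ELFv2.
constexpr unsigned linkageSize(bool isELFv2) {
  return (isELFv2 ? 4 : 6) * kDoubleword;
}

// Offset of the TOC save slot within the linkage area (assumed values).
constexpr unsigned tocSaveOffset(bool isELFv2) {
  return isELFv2 ? 24 : 40;
}

// ELFv1 always has a parameter save area; ELFv2 requires it only for
// variable-argument functions or when some parameter is passed on the stack.
constexpr bool needsParamSaveArea(bool isELFv2, bool isVarArg,
                                  bool hasStackArgs) {
  return !isELFv2 || isVarArg || hasStackArgs;
}

// Minimum stack a caller must provide for an ABI-compliant call.
constexpr unsigned minCallFrameSize(bool isELFv2, bool isVarArg,
                                    bool hasStackArgs) {
  return linkageSize(isELFv2) +
         (needsParamSaveArea(isELFv2, isVarArg, hasStackArgs)
              ? kParamSaveAreaSize
              : 0);
}
```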
> 
> The following patch implements these required changes:
> - reducing the linkage area, and associated relocation of the TOC save
> slot, in getLinkageSize / getTOCSaveOffset
>   (this requires updating all callers of these routines to pass in the
> isELFv2ABI flag).
> - (partially) handling the case where the parameter save area is optional
> 
> This latter part requires some extra explanation:  Currently, we still
> always allocate the parameter save area when *calling* a function.  That is
> certainly always compliant with the ABI, but may cause code to allocate
> stack unnecessarily.  This can be addressed by a follow-on optimization
> patch.
> 
> On the *callee* side, in LowerFormalArguments, we *must* track correctly
> whether the ABI guarantees that the caller has allocated the parameter save
> area for our use, and the patch does so. However, there is one
> complication: the code that handles incoming "byval" arguments will
> currently *always* write to the parameter save area, because it has to
> force incoming register arguments to the stack since it must return an
> *address* to implement the byval semantics.  This is already inefficient in
> some cases in the ELFv1 ABI, but in the ELFv2 ABI it would be actually
> buggy since it would write to the argument save area that the caller
> actually did *not* allocate.
> 
> There are two options to fix this: One would be that the
> LowerFormalArguments code could keep its overall logic, except it writes
> arguments to a freshly allocated stack slot on the function's own stack
> frame instead of the argument save area in those cases where that area is
> not present.  I chose *not* to implement this, since writing arguments that
> already fit fully in registers to the stack *is* inefficient.  Instead I
> chose the second option: have the front-end pass such arguments in a way
> that does *not* use the "byval" scheme in the first place.  This is
> implemented in the diff-llvm-elfv2-aggregates patch below and the
> associated clang patch.  In this patch I simply verify that if there is no
> argument save area guaranteed by the ABI, we have no byval arguments, and
> report a fatal LLVM ERROR otherwise.   This unfortunately makes the
> platform-independent DebugInfo/2010-10-01-crash.ll case fail since it uses
> a byval parameter in a way that is now unsupported.
> 
> (See attached file: diff-llvm-elfv2-stack)
> 
> 
> - ELFv2 explicit CFI for CR fields
> 
> This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI would
> represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just
> a single CFI record referring to CR2.   In ELFv2 instead, each of the CR
> fields is represented by its own CFI record.  The advantage is that the
> compiler can now choose to save just a single (or two) CR fields instead of
> all of them, if those are the only ones that actually need saving.  That
> can lead to more efficient code using mf(o)crf instead of the (slow) mfcr
> instruction.
> 
> Note that the following patch does not (yet) implement this more efficient
> code generation, but it does implement the part that is required to be ABI
> compliant: creating multiple CFI records if multiple CR fields are saved.
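
A small sketch of the record selection described above: given the set of saved CR fields, it returns the DWARF register numbers that get a CFI record. The numbering (CR0 = 68, so CR2 = 70) is an assumption based on the usual PowerPC DWARF register mapping, and the function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumed DWARF register number of CR0; CRn is then 68 + n.
constexpr unsigned kDwarfCR0 = 68;

// Returns the DWARF registers that receive a CFI record when the given
// callee-saved CR fields (2, 3, or 4) are saved.
std::vector<unsigned> crCfiRegisters(bool isELFv2,
                                     const std::vector<unsigned> &savedFields) {
  std::vector<unsigned> records;
  if (savedFields.empty())
    return records;
  if (!isELFv2) {
    // ELFv1: a single record referring to CR2 stands for the whole CR word.
    records.push_back(kDwarfCR0 + 2);
    return records;
  }
  // ELFv2: one record per saved field.
  for (unsigned field : savedFields) {
    assert(field >= 2 && field <= 4 && "only CR2..CR4 are callee-saved");
    records.push_back(kDwarfCR0 + field);
  }
  return records;
}
```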
> 
> (See attached file: diff-llvm-elfv2-crsave)
> 
> 
> - ELFv2 aggregate passing support
> 
> This patch is intended to work together with the clang companion patch.
> The LLVM patch provides infrastructure that allows the clang side to
> implement the missing pieces of the ELFv2 ABI relating to aggregates passed
> by value.  Specifically, we need to:
> - pass (and return) "homogeneous" floating-point or vector aggregates in
> FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
> - return aggregates of up to 16 bytes in one or two GPRs
> - pass aggregates that fit fully in registers without using the "byval"
> mechanism (see discussion of the diff-llvm-elfv2-stack)
> 
> As infrastructure to enable those changes, this LLVM patch adds support for
> passing array types directly.  These can be used by the front-end to pass
> aggregate types (coerced to an appropriate array type).  The details of the
> array type being used inform the back-end about ABI-relevant properties.
> Specifically, the array element type encodes:
> - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack
> slots  (for float / vector / integer element types, respectively)
> - what the alignment requirements of the parameter are when passed in
> GPRs/stack slots  (8 for float / 16 for vector / the element type size for
> integer element types) -- this corresponds to the "byval align" field
> 
> The following patch uses the functionArgumentNeedsConsecutiveRegisters
> callback to encode that special treatment is required for all
> directly-passed array types.  The isInConsecutiveRegs /
> isInConsecutiveRegsLast bits set as a result are then used to implement
> the required size and alignment rules in CalculateStackSlotSize /
> CalculateStackSlotAlignment etc.
> 
> As a related change, the ABI routines have to be modified to support
> passing floating-point types in GPRs.  This is necessary because with
> homogeneous aggregates of 4-byte float type we can now run out of FPRs
> *before* we run out of the 64-byte argument save area that is shadowed by
> GPRs.  Any extra floating-point arguments that no longer fit in FPRs must
> now be passed in GPRs until we run out of those too.  Note that there was
> already code to pass floating-point arguments in GPRs used with vararg
> parameters, which was done by writing the argument out to the argument save
> area first and then reloading into GPRs.  The patch replaces this,
> however, with code that packs float arguments directly via
> extension/truncation, BITCAST, and BUILD_PAIR operations.  This has some
> advantages:
> - we no longer rely on the argument save area being present
> - while the BITCASTs will currently often also result in values being
> written to the stack and then reloaded, this should improve once we
> implement the Power8 GPR<->FPR move instructions.
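
To illustrate the idea of packing float arguments into a GPR without going through the argument save area, here is a sketch of the bit-level effect for two 4-byte floats sharing one 64-bit register. It models the BITCAST and BUILD_PAIR steps with plain integer operations; the function names are hypothetical, and the choice of which float lands in which half (first float in the low half on little-endian, in the high half on big-endian, matching memory order) is my assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte IEEE-754 float");

// Reinterprets the bits of a float as a 32-bit integer (the BITCAST step).
inline std::uint32_t bitsOf(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}

// Combines two floats into one 64-bit GPR image (the BUILD_PAIR step).
inline std::uint64_t packFloatPair(float first, float second,
                                   bool littleEndian) {
  const std::uint64_t lo = bitsOf(littleEndian ? first : second);
  const std::uint64_t hi = bitsOf(littleEndian ? second : first);
  return (hi << 32) | lo;
}
```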
> 
> The final part of the patch enables up to 8 FPRs and VRs for argument
> return in PPCCallingConv.td; this is required to support returning ELFv2
> homogeneous aggregates.  (Note that this doesn't affect other ABIs since
> LLVM will only look for which register to use if the parameter is marked as
> "direct" return anyway.)
> 
> (See attached file: diff-llvm-elfv2-aggregates)
> 
> 
> - ELFv2 dynamic loader support
> 
> This is the final piece of ELFv2 support in LLVM: it enables the new ABI in
> the runtime dynamic loader.  The loader has to implement the following
> features:
> - In the ELFv2 ABI, do not look up a function descriptor in .opd, but
> instead use the local entry point when resolving a direct call.
> - Update the TOC restore code to use the new TOC slot linkage area offset.
> - Create PLT stubs appropriate for the ELFv2 ABI.
> 
> Note that this patch also adds common-code changes. These are necessary
> because the loader must check the newly added ELF flags: the e_flags header
> bits encoding the ABI version, and the st_other symbol table entry bits
> encoding the local entry point offset.  There is currently no way to access
> these, so I've added ObjectFile::getPlatformFlags and SymbolRef::getOther
> accessors.
> 
> (See attached file: diff-llvm-elfv2-dyld)
> 
> 
> I'd appreciate any review of the patch series!   I'm aware it's a lot of
> code, but I'd really like to see clang/LLVM usable out-of-the-box on
> powerpc64le-linux soon (hopefully even in 3.5)!
> 
> 
> Mit freundlichen Gruessen / Best Regards
> 
> Ulrich Weigand
> 
> --
>   Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>   STSM, GNU/Linux compilers and toolchain
>   IBM Deutschland Research & Development GmbH
>   Chairwoman of the Supervisory Board: Martina Koederitz | Management: Dirk
> Wittkopp
>   Registered office: Böblingen | Registration court: Amtsgericht
> Stuttgart, HRB 243294
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits