[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)

Wed Jul 16 05:32:51 PDT 2014

----- Original Message -----
> From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> To: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> Sent: Wednesday, July 16, 2014 7:27:04 AM
> Subject: Re: [PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
> 
> FWIW, I reviewed Ulrich's patches prior to posting, and they look good
> to me.  Reviews from other folks are welcome!

Good. I'll look at them today.

 -Hal

> 
> Thanks,
> Bill
> 
> On Mon, 2014-07-14 at 17:16 +0200, Ulrich Weigand wrote:
> > 
> > Hello,
> > 
> > this patch series implements support in LLVM for the PowerPC ELFv2
> > ABI.
> > Together with a companion patch to clang (posted on cfe-commits),
> > this
> > makes clang/LLVM fully usable on powerpc64le-linux.  Overall the
> > patch
> > series passed the following testing (both on powerpc64-linux
> > (ELFv1) and
> > powerpc64le-linux (ELFv2)):
> > - building LLVM & clang, running the regression test suite
> > - running projects/test-suite
> > - full 3-stage bootstrap of clang
> > - GCC ABI compatibility test suite GCC vs. clang  [*]
> > 
> > [*] There are some failures due to GCC features clang does not
> > implement
> > (or implements slightly differently than GCC, like
> > attribute((aligned)) on
> > bit field base types), but those seem platform-independent, and are
> > the
> > same on ELFv1 and ELFv2.
> > 
> > I've broken up ELFv2 support into the following pieces:
> > 
> > 
> > - MC support for .abiversion directive
> > 
> > ELFv2 binaries are marked by a bit in the ELF header e_flags field.
> >  A new
> > assembler directive .abiversion can be used to set that flag.  This
> > patch
> > implements support in the PowerPC MC streamers to emit the
> > .abiversion
> > directive (both into assembler and ELF binary output), as well as
> > support
> > in the assembler parser to parse the .abiversion directive.
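
A rough sketch of the header flag described above, not code from the patch. The helper names are made up for illustration. The mask value 3 (the low two bits of e_flags) follows the EF_PPC64_ABI definition in the binutils headers and is not stated in the mail itself.

```cpp
#include <cstdint>

// Mask for the ABI-version field in e_flags. The value 3 mirrors
// EF_PPC64_ABI from binutils; it is an assumption relative to the mail.
constexpr uint32_t kPPC64AbiMask = 3;

// Hypothetical helper: read the ABI version recorded in an ELF header.
// 0 means "unspecified", 1 is ELFv1, 2 is ELFv2.
constexpr unsigned abiVersion(uint32_t e_flags) {
  return e_flags & kPPC64AbiMask;
}

// Hypothetical helper: the effect of ".abiversion N" on the header flags.
constexpr uint32_t setAbiVersion(uint32_t e_flags, unsigned version) {
  return (e_flags & ~kPPC64AbiMask) | (version & kPPC64AbiMask);
}

static_assert(abiVersion(setAbiVersion(0, 2)) == 2, "ELFv2 round-trips");
```
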
> > 
> > (See attached file: diff-llvm-elfv2-abiversion)
> > 
> > 
> > - MC support for .localentry directive
> > 
> > A second binutils feature needed to support ELFv2 is the
> > .localentry
> > directive.  In the ELFv2 ABI, functions may have two entry points:
> > one for
> > calling the routine locally via "bl", and one for calling the
> > function via
> > function pointer (either at the source level, or implicitly via a
> > PLT stub
> > for global calls).  The two entry points share a single ELF symbol,
> > where
> > the ELF symbol address identifies the global entry point address,
> > while the
> > local entry point is found by adding a delta offset to the symbol
> > address.
> > That offset is encoded into three platform-specific bits of the ELF
> > symbol
> > st_other field.
> > 
> > The .localentry directive instructs the assembler to set those
> > bits to
> > encode a particular offset.  This is typically used by a function
> > prologue
> > sequence like this:
> > 
> > func:
> >         addis r2, r12, (.TOC.-func)@ha
> >         addi r2, r2, (.TOC.-func)@l
> >         .localentry func, .-func
> > 
> > Note that according to the ABI, when calling the global entry point,
> > r12 must be set to point to the global entry point address itself,
> > while when calling the local entry point, r2 must be set to point to
> > the TOC base.  The two instructions between the global and local entry
> > point in the above example translate the first requirement into the
> > second.
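
For illustration, the st_other encoding described above could be sketched as follows. The function names are hypothetical. The bit position (5) and the power-of-two mapping follow the ELFv2 ABI document and the STO_PPC64_LOCAL_* definitions in binutils, not anything spelled out in the mail.

```cpp
#include <cstdint>

constexpr unsigned kLocalBit = 5;                // STO_PPC64_LOCAL_BIT
constexpr uint8_t kLocalMask = 7u << kLocalBit;  // STO_PPC64_LOCAL_MASK

// Hypothetical helper: turn the three st_other bits back into a byte offset.
// Encoded values 0 and 1 mean "no separate local entry"; 2..6 mean
// 4, 8, 16, 32, 64 bytes; 7 is reserved.
constexpr unsigned decodeLocalEntry(uint8_t st_other) {
  return ((1u << ((st_other & kLocalMask) >> kLocalBit)) >> 2) << 2;
}

// Hypothetical helper: pick the encoding for an offset given in bytes.
constexpr uint8_t encodeLocalEntry(unsigned offset) {
  return static_cast<uint8_t>(
      (offset >= 64 ? 6u : offset >= 32 ? 5u : offset >= 16 ? 4u
       : offset >= 8 ? 3u : offset >= 4 ? 2u : 0u) << kLocalBit);
}

// The prologue quoted above has two instructions (8 bytes) before the
// local entry point.
static_assert(decodeLocalEntry(encodeLocalEntry(8)) == 8, "8 bytes round-trips");
```
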
> > 
> > The following patch implements support in the PowerPC MC streamers to
> > emit the .localentry directive (both into assembler and ELF object
> > output), as well as support in the assembler parser to parse the
> > .localentry directive.
> > 
> > In addition, there is another change required in MC
> > fixup/relocation
> > handling to properly deal with relocations targeting function
> > symbols with
> > two entry points: When the target function is known local, the MC
> > layer
> > would immediately handle the fixup by inserting the target address
> > -- this
> > is wrong, since the call may need to go to the local entry point
> > instead.
> > The GNU assembler handles this case by *not* directly resolving
> > fixups
> > targeting functions with two entry points, but always emits the
> > relocation
> > and relies on the linker to handle this case correctly.  This patch
> > changes
> > LLVM MC to do the same (this is done via the processFixupValue
> > routine).
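
The decision described above boils down to a small predicate. This is a minimal sketch with a hypothetical function name, assuming the local entry offset lives in the top three bits of st_other (mask 0xE0, as in binutils). It is not the actual processFixupValue code.

```cpp
#include <cstdint>

constexpr uint8_t kLocalEntryMask = 0xE0;  // top three bits of st_other

// Hypothetical predicate: may the assembler resolve a branch fixup itself?
// Only when the target is known to be local AND has no separate local entry
// point; otherwise a relocation is emitted and the linker picks the address.
constexpr bool canResolveInAssembler(bool targetIsLocal, uint8_t st_other) {
  return targetIsLocal && (st_other & kLocalEntryMask) == 0;
}

static_assert(!canResolveInAssembler(true, 0x60), "two entry points: defer to linker");
```
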
> > 
> > Similarly, there are cases where the assembler would normally emit
> > a
> > relocation, but "simplify" it to a relocation targeting a *section*
> > instead
> > of the actual symbol.  For the same reason as above, this may be
> > wrong when
> > the target symbol has two entry points.  The GNU assembler again
> > handles
> > this case by not performing this simplification in that case, but
> > leaving
> > the relocation targeting the full symbol, which is then resolved by
> > the
> > linker.  This patch changes LLVM MC to do the same (the
> > needsRelocateWithSymbol routine).   NOTE: the LLVM code is actually
> > overly
> > pessimistic, since the needsRelocateWithSymbol routine currently
> > does not
> > have access to the actual target symbol, and thus must always
> > assume that
> > it might have two entry points.  This can be improved upon by
> > modifying
> > common code to pass the target symbol when calling
> > needsRelocateWithSymbol
> > (probably best done as a follow-on patch).
> > 
> > (See attached file: diff-llvm-elfv2-localentry)
> > 
> > 
> > - ELFv2 function call changes: two entry points instead of function
> > descriptors
> > 
> > This patch builds upon the two preceding MC changes to implement the
> > basic
> > ELFv2 function call convention.  In the ELFv1 ABI, a "function
> > descriptor"
> > was associated with every function, pointing to both the entry
> > address and
> > the related TOC base (and a static chain pointer for nested
> > functions).
> > Function pointers would actually refer to that descriptor, and the
> > indirect
> > call sequence needed to load up both entry address and TOC base.
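
As a picture of the ELFv1 layout just described, here is a minimal sketch of a function descriptor. The struct and field names are made up for illustration; the three members are the ones listed in the paragraph above.

```cpp
#include <cstdint>

// Hypothetical model of an ELFv1 function descriptor. An ELFv1 function
// pointer refers to one of these records rather than to code.
struct ELFv1FunctionDescriptor {
  uint64_t entry;  // address of the function's first instruction
  uint64_t toc;    // TOC base the callee expects in r2
  uint64_t env;    // static chain pointer for nested functions
};

// Three doublewords per function.
static_assert(sizeof(ELFv1FunctionDescriptor) == 24, "three doublewords");
```
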
> > 
> > In the ELFv2 ABI, there are no more function descriptors, and
> > function
> > pointers simply refer to the (global) entry point of the function
> > code.
> > Indirect function calls simply branch to that address, after
> > loading it up
> > into r12 (as required by the ABI rules for a global entry point).
> >  Direct
> > function calls continue to just do a "bl" to the target symbol;
> > this will
> > be resolved by the linker to the local entry point of the target
> > function
> > if it is local, and to a PLT stub if it is global.  That PLT stub
> > would
> > then load the (global) entry point address of the final target into
> > r12 and
> > branch to it.  Note that when performing a direct function call, r2
> > must be
> > set up to point to the current TOC base: if the target ends up
> > local, the
> > ABI requires that its local entry point is called with r2 set up;
> > if the
> > target ends up global, the PLT stub requires that r2 is set up.
> > 
> > This patch implements all LLVM changes to implement that scheme:
> > - No longer create a function descriptor when emitting a function
> > definition (in EmitFunctionEntryLabel)
> > - Emit two entry points *if* the function needs the TOC base (r2)
> > anywhere (this is done in EmitFunctionBodyStart; note that this cannot
> > be done in EmitFunctionEntryLabel because the global entry point
> > prologue code must be *part* of the function as covered by debug
> > info).
> > - In order to make use tracking of r2 (as needed above) work
> > correctly,
> > mark direct function calls as implicitly using r2.
> > - Implement the ELFv2 indirect function call sequence (no function
> > descriptors; load target address into r12).
> > - When creating an ELFv2 object file, emit the .abiversion 2
> > directive to
> > tell the linker to create the appropriate version of PLT stubs.
> > 
> > Note that all this is triggered by a predicate isELFv2ABI.  This is
> > currently hard-coded to be true iff the "little-endian 64-bit
> > SVR4" (ppc64le) triple is selected.  To be fully compatible with
> > GCC, we
> > should really implement the -mabi=elfv1 / -mabi=elfv2 option pair
> > and
> > support both ELFv1 and ELFv2 on both powerpc64-linux and
> > powerpc64le-linux
> > targets, with big-endian defaulting to ELFv1 and little-endian
> > defaulting
> > to ELFv2.  However, since the BE ELFv2 and LE ELFv1 case are only
> > theoretical options at this point (there's no library support for
> > those in
> > any current or planned Linux distribution), I haven't implemented
> > this yet.
> > It should be straightforward to add this support as a follow-on
> > patch by
> > just implementing the option machinery and hooking it up to the
> > isELFv2ABI
> > predicate.
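
The follow-on option handling suggested above might look roughly like this. Everything except the name isELFv2ABI is hypothetical (the enums, the signature, the idea of a three-state option), so treat it as a sketch of the defaulting rule only.

```cpp
// Hypothetical stand-ins for the target's endianness and an -mabi= setting.
enum class Endian { Big, Little };
enum class AbiOption { Default, ELFv1, ELFv2 };

// With no explicit option, big-endian defaults to ELFv1 and little-endian
// to ELFv2; an explicit option overrides the default on either target.
constexpr bool isELFv2ABI(Endian e, AbiOption opt = AbiOption::Default) {
  return opt == AbiOption::Default ? e == Endian::Little
                                   : opt == AbiOption::ELFv2;
}

static_assert(isELFv2ABI(Endian::Little), "ppc64le defaults to ELFv2");
```
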
> > 
> > (See attached file: diff-llvm-elfv2-funcdesc)
> > 
> > 
> > - ELFv2 stack space reductions
> > 
> > The ELFv2 ABI reduces the amount of stack required to implement an
> > ABI-compliant function call in two ways:
> > * the "linkage area" is reduced from 48 bytes to 32 bytes by
> > eliminating
> > two unused doublewords
> > * the 64-byte "parameter save area" is now optional and need not be
> > present
> > in certain cases
> >    (it remains mandatory in functions with variable arguments, and
> > functions that have any parameter that is passed on the stack)
> > 
> > The following patch implements these required changes:
> > - reducing the linkage area, and associated relocation of the TOC
> > save
> > slot, in getLinkageSize / getTOCSaveOffset
> >   (this requires updating all callers of these routines to pass in
> >   the
> > isELFv2ABI flag).
> > - (partially) handling the case where the parameter save area is
> > optional
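
The sizes involved can be sketched as below, for the 64-bit case only. The function names are hypothetical simplifications of getLinkageSize / getTOCSaveOffset. The 48 and 32 byte figures are from the text above; the TOC save slot offsets (40 and 24) come from the ABI documents and are an assumption as far as this mail is concerned.

```cpp
// Linkage area: six doublewords in ELFv1, four in ELFv2.
constexpr unsigned linkageSize(bool isELFv2ABI) {
  return (isELFv2ABI ? 4 : 6) * 8;
}

// Offset of the TOC save slot from the stack pointer (assumed values,
// taken from the ABI documents rather than from the mail).
constexpr unsigned tocSaveOffset(bool isELFv2ABI) {
  return isELFv2ABI ? 24 : 40;
}

// In both ABIs the TOC save slot still lies inside the linkage area.
static_assert(tocSaveOffset(true) + 8 <= linkageSize(true), "fits in ELFv2");
static_assert(tocSaveOffset(false) + 8 <= linkageSize(false), "fits in ELFv1");
```
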
> > 
> > This latter part requires some extra explanation:  Currently, we
> > still
> > always allocate the parameter save area when *calling* a function.
> >  That is
> > certainly always compliant with the ABI, but may cause code to
> > allocate
> > stack unnecessarily.  This can be addressed by a follow-on
> > optimization
> > patch.
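
A follow-on optimization of that kind would need a test along these lines. This is a deliberately simplified sketch with hypothetical names: it assumes every argument is an integer doubleword passed in a GPR, and that eight GPRs shadow the 64-byte area. Real code would also have to account for FPR and VR arguments.

```cpp
// Eight doubleword GPR slots shadow the 64-byte parameter save area.
constexpr unsigned kParamGPRs = 8;

// Hypothetical check: must the caller allocate the parameter save area?
// Simplification: all arguments are integer doublewords.
constexpr bool needsParamSaveArea(bool isVarArg, unsigned numDoublewordArgs) {
  return isVarArg || numDoublewordArgs > kParamGPRs;
}

static_assert(!needsParamSaveArea(false, 8), "all arguments fit in registers");
```
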
> > 
> > On the *callee* side, in LowerFormalArguments, we *must* track
> > correctly
> > whether the ABI guarantees that the caller has allocated the
> > parameter save
> > area for our use, and the patch does so. However, there is one
> > complication: the code that handles incoming "byval" arguments will
> > currently *always* write to the parameter save area, because it has
> > to
> > force incoming register arguments to the stack since it must return
> > an
> > *address* to implement the byval semantics.  This is already
> > inefficient in
> > some cases in the ELFv1 ABI, but in the ELFv2 ABI it would be
> > actually
> > buggy since it would write to the argument save area that the
> > caller
> > actually did *not* allocate.
> > 
> > There are two options to fix this: One would be that the
> > LowerFormalArguments code could keep its overall logic, except it
> > writes
> > arguments to a freshly allocated stack slot on the function's own
> > stack
> > frame instead of the argument save area in those cases where that
> > area is
> > not present.  I chose *not* to implement this, since writing
> > arguments that
> > already fit fully in registers to the stack *is* inefficient.
> >  Instead I
> > chose the second option: have the front-end pass such arguments in
> > a way
> > that does *not* use the "byval" scheme in the first place.  This is
> > implemented in the diff-llvm-elfv2-aggregates patch below and the
> > associated clang patch.  In this patch I simply verify that if
> > there is no
> > argument save area guaranteed by the ABI, we have no byval
> > arguments, and
> > report a fatal LLVM ERROR otherwise.   This unfortunately makes the
> > platform-independent DebugInfo/2010-10-01-crash.ll case fail since
> > it uses
> > a byval parameter in a way that is now unsupported.
> > 
> > (See attached file: diff-llvm-elfv2-stack)
> > 
> > 
> > - ELFv2 explicit CFI for CR fields
> > 
> > This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
> > would
> > represent a saved CR word (holding CR fields CR2, CR3, and CR4)
> > using just
> > a single CFI record referring to CR2.   In ELFv2 instead, each of
> > the CR
> > fields is represented by its own CFI record.  The advantage is that
> > the
> > compiler can now choose to save just a single (or two) CR fields
> > instead of
> > all of them, if those are the only ones that actually need saving.
> >  That
> > can lead to more efficient code using mf(o)crf instead of the
> > (slow) mfcr
> > instruction.
> > 
> > Note that the following patch does not (yet) implement this more
> > efficient
> > code generation, but it does implement the part that is required to
> > be ABI
> > compliant: creating multiple CFI records if multiple CR fields are
> > saved.
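
The difference in CFI records can be sketched with a bitmask, where bit i stands for CR field i. The function name and the mask representation are hypothetical; the rule itself (one record for CR2 in ELFv1, one record per saved field in ELFv2) is the one described above.

```cpp
#include <cstdint>

// CR2, CR3 and CR4 are the fields held in the saved CR word.
constexpr uint8_t kSavedCRFields = (1u << 2) | (1u << 3) | (1u << 4);

// Hypothetical helper: which CR fields get their own CFI record?
// Returns a mask with bit i set if a record referring to CRi is emitted.
constexpr uint8_t cfiRecordMask(uint8_t savedFields, bool isELFv2ABI) {
  return static_cast<uint8_t>(
      (savedFields & kSavedCRFields) == 0 ? 0u
      : isELFv2ABI ? (savedFields & kSavedCRFields)  // one record per field
                   : (1u << 2));                     // single record for CR2
}

// Saving only CR3 and CR4: two records in ELFv2, one (for CR2) in ELFv1.
static_assert(cfiRecordMask(0x18, true) == 0x18, "ELFv2: CR3 and CR4");
static_assert(cfiRecordMask(0x18, false) == 0x04, "ELFv1: CR2 only");
```
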
> > 
> > (See attached file: diff-llvm-elfv2-crsave)
> > 
> > 
> > - ELFv2 aggregate passing support
> > 
> > This patch is intended to work together with the clang companion
> > patch.
> > The LLVM patch provides infrastructure that allows the clang side
> > to
> > implement the missing pieces of the ELFv2 ABI relating to
> > aggregates passed
> > by value.  Specifically, we need to:
> > - pass (and return) "homogeneous" floating-point or vector
> > aggregates in
> > FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
> > - return aggregates of up to 16 bytes in one or two GPRs
> > - pass aggregates that fit fully in registers without using the
> > "byval"
> > mechanism (see discussion of the diff-llvm-elfv2-stack)
> > 
> > As infrastructure to enable those changes, this LLVM patch adds
> > support for
> > passing array types directly.  These can be used by the front-end
> > to pass
> > aggregate types (coerced to an appropriate array type).  The
> > details of the
> > array type being used inform the back-end about ABI-relevant
> > properties.
> > Specifically, the array element type encodes:
> > - whether the parameter should be passed in FPRs, VRs, or just
> > GPRs/stack
> > slots  (for float / vector / integer element types, respectively)
> > - what the alignment requirements of the parameter are when passed
> > in
> > GPRs/stack slots  (8 for float / 16 for vector / the element type
> > size for
> > integer element types) -- this corresponds to the "byval align"
> > field
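
Spelled out as code, the alignment rule in the list above is just a three-way choice. The enum and function names are hypothetical; the values 8 and 16 and the integer rule are taken directly from the text.

```cpp
// Hypothetical classification of the array element type chosen by the
// front-end when it coerces an aggregate.
enum class ElemKind { Float, Vector, Integer };

// Alignment (in bytes) of the parameter when it ends up in GPRs or stack
// slots: 8 for float, 16 for vector, the element size for integers.
constexpr unsigned slotAlignment(ElemKind kind, unsigned elemSizeBytes) {
  return kind == ElemKind::Float    ? 8
         : kind == ElemKind::Vector ? 16
                                    : elemSizeBytes;
}

static_assert(slotAlignment(ElemKind::Integer, 8) == 8, "array of i64");
```
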
> > 
> > The following patch uses the
> > functionArgumentNeedsConsecutiveRegisters
> > callback to encode that special treatment is required for all
> > directly-passed array types.  The isInConsecutiveRegs /
> > isInConsecutiveRegsLast bits set as a result are then used to
> > implement
> > the required size and alignment rules in CalculateStackSlotSize /
> > CalculateStackSlotAlignment etc.
> > 
> > As a related change, the ABI routines have to be modified to
> > support
> > passing floating-point types in GPRs.  This is necessary because
> > with
> > homogeneous aggregates of 4-byte float type we can now run out of
> > FPRs
> > *before* we run out of the 64-byte argument save area that is
> > shadowed by
> > GPRs.  Any extra floating-point arguments that no longer fit in
> > FPRs must
> > now be passed in GPRs until we run out of those too.  Note that
> > there was
> > already code to pass floating-point arguments in GPRs used with
> > vararg
> > parameters, which was done by writing the argument out to the
> > argument save
> > area first and then reloading into GPRs.  The patch replaces that
> > approach, however, with code that packs float arguments directly via
> > extension/truncation, BITCAST, and BUILD_PAIR operations.  This has
> > some advantages:
> > - we no longer rely on the argument save area being present
> > - while the BITCASTs will currently often also result in values
> > being
> > written to the stack and then reloaded, this should improve once we
> > implement the Power8 GPR<->FPR move instructions.
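
The arithmetic behind "running out of FPRs first" can be checked with a few constants. The count of 13 argument FPRs (f1 through f13) is an assumption taken from the ABI documents and does not appear in the mail; the 64-byte area and the 4-byte float size do.

```cpp
constexpr unsigned kArgFPRs = 13;        // assumed: f1..f13 carry arguments
constexpr unsigned kSaveAreaBytes = 64;  // area shadowed by GPRs
constexpr unsigned kFloatBytes = 4;      // element size of the aggregates

// Each 4-byte float uses a whole FPR but only 4 bytes of the save area.
constexpr unsigned kBytesCoveredByFPRs = kArgFPRs * kFloatBytes;

// Floats that still fall inside the GPR-shadowed area once FPRs are gone.
constexpr unsigned kFloatsLeftForGPRs =
    (kSaveAreaBytes - kBytesCoveredByFPRs) / kFloatBytes;

static_assert(kBytesCoveredByFPRs < kSaveAreaBytes, "FPRs run out first");
```

Under these assumptions, passing 16 floats in homogeneous aggregates leaves the last three to be packed into GPRs.
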
> > 
> > The final part of the patch enables up to 8 FPRs and VRs for
> > argument
> > return in PPCCallingConv.td; this is required to support returning
> > ELFv2
> > homogeneous aggregates.  (Note that this doesn't affect other ABIs
> > since
> > LLVM will only look for which register to use if the parameter is
> > marked as
> > "direct" return anyway.)
> > 
> > (See attached file: diff-llvm-elfv2-aggregates)
> > 
> > 
> > - ELFv2 dynamic loader support
> > 
> > This is the final piece of ELFv2 support in LLVM: it enables the
> > new ABI in
> > the runtime dynamic loader.  The loader has to implement the
> > following
> > features:
> > - In the ELFv2 ABI, do not look up a function descriptor in .opd,
> > but
> > instead use the local entry point when resolving a direct call.
> > - Update the TOC restore code to use the new TOC slot linkage area
> > offset.
> > - Create PLT stubs appropriate for the ELFv2 ABI.
> > 
> > Note that this patch also adds common-code changes. These are
> > necessary
> > because the loader must check the newly added ELF flags: the
> > e_flags header
> > bits encoding the ABI version, and the st_other symbol table entry
> > bits
> > encoding the local entry point offset.  There is currently no way
> > to access
> > these, so I've added ObjectFile::getPlatformFlags and
> > SymbolRef::getOther
> > accessors.
> > 
> > (See attached file: diff-llvm-elfv2-dyld)
> > 
> > 
> > I'd appreciate any review of the patch series!   I'm aware it's a
> > lot of
> > code, but I'd really like to see clang/LLVM usable out-of-the-box
> > on
> > powerpc64le-linux soon (hopefully even in 3.5)!
> > 
> > 
> > Mit freundlichen Gruessen / Best Regards
> > 
> > Ulrich Weigand
> > 
> > --
> >   Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> >   STSM, GNU/Linux compilers and toolchain
> >   IBM Deutschland Research & Development GmbH
> >   Vorsitzende des Aufsichtsrats: Martina Koederitz |
> >   Geschäftsführung: Dirk
> > Wittkopp
> >   Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> > Stuttgart, HRB 243294
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory