[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)

Mon Jul 14 08:16:40 PDT 2014

Hello,

this patch series implements support in LLVM for the PowerPC ELFv2 ABI.
Together with a companion patch to clang (posted on cfe-commits), this
makes clang/LLVM fully usable on powerpc64le-linux.  Overall the patch
series passed the following testing (both on powerpc64-linux (ELFv1) and
powerpc64le-linux (ELFv2)):
- building LLVM & clang, running the regression test suite
- running projects/test-suite
- full 3-stage bootstrap of clang
- GCC ABI compatibility test suite GCC vs. clang  [*]

[*] There are some failures due to GCC features clang does not implement
(or implements slightly differently than GCC, like __attribute__((aligned)) on
bit field base types), but those seem platform-independent, and are the
same on ELFv1 and ELFv2.

I've broken up ELFv2 support into the following pieces:

- MC support for .abiversion directive

ELFv2 binaries are marked by a bit in the ELF header e_flags field.  A new
assembler directive .abiversion can be used to set that flag.  This patch
implements support in the PowerPC MC streamers to emit the .abiversion
directive (both into assembler and ELF binary output), as well as support
in the assembler parser to parse the .abiversion directive.
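
As a rough illustration of what the directive controls, the sketch below models the ABI version as the low bits of e_flags.  The mask value 3 (named EF_PPC64_ABI in binutils, as far as I know) and both helper names are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cstdint>

// Assumed mask for the ABI version bits in e_flags.
constexpr std::uint32_t kAbiMask = 3;

// Hypothetical helper: the effect ".abiversion N" would have on e_flags.
constexpr std::uint32_t setAbiVersion(std::uint32_t eFlags, std::uint32_t version) {
  return (eFlags & ~kAbiMask) | (version & kAbiMask);
}

// Hypothetical helper: read the ABI version back from e_flags.
constexpr std::uint32_t abiVersion(std::uint32_t eFlags) {
  return eFlags & kAbiMask;
}
```

Under these assumptions, ".abiversion 2" leaves all other flag bits untouched and a consumer recovers the value 2 by masking.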

(See attached file: diff-llvm-elfv2-abiversion)

- MC support for .localentry directive

A second binutils feature needed to support ELFv2 is the .localentry
directive.  In the ELFv2 ABI, functions may have two entry points: one for
calling the routine locally via "bl", and one for calling the function via
function pointer (either at the source level, or implicitly via a PLT stub
for global calls).  The two entry points share a single ELF symbol, where
the ELF symbol address identifies the global entry point address, while the
local entry point is found by adding a delta offset to the symbol address.
That offset is encoded into three platform-specific bits of the ELF symbol
st_other field.
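
A minimal sketch of that encoding, assuming the three bits are the top three bits of st_other (bits 5 to 7) and that the field stores a power-of-two code rather than the raw byte offset.  The constants and function names are illustrative assumptions, not the code from the patch.

```cpp
// Assumed position of the three local-entry bits in st_other.
constexpr unsigned kLocalBit = 5;
constexpr unsigned kLocalMask = 7u << kLocalBit;

// Decode the byte offset from the global to the local entry point.
// Field values 0 and 1 decode to 0; values 2..6 decode to 4, 8, 16, 32, 64.
constexpr unsigned decodeLocalEntry(unsigned stOther) {
  return ((1u << ((stOther & kLocalMask) >> kLocalBit)) >> 2) << 2;
}

// Encode a byte offset into the st_other bits, rounding down to the
// nearest representable value.
constexpr unsigned encodeLocalEntry(unsigned offset) {
  return (offset >= 64 ? 6u
        : offset >= 32 ? 5u
        : offset >= 16 ? 4u
        : offset >= 8  ? 3u
        : offset >= 4  ? 2u
                       : 0u) << kLocalBit;
}
```

With this scheme, a prologue of two 4-byte instructions gives an offset of 8, which is stored as field value 3.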

The .localentry directive instructs the assembler to set those fields to
encode a particular offset.  This is typically used by a function prologue
sequence like this:

func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

Note that according to the ABI, when calling the global entry point, r12
must be set to point to the global entry point address itself; while when
calling the local entry point, r2 must be set to point to the TOC base.
The two instructions between the global and local entry point in the above
example translate the first requirement into the second.

The following patch implements support in the PowerPC MC streamers to emit
the .localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse the .localentry directive.

In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols with
two entry points: When the target function is known local, the MC layer
would immediately handle the fixup by inserting the target address -- this
is wrong, since the call may need to go to the local entry point instead.
The GNU assembler handles this case by *not* directly resolving fixups
targeting functions with two entry points, but always emits the relocation
and relies on the linker to handle this case correctly.  This patch changes
LLVM MC to do the same (this is done via the processFixupValue routine).
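
The decision described above can be sketched as a small predicate.  This is only a model: the function name is hypothetical, the bit position of the local-entry field in st_other is an assumption, and the real processFixupValue hook works on MC data structures rather than plain integers.

```cpp
// Assumed: the local-entry bits occupy bits 5..7 of st_other.
constexpr unsigned kLocalMask = 7u << 5;

// Hypothetical stand-in for the MC decision.  Returns true when the
// assembler may patch the branch itself, false when it must leave a
// relocation for the linker.
constexpr bool canResolveFixupInAssembler(bool targetKnownLocal, unsigned stOther) {
  return targetKnownLocal && (stOther & kLocalMask) == 0;
}
```

A known-local target with a non-zero local-entry field therefore still gets a relocation, matching the GNU assembler behaviour described above.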

Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section* instead
of the actual symbol.  For the same reason as above, this may be wrong when
the target symbol has two entry points.  The GNU assembler again handles
this case by not performing this simplification in that case, but leaving
the relocation targeting the full symbol, which is then resolved by the
linker.  This patch changes LLVM MC to do the same (the
needsRelocateWithSymbol routine).   NOTE: the LLVM code is actually overly
pessimistic, since the needsRelocateWithSymbol routine currently does not
have access to the actual target symbol, and thus must always assume that
it might have two entry points.  This can be improved upon by modifying
common code to pass the target symbol when calling needsRelocateWithSymbol
(probably best done as a follow-on patch).

(See attached file: diff-llvm-elfv2-localentry)

- ELFv2 function call changes: two entry points instead of function
descriptors

This patch builds upon the two preceding MC changes to implement the basic
ELFv2 function call convention.  In the ELFv1 ABI, a "function descriptor"
was associated with every function, pointing to both the entry address and
the related TOC base (and a static chain pointer for nested functions).
Function pointers would actually refer to that descriptor, and the indirect
call sequence needed to load up both entry address and TOC base.

In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it up
into r12 (as required by the ABI rules for a global entry point).  Direct
function calls continue to just do a "bl" to the target symbol; this will
be resolved by the linker to the local entry point of the target function
if it is local, and to a PLT stub if it is global.  That PLT stub would
then load the (global) entry point address of the final target into r12 and
branch to it.  Note that when performing a direct function call, r2 must be
set up to point to the current TOC base: if the target ends up local, the
ABI requires that its local entry point is called with r2 set up; if the
target ends up global, the PLT stub requires that r2 is set up.

This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2) anywhere
(this is done in EmitFunctionBodyStart; note that this cannot be done in
EmitFunctionEntryLabel because the global entry point prologue code must be
*part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive to
tell the linker to create the appropriate version of PLT stubs.

Note that all this is triggered by a predicate isELFv2ABI.  This is
currently hard-coded to be true iff the "little-endian 64-bit
SVR4" (ppc64le) triple is selected.  To be fully compatible with GCC, we
should really implement the -mabi=elfv1 / -mabi=elfv2 option pair and
support both ELFv1 and ELFv2 on both powerpc64-linux and powerpc64le-linux
targets, with big-endian defaulting to ELFv1 and little-endian defaulting
to ELFv2.  However, since the BE ELFv2 and LE ELFv1 cases are only
theoretical options at this point (there's no library support for those in
any current or planned Linux distribution), I haven't implemented this yet.
It should be straightforward to add this support as a follow-on patch by
just implementing the option machinery and hooking it up to the isELFv2ABI
predicate.
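
A sketch of the predicate as it stands and of how the follow-on could hook in.  The enum types and the two-argument overload are hypothetical; only the name isELFv2ABI and the defaulting rules come from the description above.

```cpp
enum class Arch { PPC64, PPC64LE };
enum class AbiOption { Default, ELFv1, ELFv2 };  // hypothetical -mabi= setting

// Current behaviour as described: ELFv2 iff the little-endian triple is used.
constexpr bool isELFv2ABI(Arch arch) {
  return arch == Arch::PPC64LE;
}

// Possible follow-on: an explicit option overrides the endianness default.
constexpr bool isELFv2ABI(Arch arch, AbiOption opt) {
  return opt == AbiOption::Default ? isELFv2ABI(arch)
                                   : opt == AbiOption::ELFv2;
}
```

In this model, big-endian with -mabi=elfv2 and little-endian with -mabi=elfv1 become expressible without touching any caller of the one-argument form.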

(See attached file: diff-llvm-elfv2-funcdesc)

- ELFv2 stack space reductions

The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating
two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be present
in certain cases
   (it remains mandatory in functions with variable arguments, and
functions that have any parameter that is passed on the stack)

The following patch implements these required changes:
- reducing the linkage area, and associated relocation of the TOC save
slot, in getLinkageSize / getTOCSaveOffset
  (this requires updating all callers of these routines to pass in the
isELFv2ABI flag).
- (partially) handling the case where the parameter save area is optional
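
The linkage area numbers can be sketched as follows.  These are simplified stand-ins for the real routines (64-bit SVR4 only, and the real signatures take more parameters); the placement of the TOC save slot in the last doubleword of the linkage area is an assumption that is not stated above.

```cpp
// Linkage area size in bytes: 48 for ELFv1, 32 for ELFv2.
constexpr unsigned getLinkageSize(bool isELFv2ABI) {
  return isELFv2ABI ? 32 : 48;
}

// Assumption: the TOC save slot is the last doubleword of the linkage
// area, so it moves when the area shrinks.
constexpr unsigned getTOCSaveOffset(bool isELFv2ABI) {
  return getLinkageSize(isELFv2ABI) - 8;
}
```

Under that assumption the slot sits at offset 40 from the stack pointer in ELFv1 and at offset 24 in ELFv2.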

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.  That is
certainly always compliant with the ABI, but may cause code to allocate
stack unnecessarily.  This can be addressed by a follow-on optimization
patch.

On the *callee* side, in LowerFormalArguments, we *must* track correctly
whether the ABI guarantees that the caller has allocated the parameter save
area for our use, and the patch does so. However, there is one
complication: the code that handles incoming "byval" arguments will
currently *always* write to the parameter save area, because it has to
force incoming register arguments to the stack since it must return an
*address* to implement the byval semantics.  This is already inefficient in
some cases in the ELFv1 ABI, but in the ELFv2 ABI it would be actually
buggy since it would write to the argument save area that the caller
actually did *not* allocate.

There are two options to fix this: One would be that the
LowerFormalArguments code could keep its overall logic, except it writes
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area is
not present.  I chose *not* to implement this, since writing arguments that
already fit fully in registers to the stack *is* inefficient.  Instead I
chose the second option: have the front-end pass such arguments in a way
that does *not* use the "byval" scheme in the first place.  This is
implemented in the diff-llvm-elfv2-aggregates patch below and the
associated clang patch.  In this patch I simply verify that if there is no
argument save area guaranteed by the ABI, we have no byval arguments, and
report a fatal LLVM ERROR otherwise.   This unfortunately makes the
platform-independent DebugInfo/2010-10-01-crash.ll case fail since it uses
a byval parameter in a way that is now unsupported.
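
The callee-side check can be modelled roughly like this.  Both function names are hypothetical, and the real LowerFormalArguments code derives these conditions while walking the argument list instead of receiving them as flags.

```cpp
// Whether the ABI guarantees that the caller allocated the parameter
// save area: always in ELFv1; in ELFv2 only for variadic functions or
// when some parameter is passed on the stack.
constexpr bool hasParameterArea(bool isELFv2ABI, bool isVarArg, bool anyArgOnStack) {
  return !isELFv2ABI || isVarArg || anyArgOnStack;
}

// A byval argument is spilled into the parameter save area, so it can
// only be lowered when that area is guaranteed to exist.  A false result
// corresponds to the fatal error described above.
constexpr bool byValIsSupported(bool isELFv2ABI, bool isVarArg, bool anyArgOnStack) {
  return hasParameterArea(isELFv2ABI, isVarArg, anyArgOnStack);
}
```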

(See attached file: diff-llvm-elfv2-stack)

- ELFv2 explicit CFI for CR fields

This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI would
represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just
a single CFI record referring to CR2.   In ELFv2 instead, each of the CR
fields is represented by its own CFI record.  The advantage is that the
compiler can now choose to save just a single (or two) CR fields instead of
all of them, if those are the only ones that actually need saving.  That
can lead to more efficient code using mf(o)crf instead of the (slow) mfcr
instruction.

Note that the following patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to be ABI
compliant: creating multiple CFI records if multiple CR fields are saved.
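
The difference in CFI records can be sketched with a bitmask in which bit n stands for CR field n.  The representation and the function names are illustrative assumptions; the real code emits CFI instructions rather than returning a mask.

```cpp
// Bit n set means CR field n.
constexpr unsigned crBit(unsigned field) {
  return 1u << field;
}

// Returns the set of CR fields that get a CFI record of their own, given
// the set of CR fields the function saves.
constexpr unsigned cfiRecordsFor(unsigned savedFields, bool isELFv2ABI) {
  return savedFields == 0 ? 0u
       : isELFv2ABI       ? savedFields  // one record per saved field
                          : crBit(2);    // ELFv1: single record naming CR2
}
```

For a function that saves only CR3 and CR4, the ELFv2 result names exactly those two fields, while the ELFv1 result is the single CR2 record standing for the whole saved word.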

(See attached file: diff-llvm-elfv2-crsave)

- ELFv2 aggregate passing support

This patch is intended to work together with the clang companion patch.
The LLVM patch provides infrastructure that allows the clang side to
implement the missing pieces of the ELFv2 ABI relating to aggregates passed
by value.  Specifically, we need to:
- pass (and return) "homogeneous" floating-point or vector aggregates in
FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
- return aggregates of up to 16 bytes in one or two GPRs
- pass aggregates that fit fully in registers without using the "byval"
mechanism (see discussion of the diff-llvm-elfv2-stack)

As infrastructure to enable those changes, this LLVM patch adds support for
passing array types directly.  These can be used by the front-end to pass
aggregate types (coerced to an appropriate array type).  The details of the
array type being used inform the back-end about ABI-relevant properties.
Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just GPRs/stack
slots  (for float / vector / integer element types, respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots  (8 for float / 16 for vector / the element type size for
integer element types) -- this corresponds to the "byval align" field
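
The two properties carried by the array element type can be written down as a pair of small functions.  The enum and function names are hypothetical; the register classes and alignment values are the ones listed above.

```cpp
enum class ElemKind { Float, Vector, Integer };
enum class RegClass { FPR, VR, GPR };

// Register class selected by the array element type.
constexpr RegClass regClassFor(ElemKind kind) {
  return kind == ElemKind::Float  ? RegClass::FPR
       : kind == ElemKind::Vector ? RegClass::VR
                                  : RegClass::GPR;
}

// Alignment in bytes when the parameter lives in GPRs or stack slots.
constexpr unsigned stackAlignFor(ElemKind kind, unsigned elemSizeBytes) {
  return kind == ElemKind::Float  ? 8u
       : kind == ElemKind::Vector ? 16u
                                  : elemSizeBytes;
}
```

As an example of the intended use, a front-end could coerce an aggregate that needs 16-byte alignment in GPRs to an array of 16-byte integers, and a homogeneous aggregate of three floats to an array of three floats.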

The following patch uses the functionArgumentNeedsConsecutiveRegisters
callback to encode that special treatment is required for all
directly-passed array types.  The isInConsecutiveRegs /
isInConsecutiveRegsLast bits set as a result are then used to implement
the required size and alignment rules in CalculateStackSlotSize /
CalculateStackSlotAlignment etc.

As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs.  This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed by
GPRs.  Any extra floating-point arguments that no longer fit in FPRs must
now be passed in GPRs until we run out of those too.  Note that there was
already code to pass floating-point arguments in GPRs used with vararg
parameters, which was done by writing the argument out to the argument save
area first and then reloading into GPRs.  The patch replaces that approach
with code that packs float arguments directly via extension/truncation,
BITCAST, and BUILD_PAIR operations.  This has some advantages:
- we no longer rely on the argument save area being present
- while the BITCASTs will currently often also result in values being
written to the stack and then reloaded, this should improve once we
implement the Power8 GPR<->FPR move instructions.
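
To make the "run out of FPRs first" case concrete, the sketch below places a packed run of 4-byte floats that starts at offset 0 of the argument save area.  It assumes 13 argument FPRs (f1 to f13, a number not stated above), 8 GPR doublewords shadowing the 64-byte area, and a little-endian layout in which the lower-addressed float lands in the low half of the GPR.  All names are hypothetical, and the bit patterns are passed in as integers so that the BITCAST step stays outside the sketch.

```cpp
#include <cstdint>

constexpr unsigned kNumArgFPRs = 13;  // assumed: f1..f13
constexpr unsigned kNumArgGPRs = 8;   // 8 doublewords shadow the 64-byte area

enum class Where { FPR, GPR, Stack };

// Doubleword of the argument save area that holds the i-th (0-based) float.
constexpr unsigned doublewordIndex(unsigned i) {
  return (i * 4) / 8;
}

// Placement of the i-th float in the packed run.
constexpr Where placeFloat(unsigned i) {
  return i < kNumArgFPRs                    ? Where::FPR
       : doublewordIndex(i) < kNumArgGPRs   ? Where::GPR
                                            : Where::Stack;
}

// Two float bit patterns sharing one 64-bit GPR (little-endian assumed).
constexpr std::uint64_t packPair(std::uint32_t loBits, std::uint32_t hiBits) {
  return (static_cast<std::uint64_t>(hiBits) << 32) | loBits;
}
```

With sixteen floats, the first thirteen use FPRs, the next three fall into doublewords 6 and 7 of the GPR-shadowed area, and a seventeenth would go to the stack.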

The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning ELFv2
homogeneous aggregates.  (Note that this doesn't affect other ABIs since
LLVM will only look for which register to use if the parameter is marked as
"direct" return anyway.)

(See attached file: diff-llvm-elfv2-aggregates)

- ELFv2 dynamic loader support

This is the final piece of ELFv2 support in LLVM: it enables the new ABI in
the runtime dynamic loader.  The loader has to implement the following
features:
- In the ELFv2 ABI, do not look up a function descriptor in .opd, but
instead use the local entry point when resolving a direct call.
- Update the TOC restore code to use the new offset of the TOC save slot in
the linkage area.
- Create PLT stubs appropriate for the ELFv2 ABI.
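
Two of these loader tasks can be sketched in a few lines.  The function names are hypothetical, the TOC save slot offsets (40 for ELFv1, 24 for ELFv2) are assumptions not stated above, and the instruction word is the standard encoding of "ld r2, offset(r1)" for an offset that is a multiple of 4.

```cpp
#include <cstdint>

// Direct call resolution in ELFv2: symbol address plus the already
// decoded local entry offset, with no .opd lookup.
constexpr std::uint64_t directCallTarget(std::uint64_t symbolAddr,
                                         unsigned localEntryOffset) {
  return symbolAddr + localEntryOffset;
}

// Assumed TOC save slot offsets from the stack pointer.
constexpr unsigned tocSaveOffset(bool isELFv2ABI) {
  return isELFv2ABI ? 24 : 40;
}

// Instruction word for "ld r2, offset(r1)" that restores the TOC pointer
// after a call through a stub.
constexpr std::uint32_t encodeTocRestore(bool isELFv2ABI) {
  return 0xE8410000u | tocSaveOffset(isELFv2ABI);
}
```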

Note that this patch also adds common-code changes. These are necessary
because the loader must check the newly added ELF flags: the e_flags header
bits encoding the ABI version, and the st_other symbol table entry bits
encoding the local entry point offset.  There is currently no way to access
these, so I've added ObjectFile::getPlatformFlags and SymbolRef::getOther
accessors.

(See attached file: diff-llvm-elfv2-dyld)

I'd appreciate any review of the patch series!   I'm aware it's a lot of
code, but I'd really like to see clang/LLVM usable out-of-the-box on
powerpc64le-linux soon (hopefully even in 3.5)!

Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU/Linux compilers and toolchain
  IBM Deutschland Research & Development GmbH
  Chairwoman of the Supervisory Board: Martina Koederitz | Management: Dirk
Wittkopp
  Registered office: Böblingen | Court of registration: Amtsgericht
Stuttgart, HRB 243294
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-abiversion
Type: application/octet-stream
Size: 5627 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-localentry
Type: application/octet-stream
Size: 12268 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-funcdesc
Type: application/octet-stream
Size: 9970 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-stack
Type: application/octet-stream
Size: 13604 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-crsave
Type: application/octet-stream
Size: 2455 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-aggregates
Type: application/octet-stream
Size: 22934 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-elfv2-dyld
Type: application/octet-stream
Size: 10685 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140714/4ffb227d/attachment-0006.obj>