[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
Ulrich.Weigand at de.ibm.com
Sat Jul 19 17:11:39 PDT 2014
Hal Finkel <hfinkel at anl.gov> wrote on 19.07.2014 00:02:08:
> I've commented on each of the patches below...
Hi Hal, thanks for the review!
I'm still working on addressing your comments, but just a quick reply on
> + // then the parameter save area. For now, put all arguments to
> + // routines always in both locations (FPR *and* GPR or stack
> + bool NeedGPROrStack = isVarArg || FPR_idx == NumFPRs;
> What does "For now" mean? Does this not make it even more expensive
> to call a vararg routine, even under ELFv1? If I'm correct about it
> making things more expensive, please don't make this change for ELFv1.
"For now" is simply copied from the existing comment for vector args,
and it means that we are not distinguishing between named and unnamed
arguments of a vararg function, and bascially treat any call to vararg
as if it were a call to an unprototyped function.
This is not optimal, but it is the same behavior as before my patch.
> (I certainly understand that on the P8 this will be better, but as
> I'm sure you know, for older architectures the only way to do this
> FRP -> GPR transfer is via memory which is not fast -- and
> transferring via GPR means two memory load/store trips instead of one).
Well, before the patch, we would
- load Arg into the FPR
- store Arg into the stack slot
- load the GPR from the stack slot
After the patch, we instead
- load Arg into the FPR
- load Arg into the GPR (which under the covers may do a store/load
to a temp slot)
I don't see where this would be slower than before; in fact, if Arg
itself happens to be a memory load already, I'd expect the new way
to be faster even on pre-P8 and ELFv1 ...
> Generally speaking, I suppose that Clang will only use array types
> for ELFv2, but Clang is not the only possible frontend, and I'm
> concerned that a lot of these changes don't have any kind of ABI
> version check (especially for things that are more expensive on
> older architectures for ELFv2 than for ELFv1).
I don't think there is anything that's been made more expensive
on older architectures; at least that was the intent ...
In any case, I thought it preferable to *not* have ELFv2 checks
in LLVM for this, but instead have the ABI as defined on the
LLVM IR level to be the same for ELFv1 and ELFv2, which moves
that choice to the front-end (which is in the better position to
decide e.g. what types are homogeneous aggregates).
So the back-end at this point isn't making any difference between
ELFv2 and ELFv1, but rather between float array types and integer
array types (or byval struct types).
I guess I'm not sure I understand your specific concern: right now,
if a front-end were to create IR that passes a float array type by
value, LLVM common code would simply scalarize it and the back-end
would create code as if a series of scalar float arguments had been
passed. This would effectively skip 8 bytes on the stack for each
of the floats, so the result wouldn't match the layout of an actual
on-stack float array; this is really all my patch changes.
I'm not sure the old behavior was particularly useful (and I would
expect it never was actually used), but if one really wanted to
preserve it, we could modify functionArgumentNeedsConsecutiveRegisters
to only return true for float arrays on ELFv2 ...
> In addition, the comments don't do a good job of explaining what
> parts of this behavior are ELFv2 and which apply to both versions;
> this definitely needs to be improved.
Again, this is because I deliberately made the LLVM behavior
independent of ELFv1 vs. ELFv2; there is a comment in
functionArgumentNeedsConsecutiveRegisters that explains the
intended use of those array types by a front-end to implement
the ELFv2 calling convention.
More information about the llvm-commits