[PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)

Sat Jul 19 20:11:54 PDT 2014

----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> Sent: Saturday, July 19, 2014 7:11:39 PM
> Subject: Re: [PATCH] PowerPC support for the ELFv2 ABI (powerpc64le-linux)
> 
> Hal Finkel <hfinkel at anl.gov> wrote on 19.07.2014 00:02:08:
> 
> > I've commented on each of the patches below...
> 
> Hi Hal, thanks for the review!
> 
> I'm still working on addressing your comments, but just a quick reply
> on
> one question:
> 
> > +      // then the parameter save area.  For now, put all arguments
> > to
> vararg
> > +      // routines always in both locations (FPR *and* GPR or stack
> slot).
> > +      bool NeedGPROrStack = isVarArg || FPR_idx == NumFPRs;
> >
> > What does "For now" mean? Does this not make it even more expensive
> > to call a vararg routine, even under ELFv1? If I'm correct about it
> > making things more expensive, please don't make this change for
> > ELFv1.
> 
> "For now" is simply copied from the existing comment for vector args,
> and it means that we are not distinguishing between named and unnamed
> arguments of a vararg function, and bascially treat any call to
> vararg
> as if it were a call to an unprototyped function.
> 
> This is not optimal, but it is the same behavior as before my patch.
> 
> > (I certainly understand that on the P8 this will be better, but as
> > I'm sure you know, for older architectures the only way to do this
> > FRP -> GPR transfer is via memory which is not fast -- and
> > transferring via GPR means two memory load/store trips instead of
> > one).
> 
> Well, before the patch, we would
> - load Arg into the FPR
> - store Arg into the stack slot
> - load the GPR from the stack slot
> 
> After the patch, we instead
> - load Arg into the FPR
> - load Arg into the GPR (which under the covers may do a store/load
>   to a temp slot)
> 
> I don't see where this would be slower than before; in fact, if Arg
> itself happens to be a memory load already, I'd expect the new way
> to be faster even on pre-P8 and ELFv1 ...

Ah, okay thanks. I agree, that sounds fine. Please make sure this is noted in the commit message.

> 
> > Generally speaking, I suppose that Clang will only use array types
> > for ELFv2, but Clang is not the only possible frontend, and I'm
> > concerned that a lot of these changes don't have any kind of ABI
> > version check (especially for things that are more expensive on
> > older architectures for ELFv2 than for ELFv1).
> 
> I don't think there is anything that's been made more expensive
> on older architectures; at least that was the intent ...
> 
> In any case, I thought it preferable to *not* have ELFv2 checks
> in LLVM for this, but instead have the ABI as defined on the
> LLVM IR level to be the same for ELFv1 and ELFv2, which moves
> that choice to the front-end (which is in the better position to
> decide e.g. what types are homogeneous aggregates).

If the behavior is equivalent for the old IR-level constructs, then that's fine. I had thought it was changing (you were introducing extra GPR loads), but had just misread your code. Thanks for clarifying :-)

> 
> So the back-end at this point isn't making any difference between
> ELFv2 and ELFv1, but rather between float array types and integer
> array types (or byval struct types).
> 
> I guess I'm not sure I understand your specific concern: right now,
> if a front-end were to create IR that passes a float array type by
> value, LLVM common code would simply scalarize it and the back-end
> would create code as if a series of scalar float arguments had been
> passed.  This would effectively skip 8 bytes on the stack for each
> of the floats, so the result wouldn't match the layout of an actual
> on-stack float array; this is really all my patch changes.
> 
> I'm not sure the old behavior was particularly useful (and I would
> expect it never was actually used), but if one really wanted to
> preserve it, we could modify
> functionArgumentNeedsConsecutiveRegisters
> to only return true for float arrays on ELFv2 ...
> 
> > In addition, the comments don't do a good job of explaining what
> > parts of this behavior are ELFv2 and which apply to both versions;
> > this definitely needs to be improved.
> 
> Again, this is because I deliberately made the LLVM behavior
> independent of ELFv1 vs. ELFv2; there is a comment in
> functionArgumentNeedsConsecutiveRegisters that explains the
> intended use of those array types by a front-end to implement
> the ELFv2 calling convention.

Fair enough. However, I'd appreciate some comments in PPCISelLowering too. Saying "See the comment in functionArgumentNeedsConsecutiveRegisters", is fine, but otherwise it is not obvious which parts of the code are specifically handling ELFv2 features, and which are generic.

Thanks again,
Hal

> 
> Bye,
> Ulrich
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory