[llvm] r190636 - Fix PPC ABI for ByVal structs with vector members

Fri Feb 27 16:00:13 PST 2015

[+Rafael, Nick, and Reid because we were talking about byval in another
thread]

On Fri, Feb 27, 2015 at 3:59 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Fri, Feb 27, 2015 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> ----- Original Message -----
>> > From: "David Blaikie" <dblaikie at gmail.com>
>> > To: "Hal Finkel" <hfinkel at anl.gov>
>> > Cc: llvm-commits at cs.uiuc.edu
>> > Sent: Friday, February 27, 2015 3:35:32 PM
>> > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with vector
>> members
>> >
>> > On Fri, Feb 27, 2015 at 1:31 PM, Hal Finkel < hfinkel at anl.gov >
>> > wrote:
>> >
>> >
>> > ----- Original Message -----
>> > > From: "David Blaikie" < dblaikie at gmail.com >
>> > > To: "Hal Finkel" < hfinkel at anl.gov >
>> > > Cc: llvm-commits at cs.uiuc.edu
>> >
>> >
>> > > Sent: Friday, February 27, 2015 3:29:20 PM
>> > > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
>> > > vector members
>> > >
>> > >
>> > > On Thu, Feb 19, 2015 at 4:24 PM, Hal Finkel < hfinkel at anl.gov >
>> > > wrote:
>> > >
>> > > ----- Original Message -----
>> > > > From: "David Blaikie" < dblaikie at gmail.com >
>> > > > To: "Hal Finkel" < hfinkel at anl.gov >
>> > > > Cc: llvm-commits at cs.uiuc.edu
>> > > > Sent: Thursday, February 19, 2015 3:57:13 PM
>> > > > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
>> > > > vector members
>> > > >
>> > > >
>> > > > On Thu, Sep 12, 2013 at 4:20 PM, Hal Finkel < hfinkel at anl.gov >
>> > > > wrote:
>> > > >
>> > > >
>> > > > Author: hfinkel
>> > > > Date: Thu Sep 12 18:20:06 2013
>> > > > New Revision: 190636
>> > > >
>> > > > URL: http://llvm.org/viewvc/llvm-project?rev=190636&view=rev
>> > > > Log:
>> > > > Fix PPC ABI for ByVal structs with vector members
>> > > >
>> > > > When a structure is passed by value, and that structure contains
>> > > > a
>> > > > vector
>> > > > member, according to the PPC ABI, the structure will receive
>> > > > enhanced
>> > > > alignment
>> > > > (so that the vector within the structure will always be aligned).
>> > > >
>> > > > This should resolve PR16641.
>> > > >
>> > > >
>> > > > In an effort to simplify byval in a future world of opaque
>> > > > (untyped)
>> > > > pointers, I was considering changing "%struct.foo byval" to "ptr
>> > > > byval(num_bytes)" - which I think would work for every target
>> > > > except
>> > > > PPC (PPC, and the code in this commit, is the only case I could
>> > > > find
>> > > > of overriding getByValTypeAlignment that wasn't just the default
>> > > > of
>> > > > asking DataLayout for the type's alignment).
>> > > >
>> > > > Is this code necessary? If I remove the entire PPCTargetLowering
>> > > > override of getByValTypeAlignment this test case (& everything
>> > > > else
>> > > > in check-llvm) appears to still pass.
>> > > >
>> > >
>> > > Interesting. I'll need to look at this in more detail. I'm fairly
>> > > certain that it was needed when it was added. It may also still be
>> > > necessary for QPX support (which is incomplete in tree at the
>> > > moment, but that should change in the next few days). In short,
>> > > please give me a few days, and then I should have a better idea.
>> > > Theoretically, however, I'm pretty sure it is necessary.
>> > >
>> > >
>> > > Ping - any further thoughts on this? Does Clang already put the
>> > > necessary alignment on byval attributes for PPC? Do we need to fix
>> > > Clang to do so? (once we have this in the frontend, if possible,
>> > > then I'll set about changing byval to just take a number of bytes
>> > > to
>> > > copy & ignore the type entirely - except when upgrading old
>> > > bitcode)
>> > >
>> > >
>> >
>> > I think that Clang is okay; I'll confirm in a couple of hours.
>> >
>> >
>> > Might be easiest to just assert that the result from the
>> > target-specific handler produces the same as the default handler
>> > (which just consults the alignment of the type) - then do a
>> > bootstrap or whatever build would exercise this code well. (Since I
>> > don't have any PPC setup I'm not sure I can do this myself easily -
>> > but happy to take instructions if there's something I can do here)
>>
>> Sure. Honestly, I don't have a great way to test this either (because I
>> run only ppc64/Linux), but I'll demonstrate the issue...
>>
>> Consider this file:
>> $ cat /tmp/tv.c
>> typedef float __attribute__((ext_vector_type(4))) vfloat;
>> struct vfa {
>>   vfloat a[3];
>> };
>> struct vfs {
>>   vfloat x, y, z;
>> };
>>
>> void fooa(vfa t);
>> void foos(vfs t);
>>
>> void bara(vfa t) {
>>   fooa(t);
>> }
>>
>> void bars(vfs t) {
>>   foos(t);
>> }
>>
>> void test1() {
>>   vfa t = { (vfloat) { 0.0, 0.0, 0.0, 0.0 },
>>             (vfloat) { 1.0, 1.0, 1.0, 1.0 },
>>             (vfloat) { 2.0, 2.0, 2.0, 2.0 } };
>>   fooa(t);
>> }
>>
>> For several reasons, the powerpc64 ABI, non-Darwin, passes these 'byval'
>> structures, not as structures, but as casted arrays:
>>
>> clang++ -target powerpc64-unknown-linux-gnu -O3 -S -emit-llvm -o -
>> /tmp/tv.c
>>
>> target datalayout = "E-m:e-i64:64-n32:64"
>> target triple = "powerpc64-unknown-linux-gnu"
>>
>> define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
>> entry:
>>   tail call void @_Z4fooa3vfa([3 x i128] %t.coerce)
>>   ret void
>> }
>>
>> declare void @_Z4fooa3vfa([3 x i128]) #0
>>
>> define void @_Z4bars3vfs([3 x i128] %t.coerce) #0 {
>> entry:
>>   tail call void @_Z4foos3vfs([3 x i128] %t.coerce)
>>   ret void
>> }
>>
>> declare void @_Z4foos3vfs([3 x i128]) #0
>>
>> define void @_Z5test1v() #0 {
>> entry:
>>   tail call void @_Z4fooa3vfa([3 x i128] [i128 0, i128 bitcast (<4 x
>> float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float
>> 1.000000e+00> to i128), i128 bitcast (<4 x float> <float 2.000000e+00,
>> float 2.000000e+00, float 2.000000e+00, float 2.000000e+00> to i128)])
>>   ret void
>> }
>>
>> and so we always get the alignment correct, because it is the alignment
>> if 'i128' (which is 16). To digress slightly, I'm now somewhat concerned
>> that this works more-or-less by accident (because DataLayout only defaults
>> to giving natural alignment to vector types, for integers we normally just
>> take the largest integer alignment specified, which would be 8 in this
>> case, and for arrays we take the alignment of the element type) but all of
>> the temporaries have 16 byte alignment, and so it happens to do the right
>> thing (Uli implemented this, so I'm cc'ing him here).
>>
>> For clarity, the first function, prior to optimization, looked like this:
>>
>> define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
>> entry:
>>   %t = alloca %struct.vfa, align 16
>>   %agg.tmp = alloca %struct.vfa, align 16
>>   %coerce.dive = getelementptr %struct.vfa, %struct.vfa* %t, i32 0, i32 0
>>   %0 = bitcast [3 x <4 x float>]* %coerce.dive to [3 x i128]*
>>   store [3 x i128] %t.coerce, [3 x i128]* %0, align 1
>>   %1 = bitcast %struct.vfa* %agg.tmp to i8*
>>   %2 = bitcast %struct.vfa* %t to i8*
>>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 48, i32 16, i1
>> false), !tbaa.struct !1
>>   %coerce.dive1 = getelementptr %struct.vfa, %struct.vfa* %agg.tmp, i32
>> 0, i32 0
>>   %3 = bitcast [3 x <4 x float>]* %coerce.dive1 to [3 x i128]*
>>   %4 = load [3 x i128]* %3, align 1
>>   call void @_Z4fooa3vfa([3 x i128] %4)
>>   ret void
>> }
>>
>> In any case, for non-powerpc64/Linux systems, we don't use this array
>> coercion, and we end up with something like this:
>>
>> clang++ -target powerpc-unknown-linux-gnu -O3 -S -o - -emit-llvm /tmp/tv.c
>>
>> target datalayout = "E-m:e-p:32:32-i64:64-n32"
>> target triple = "powerpc-unknown-linux-gnu"
>>
>> %struct.vfa = type { [3 x <4 x float>] }
>> %struct.vfs = type { <4 x float>, <4 x float>, <4 x float> }
>>
>> @_ZZ5test1vE1t = private unnamed_addr constant %struct.vfa { [3 x <4 x
>> float>] [<4 x float> zeroinitializer, <4 x float> <float 1.000000e+00,
>> float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <4 x float>
>> <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float
>> 2.000000e+00>] }, align 16
>>
>> define void @_Z4bara3vfa(%struct.vfa* byval nocapture readonly %t) #0 {
>> entry:
>>   %agg.tmp = alloca %struct.vfa, align 16
>>   %0 = bitcast %struct.vfa* %agg.tmp to i8*
>>   %1 = bitcast %struct.vfa* %t to i8*
>>   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 48, i32 16, i1
>> false), !tbaa.struct !1
>>   call void @_Z4fooa3vfa(%struct.vfa* byval %agg.tmp)
>>   ret void
>> }
>>
>> ...
>>
>> and, as you can see, we don't explicitly specify an alignment on the
>> byval argument (but nevertheless depend on it having 16-byte alignment --
>> as that is assumed by the alignment specified by the memcpy -- for
>> correctness).
>>
>> And, so, I'm not sure what the best plan is here. I'm certain that Clang
>> (or any other FE) *can* put the correct alignment on the byval arguments.
>> Furthermore, I'm fine with that solution.  We'd obviously need to update
>> Clang to do so (and I assume you'd be doing that anyway).
>
>
> Thanks for all the context!
>
> Yes, that's what I'm suggesting we/I could do - ensure Clang produces the
> right alignment attributes (I was, of course, rather hoping Clang /already/
> did this since I saw alignment attributes on other byval parameters - but
> as you've demonstrated, they're missing here) & then relegate the alignment
> handling stuff in LLVM to the bitcode reader for backwards compatibility.
>
> (then, with that all done, I'd get to work adding an integer parameter to
> byval for the number of bits to copy and then byval becomes pointee type
> independent and I'm another step along the way to typeless/opaque pointer
> types in LLVM... )
>
>
>> This does not currently affect powerpc64/Linux directly, so I don't have
>> a great mechanism for testing, but we can certainly make sure that the
>> outputs seem reasonable (and use code like this to construct some
>> reasonable regression tests).
>>
>
> Given this is a bit out of my field, I'll certainly send changes to you
> for review before committing - any test cases now or in code review will be
> really appreciated.
>
> Thanks!
>
> - David
>
>
>>
>> Thanks again,
>> Hal
>>
>> >
>> > - David
>> >
>> >
>> >
>> > -Hal
>> >
>> >
>> >
>> > >
>> > > -Hal
>> > >
>> > >
>> > >
>> > > >
>> > > >
>> > > > Added:
>> > > > llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
>> > > > Modified:
>> > > > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>> > > >
>> > > > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>> > > > URL:
>> > > >
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=190636&r1=190635&r2=190636&view=diff
>> > > >
>> ==============================================================================
>> > > > --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (original)
>> > > > +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Sep 12
>> > > > 18:20:06 2013
>> > > > @@ -578,24 +578,48 @@ PPCTargetLowering::PPCTargetLowering(PPC
>> > > > }
>> > > > }
>> > > >
>> > > > +/// getMaxByValAlign - Helper for getByValTypeAlignment to
>> > > > determine
>> > > > +/// the desired ByVal argument alignment.
>> > > > +static void getMaxByValAlign(Type *Ty, unsigned &MaxAlign,
>> > > > + unsigned MaxMaxAlign) {
>> > > > + if (MaxAlign == MaxMaxAlign)
>> > > > + return;
>> > > > + if (VectorType *VTy = dyn_cast<VectorType>(Ty)) {
>> > > > + if (MaxMaxAlign >= 32 && VTy->getBitWidth() >= 256)
>> > > > + MaxAlign = 32;
>> > > > + else if (VTy->getBitWidth() >= 128 && MaxAlign < 16)
>> > > > + MaxAlign = 16;
>> > > > + } else if (ArrayType *ATy = dyn_cast<ArrayType>(Ty)) {
>> > > > + unsigned EltAlign = 0;
>> > > > + getMaxByValAlign(ATy->getElementType(), EltAlign, MaxMaxAlign);
>> > > > + if (EltAlign > MaxAlign)
>> > > > + MaxAlign = EltAlign;
>> > > > + } else if (StructType *STy = dyn_cast<StructType>(Ty)) {
>> > > > + for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
>> > > > + unsigned EltAlign = 0;
>> > > > + getMaxByValAlign(STy->getElementType(i), EltAlign,
>> > > > MaxMaxAlign);
>> > > > + if (EltAlign > MaxAlign)
>> > > > + MaxAlign = EltAlign;
>> > > > + if (MaxAlign == MaxMaxAlign)
>> > > > + break;
>> > > > + }
>> > > > + }
>> > > > +}
>> > > > +
>> > > > /// getByValTypeAlignment - Return the desired alignment for
>> > > > ByVal
>> > > > aggregate
>> > > > /// function arguments in the caller parameter area.
>> > > > unsigned PPCTargetLowering::getByValTypeAlignment(Type *Ty) const
>> > > > {
>> > > > const TargetMachine &TM = getTargetMachine();
>> > > > // Darwin passes everything on 4 byte boundary.
>> > > > - if (TM.getSubtarget<PPCSubtarget>().isDarwin())
>> > > > + if (PPCSubTarget.isDarwin())
>> > > > return 4;
>> > > >
>> > > > // 16byte and wider vectors are passed on 16byte boundary.
>> > > > - if (VectorType *VTy = dyn_cast<VectorType>(Ty))
>> > > > - if (VTy->getBitWidth() >= 128)
>> > > > - return 16;
>> > > > -
>> > > > // The rest is 8 on PPC64 and 4 on PPC32 boundary.
>> > > > - if (PPCSubTarget.isPPC64())
>> > > > - return 8;
>> > > > -
>> > > > - return 4;
>> > > > + unsigned Align = PPCSubTarget.isPPC64() ? 8 : 4;
>> > > > + if (PPCSubTarget.hasAltivec() || PPCSubTarget.hasQPX())
>> > > > + getMaxByValAlign(Ty, Align, PPCSubTarget.hasQPX() ? 32 : 16);
>> > > > + return Align;
>> > > > }
>> > > >
>> > > > const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode)
>> > > > const {
>> > > > @@ -2281,6 +2305,13 @@ PPCTargetLowering::LowerFormalArguments_
>> > > > InVals.push_back(FIN);
>> > > > continue;
>> > > > }
>> > > > +
>> > > > + unsigned BVAlign = Flags.getByValAlign();
>> > > > + if (BVAlign > 8) {
>> > > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
>> > > > + CurArgOffset = ArgOffset;
>> > > > + }
>> > > > +
>> > > > // All aggregates smaller than 8 bytes must be passed
>> > > > right-justified.
>> > > > if (ObjSize < PtrByteSize)
>> > > > CurArgOffset = CurArgOffset + (PtrByteSize - ObjSize);
>> > > > @@ -3870,6 +3901,15 @@ PPCTargetLowering::LowerCall_64SVR4(SDVa
>> > > > if (Size == 0)
>> > > > continue;
>> > > >
>> > > > + unsigned BVAlign = Flags.getByValAlign();
>> > > > + if (BVAlign > 8) {
>> > > > + if (BVAlign % PtrByteSize != 0)
>> > > > + llvm_unreachable(
>> > > > + "ByVal alignment is not a multiple of the pointer size");
>> > > > +
>> > > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
>> > > > + }
>> > > > +
>> > > > // All aggregates smaller than 8 bytes must be passed
>> > > > right-justified.
>> > > > if (Size==1 || Size==2 || Size==4) {
>> > > > EVT VT = (Size==1) ? MVT::i8 : ((Size==2) ? MVT::i16 : MVT::i32);
>> > > >
>> > > > Added: llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
>> > > > URL:
>> > > >
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll?rev=190636&view=auto
>> > > >
>> ==============================================================================
>> > > > --- llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll (added)
>> > > > +++ llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll Thu Sep 12
>> > > > 18:20:06 2013
>> > > > @@ -0,0 +1,64 @@
>> > > > +; RUN: llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 < %s
>> > > > |
>> > > > FileCheck %s
>> > > > +target datalayout =
>> > > >
>> "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
>> > > > +target triple = "powerpc64-unknown-linux-gnu"
>> > > > +
>> > > > +%struct.s2 = type { i64, <4 x float> }
>> > > > +
>> > > > + at ve = external global <4 x float>
>> > > > + at n = external global i64
>> > > > +
>> > > > +; Function Attrs: nounwind
>> > > > +define void @test1(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
>> > > > i64
>> > > > %d6, i64 %d7, i64 %d8, i64 %d9, <4 x float> inreg %vs.coerce) #0
>> > > > {
>> > > > +entry:
>> > > > + store <4 x float> %vs.coerce, <4 x float>* @ve, align 16, !tbaa
>> > > > !0
>> > > > + ret void
>> > > > +
>> > > > +; CHECK-LABEL: @test1
>> > > > +; CHECK: stvx 2,
>> > > > +; CHECK: blr
>> > > > +}
>> > > > +
>> > > > +; Function Attrs: nounwind
>> > > > +define void @test2(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
>> > > > i64
>> > > > %d6, i64 %d7, i64 %d8, %struct.s2* byval nocapture readonly %vs)
>> > > > #0
>> > > > {
>> > > > +entry:
>> > > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
>> > > > + %0 = load i64* %m, align 8, !tbaa !2
>> > > > + store i64 %0, i64* @n, align 8, !tbaa !2
>> > > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
>> > > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
>> > > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
>> > > > + ret void
>> > > > +
>> > > > +; CHECK-LABEL: @test2
>> > > > +; CHECK: ld {{[0-9]+}}, 112(1)
>> > > > +; CHECK: li [[REG16:[0-9]+]], 16
>> > > > +; CHECK: addi [[REGB:[0-9]+]], 1, 112
>> > > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
>> > > > +; CHECK: blr
>> > > > +}
>> > > > +
>> > > > +; Function Attrs: nounwind
>> > > > +define void @test3(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
>> > > > i64
>> > > > %d6, i64 %d7, i64 %d8, i64 %d9, %struct.s2* byval nocapture
>> > > > readonly
>> > > > %vs) #0 {
>> > > > +entry:
>> > > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
>> > > > + %0 = load i64* %m, align 8, !tbaa !2
>> > > > + store i64 %0, i64* @n, align 8, !tbaa !2
>> > > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
>> > > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
>> > > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
>> > > > + ret void
>> > > > +
>> > > > +; CHECK-LABEL: @test3
>> > > > +; CHECK: ld {{[0-9]+}}, 128(1)
>> > > > +; CHECK: li [[REG16:[0-9]+]], 16
>> > > > +; CHECK: addi [[REGB:[0-9]+]], 1, 128
>> > > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
>> > > > +; CHECK: blr
>> > > > +}
>> > > > +
>> > > > +attributes #0 = { nounwind }
>> > > > +
>> > > > +!0 = metadata !{metadata !"omnipotent char", metadata !1}
>> > > > +!1 = metadata !{metadata !"Simple C/C++ TBAA"}
>> > > > +!2 = metadata !{metadata !"long", metadata !0}
>> > > > +
>> > > >
>> > > >
>> > > > _______________________________________________
>> > > > llvm-commits mailing list
>> > > > llvm-commits at cs.uiuc.edu
>> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> > > >
>> > > >
>> > >
>> > > --
>> > > Hal Finkel
>> > > Assistant Computational Scientist
>> > > Leadership Computing Facility
>> > > Argonne National Laboratory
>> > >
>> > >
>> >
>> > --
>> > Hal Finkel
>> > Assistant Computational Scientist
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>> >
>> >
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150227/9a225a02/attachment.html>