[llvm] r190636 - Fix PPC ABI for ByVal structs with vector members

Fri Feb 27 16:05:04 PST 2015

----- Original Message -----
> From: "David Blaikie" <dblaikie at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu, "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> Sent: Friday, February 27, 2015 5:59:22 PM
> Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with vector members
> 
> On Fri, Feb 27, 2015 at 3:41 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> ----- Original Message -----
> > From: "David Blaikie" < dblaikie at gmail.com >
> > To: "Hal Finkel" < hfinkel at anl.gov >
> > Cc: llvm-commits at cs.uiuc.edu
> 
> 
> > Sent: Friday, February 27, 2015 3:35:32 PM
> > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
> > vector members
> > 
> > On Fri, Feb 27, 2015 at 1:31 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "David Blaikie" < dblaikie at gmail.com >
> > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > Cc: llvm-commits at cs.uiuc.edu
> > 
> > 
> > > Sent: Friday, February 27, 2015 3:29:20 PM
> > > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
> > > vector members
> > > 
> > > 
> > > On Thu, Feb 19, 2015 at 4:24 PM, Hal Finkel < hfinkel at anl.gov >
> > > wrote:
> > > 
> > > ----- Original Message -----
> > > > From: "David Blaikie" < dblaikie at gmail.com >
> > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > > Cc: llvm-commits at cs.uiuc.edu
> > > > Sent: Thursday, February 19, 2015 3:57:13 PM
> > > > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs
> > > > with
> > > > vector members
> > > > 
> > > > 
> > > > On Thu, Sep 12, 2013 at 4:20 PM, Hal Finkel < hfinkel at anl.gov >
> > > > wrote:
> > > > 
> > > > 
> > > > Author: hfinkel
> > > > Date: Thu Sep 12 18:20:06 2013
> > > > New Revision: 190636
> > > > 
> > > > URL: http://llvm.org/viewvc/llvm-project?rev=190636&view=rev
> > > > Log:
> > > > Fix PPC ABI for ByVal structs with vector members
> > > > 
> > > > When a structure is passed by value, and that structure
> > > > contains
> > > > a
> > > > vector
> > > > member, according to the PPC ABI, the structure will receive
> > > > enhanced
> > > > alignment
> > > > (so that the vector within the structure will always be
> > > > aligned).
> > > > 
> > > > This should resolve PR16641.
> > > > 
> > > > 
> > > > In an effort to simplify byval in a future world of opaque
> > > > (untyped)
> > > > pointers, I was considering changing "%struct.foo byval" to
> > > > "ptr
> > > > byval(num_bytes)" - which I think would work for every target
> > > > except
> > > > PPC (PPC, and the code in this commit, is the only case I could
> > > > find
> > > > of overriding getByValTypeAlignment that wasn't just the
> > > > default
> > > > of
> > > > asking DataLayout for the type's alignment).
> > > > 
> > > > Is this code necessary? If I remove the entire
> > > > PPCTargetLowering
> > > > override of getByValTypeAlignment this test case (& everything
> > > > else
> > > > in check-llvm) appears to still pass.
> > > > 
> > > 
> > > Interesting. I'll need to look at this in more detail. I'm fairly
> > > certain that it was needed when it was added. It may also still
> > > be
> > > necessary for QPX support (which is incomplete in tree at the
> > > moment, but that should change in the next few days). In short,
> > > please give me a few days, and then I should have a better idea.
> > > Theoretically, however, I'm pretty sure it is necessary.
> > > 
> > > 
> > > Ping - any further thoughts on this? Does Clang already put the
> > > necessary alignment on byval attributes for PPC? Do we need to
> > > fix
> > > Clang to do so? (once we have this in the frontend, if possible,
> > > then I'll set about changing byval to just take a number of bytes
> > > to
> > > copy & ignore the type entirely - except when upgrading old
> > > bitcode)
> > > 
> > > 
> > 
> > I think that Clang is okay; I'll confirm in a couple of hours.
> > 
> > 
> > Might be easiest to just assert that the result from the
> > target-specific handler produces the same as the default handler
> > (which just consults the alignment of the type) - then do a
> > bootstrap or whatever build would exercise this code well. (Since I
> > don't have any PPC setup I'm not sure I can do this myself easily -
> > but happy to take instructions if there's something I can do here)
> 
> Sure. Honestly, I don't have a great way to test this either (because
> I run only ppc64/Linux), but I'll demonstrate the issue...
> 
> Consider this file:
> $ cat /tmp/tv.c
> typedef float __attribute__((ext_vector_type(4))) vfloat;
> struct vfa {
> vfloat a[3];
> };
> struct vfs {
> vfloat x, y, z;
> };
> 
> void fooa(vfa t);
> void foos(vfs t);
> 
> void bara(vfa t) {
> fooa(t);
> }
> 
> void bars(vfs t) {
> foos(t);
> }
> 
> void test1() {
> vfa t = { (vfloat) { 0.0, 0.0, 0.0, 0.0 },
> (vfloat) { 1.0, 1.0, 1.0, 1.0 },
> (vfloat) { 2.0, 2.0, 2.0, 2.0 } };
> fooa(t);
> }
> 
> For several reasons, the powerpc64 ABI, non-Darwin, passes these
> 'byval' structures, not as structures, but as casted arrays:
> 
> clang++ -target powerpc64-unknown-linux-gnu -O3 -S -emit-llvm -o -
> /tmp/tv.c
> 
> target datalayout = "E-m:e-i64:64-n32:64"
> target triple = "powerpc64-unknown-linux-gnu"
> 
> define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
> entry:
> tail call void @_Z4fooa3vfa([3 x i128] %t.coerce)
> ret void
> }
> 
> declare void @_Z4fooa3vfa([3 x i128]) #0
> 
> define void @_Z4bars3vfs([3 x i128] %t.coerce) #0 {
> entry:
> tail call void @_Z4foos3vfs([3 x i128] %t.coerce)
> ret void
> }
> 
> declare void @_Z4foos3vfs([3 x i128]) #0
> 
> define void @_Z5test1v() #0 {
> entry:
> tail call void @_Z4fooa3vfa([3 x i128] [i128 0, i128 bitcast (<4 x
> float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00,
> float 1.000000e+00> to i128), i128 bitcast (<4 x float> <float
> 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float
> 2.000000e+00> to i128)])
> ret void
> }
> 
> and so we always get the alignment correct, because it is the
> alignment if 'i128' (which is 16). To digress slightly, I'm now
> somewhat concerned that this works more-or-less by accident (because
> DataLayout only defaults to giving natural alignment to vector
> types, for integers we normally just take the largest integer
> alignment specified, which would be 8 in this case, and for arrays
> we take the alignment of the element type) but all of the
> temporaries have 16 byte alignment, and so it happens to do the
> right thing (Uli implemented this, so I'm cc'ing him here).
> 
> For clarity, the first function, prior to optimization, looked like
> this:
> 
> define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
> entry:
> %t = alloca %struct.vfa, align 16
> %agg.tmp = alloca %struct.vfa, align 16
> %coerce.dive = getelementptr %struct.vfa, %struct.vfa* %t, i32 0, i32
> 0
> %0 = bitcast [3 x <4 x float>]* %coerce.dive to [3 x i128]*
> store [3 x i128] %t.coerce, [3 x i128]* %0, align 1
> %1 = bitcast %struct.vfa* %agg.tmp to i8*
> %2 = bitcast %struct.vfa* %t to i8*
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 48, i32 16,
> i1 false), !tbaa.struct !1
> %coerce.dive1 = getelementptr %struct.vfa, %struct.vfa* %agg.tmp, i32
> 0, i32 0
> %3 = bitcast [3 x <4 x float>]* %coerce.dive1 to [3 x i128]*
> %4 = load [3 x i128]* %3, align 1
> call void @_Z4fooa3vfa([3 x i128] %4)
> ret void
> }
> 
> In any case, for non-powerpc64/Linux systems, we don't use this array
> coercion, and we end up with something like this:
> 
> clang++ -target powerpc-unknown-linux-gnu -O3 -S -o - -emit-llvm
> /tmp/tv.c
> 
> target datalayout = "E-m:e-p:32:32-i64:64-n32"
> target triple = "powerpc-unknown-linux-gnu"
> 
> %struct.vfa = type { [3 x <4 x float>] }
> %struct.vfs = type { <4 x float>, <4 x float>, <4 x float> }
> 
> @_ZZ5test1vE1t = private unnamed_addr constant %struct.vfa { [3 x <4
> x float>] [<4 x float> zeroinitializer, <4 x float> <float
> 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float
> 1.000000e+00>, <4 x float> <float 2.000000e+00, float 2.000000e+00,
> float 2.000000e+00, float 2.000000e+00>] }, align 16
> 
> define void @_Z4bara3vfa(%struct.vfa* byval nocapture readonly %t) #0
> {
> entry:
> %agg.tmp = alloca %struct.vfa, align 16
> %0 = bitcast %struct.vfa* %agg.tmp to i8*
> %1 = bitcast %struct.vfa* %t to i8*
> call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 48, i32 16,
> i1 false), !tbaa.struct !1
> call void @_Z4fooa3vfa(%struct.vfa* byval %agg.tmp)
> ret void
> }
> 
> ...
> 
> and, as you can see, we don't explicitly specify an alignment on the
> byval argument (but nevertheless depend on it having 16-byte
> alignment -- as that is assumed by the alignment specified by the
> memcpy -- for correctness).
> 
> And, so, I'm not sure what the best plan is here. I'm certain that
> Clang (or any other FE) *can* put the correct alignment on the byval
> arguments. Furthermore, I'm fine with that solution. We'd obviously
> need to update Clang to do so (and I assume you'd be doing that
> anyway).
> 
> Thanks for all the context!
> 
> Yes, that's what I'm suggesting we/I could do - ensure Clang produces
> the right alignment attributes (I was, of course, rather hoping
> Clang /already/ did this since I saw alignment attributes on other
> byval parameters - but as you've demonstrated, they're missing here)
> & then relegate the alignment handling stuff in LLVM to the bitcode
> reader for backwards compatibility.
> 
> (then, with that all done, I'd get to work adding an integer
> parameter to byval for the number of bits to copy and then byval
> becomes pointee type independent and I'm another step along the way
> to typeless/opaque pointer types in LLVM... )
> 
> 
> This does not currently affect powerpc64/Linux directly, so I don't
> have a great mechanism for testing, but we can certainly make sure
> that the outputs seem reasonable (and use code like this to
> construct some reasonable regression tests).
> 
> 
> Given this is a bit out of my field, I'll certainly send changes to
> you for review before committing - any test cases now or in code
> review will be really appreciated.

Yes, I'd be happy to review/test.

Thanks for working on this!

 -Hal

> 
> Thanks!
> 
> - David
> 
> 
> 
> Thanks again,
> Hal
> 
> 
> 
> > 
> > - David
> > 
> > 
> > 
> > -Hal
> > 
> > 
> > 
> > > 
> > > -Hal
> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > > Added:
> > > > llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
> > > > Modified:
> > > > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > > > 
> > > > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > > > URL:
> > > > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=190636&r1=190635&r2=190636&view=diff
> > > > ==============================================================================
> > > > --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > > > (original)
> > > > +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Sep
> > > > 12
> > > > 18:20:06 2013
> > > > @@ -578,24 +578,48 @@ PPCTargetLowering::PPCTargetLowering(PPC
> > > > }
> > > > }
> > > > 
> > > > +/// getMaxByValAlign - Helper for getByValTypeAlignment to
> > > > determine
> > > > +/// the desired ByVal argument alignment.
> > > > +static void getMaxByValAlign(Type *Ty, unsigned &MaxAlign,
> > > > + unsigned MaxMaxAlign) {
> > > > + if (MaxAlign == MaxMaxAlign)
> > > > + return;
> > > > + if (VectorType *VTy = dyn_cast<VectorType>(Ty)) {
> > > > + if (MaxMaxAlign >= 32 && VTy->getBitWidth() >= 256)
> > > > + MaxAlign = 32;
> > > > + else if (VTy->getBitWidth() >= 128 && MaxAlign < 16)
> > > > + MaxAlign = 16;
> > > > + } else if (ArrayType *ATy = dyn_cast<ArrayType>(Ty)) {
> > > > + unsigned EltAlign = 0;
> > > > + getMaxByValAlign(ATy->getElementType(), EltAlign,
> > > > MaxMaxAlign);
> > > > + if (EltAlign > MaxAlign)
> > > > + MaxAlign = EltAlign;
> > > > + } else if (StructType *STy = dyn_cast<StructType>(Ty)) {
> > > > + for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i)
> > > > {
> > > > + unsigned EltAlign = 0;
> > > > + getMaxByValAlign(STy->getElementType(i), EltAlign,
> > > > MaxMaxAlign);
> > > > + if (EltAlign > MaxAlign)
> > > > + MaxAlign = EltAlign;
> > > > + if (MaxAlign == MaxMaxAlign)
> > > > + break;
> > > > + }
> > > > + }
> > > > +}
> > > > +
> > > > /// getByValTypeAlignment - Return the desired alignment for
> > > > ByVal
> > > > aggregate
> > > > /// function arguments in the caller parameter area.
> > > > unsigned PPCTargetLowering::getByValTypeAlignment(Type *Ty)
> > > > const
> > > > {
> > > > const TargetMachine &TM = getTargetMachine();
> > > > // Darwin passes everything on 4 byte boundary.
> > > > - if (TM.getSubtarget<PPCSubtarget>().isDarwin())
> > > > + if (PPCSubTarget.isDarwin())
> > > > return 4;
> > > > 
> > > > // 16byte and wider vectors are passed on 16byte boundary.
> > > > - if (VectorType *VTy = dyn_cast<VectorType>(Ty))
> > > > - if (VTy->getBitWidth() >= 128)
> > > > - return 16;
> > > > -
> > > > // The rest is 8 on PPC64 and 4 on PPC32 boundary.
> > > > - if (PPCSubTarget.isPPC64())
> > > > - return 8;
> > > > -
> > > > - return 4;
> > > > + unsigned Align = PPCSubTarget.isPPC64() ? 8 : 4;
> > > > + if (PPCSubTarget.hasAltivec() || PPCSubTarget.hasQPX())
> > > > + getMaxByValAlign(Ty, Align, PPCSubTarget.hasQPX() ? 32 : 16);
> > > > + return Align;
> > > > }
> > > > 
> > > > const char *PPCTargetLowering::getTargetNodeName(unsigned
> > > > Opcode)
> > > > const {
> > > > @@ -2281,6 +2305,13 @@ PPCTargetLowering::LowerFormalArguments_
> > > > InVals.push_back(FIN);
> > > > continue;
> > > > }
> > > > +
> > > > + unsigned BVAlign = Flags.getByValAlign();
> > > > + if (BVAlign > 8) {
> > > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
> > > > + CurArgOffset = ArgOffset;
> > > > + }
> > > > +
> > > > // All aggregates smaller than 8 bytes must be passed
> > > > right-justified.
> > > > if (ObjSize < PtrByteSize)
> > > > CurArgOffset = CurArgOffset + (PtrByteSize - ObjSize);
> > > > @@ -3870,6 +3901,15 @@ PPCTargetLowering::LowerCall_64SVR4(SDVa
> > > > if (Size == 0)
> > > > continue;
> > > > 
> > > > + unsigned BVAlign = Flags.getByValAlign();
> > > > + if (BVAlign > 8) {
> > > > + if (BVAlign % PtrByteSize != 0)
> > > > + llvm_unreachable(
> > > > + "ByVal alignment is not a multiple of the pointer size");
> > > > +
> > > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
> > > > + }
> > > > +
> > > > // All aggregates smaller than 8 bytes must be passed
> > > > right-justified.
> > > > if (Size==1 || Size==2 || Size==4) {
> > > > EVT VT = (Size==1) ? MVT::i8 : ((Size==2) ? MVT::i16 :
> > > > MVT::i32);
> > > > 
> > > > Added: llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
> > > > URL:
> > > > http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll?rev=190636&view=auto
> > > > ==============================================================================
> > > > --- llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll (added)
> > > > +++ llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll Thu Sep 12
> > > > 18:20:06 2013
> > > > @@ -0,0 +1,64 @@
> > > > +; RUN: llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 <
> > > > %s
> > > > | 
> > > > FileCheck %s
> > > > +target datalayout =
> > > > "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
> > > > +target triple = "powerpc64-unknown-linux-gnu"
> > > > +
> > > > +%struct.s2 = type { i64, <4 x float> }
> > > > +
> > > > + at ve = external global <4 x float>
> > > > + at n = external global i64
> > > > +
> > > > +; Function Attrs: nounwind
> > > > +define void @test1(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64
> > > > %d5,
> > > > i64
> > > > %d6, i64 %d7, i64 %d8, i64 %d9, <4 x float> inreg %vs.coerce)
> > > > #0
> > > > {
> > > > +entry:
> > > > + store <4 x float> %vs.coerce, <4 x float>* @ve, align 16,
> > > > !tbaa
> > > > !0
> > > > + ret void
> > > > +
> > > > +; CHECK-LABEL: @test1
> > > > +; CHECK: stvx 2,
> > > > +; CHECK: blr
> > > > +}
> > > > +
> > > > +; Function Attrs: nounwind
> > > > +define void @test2(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64
> > > > %d5,
> > > > i64
> > > > %d6, i64 %d7, i64 %d8, %struct.s2* byval nocapture readonly
> > > > %vs)
> > > > #0
> > > > {
> > > > +entry:
> > > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
> > > > + %0 = load i64* %m, align 8, !tbaa !2
> > > > + store i64 %0, i64* @n, align 8, !tbaa !2
> > > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
> > > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
> > > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
> > > > + ret void
> > > > +
> > > > +; CHECK-LABEL: @test2
> > > > +; CHECK: ld {{[0-9]+}}, 112(1)
> > > > +; CHECK: li [[REG16:[0-9]+]], 16
> > > > +; CHECK: addi [[REGB:[0-9]+]], 1, 112
> > > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
> > > > +; CHECK: blr
> > > > +}
> > > > +
> > > > +; Function Attrs: nounwind
> > > > +define void @test3(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64
> > > > %d5,
> > > > i64
> > > > %d6, i64 %d7, i64 %d8, i64 %d9, %struct.s2* byval nocapture
> > > > readonly
> > > > %vs) #0 {
> > > > +entry:
> > > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
> > > > + %0 = load i64* %m, align 8, !tbaa !2
> > > > + store i64 %0, i64* @n, align 8, !tbaa !2
> > > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
> > > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
> > > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
> > > > + ret void
> > > > +
> > > > +; CHECK-LABEL: @test3
> > > > +; CHECK: ld {{[0-9]+}}, 128(1)
> > > > +; CHECK: li [[REG16:[0-9]+]], 16
> > > > +; CHECK: addi [[REGB:[0-9]+]], 1, 128
> > > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
> > > > +; CHECK: blr
> > > > +}
> > > > +
> > > > +attributes #0 = { nounwind }
> > > > +
> > > > +!0 = metadata !{metadata !"omnipotent char", metadata !1}
> > > > +!1 = metadata !{metadata !"Simple C/C++ TBAA"}
> > > > +!2 = metadata !{metadata !"long", metadata !0}
> > > > +
> > > > 
> > > > 
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > > 
> > > > 
> > > 
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > > 
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory