[llvm] r190636 - Fix PPC ABI for ByVal structs with vector members

Fri Feb 27 15:41:39 PST 2015

----- Original Message -----
> From: "David Blaikie" <dblaikie at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Friday, February 27, 2015 3:35:32 PM
> Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with vector members
> 
> On Fri, Feb 27, 2015 at 1:31 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> ----- Original Message -----
> > From: "David Blaikie" < dblaikie at gmail.com >
> > To: "Hal Finkel" < hfinkel at anl.gov >
> > Cc: llvm-commits at cs.uiuc.edu
> 
> 
> > Sent: Friday, February 27, 2015 3:29:20 PM
> > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
> > vector members
> > 
> > 
> > On Thu, Feb 19, 2015 at 4:24 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> > 
> > ----- Original Message -----
> > > From: "David Blaikie" < dblaikie at gmail.com >
> > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > Cc: llvm-commits at cs.uiuc.edu
> > > Sent: Thursday, February 19, 2015 3:57:13 PM
> > > Subject: Re: [llvm] r190636 - Fix PPC ABI for ByVal structs with
> > > vector members
> > > 
> > > 
> > > On Thu, Sep 12, 2013 at 4:20 PM, Hal Finkel < hfinkel at anl.gov >
> > > wrote:
> > > 
> > > 
> > > Author: hfinkel
> > > Date: Thu Sep 12 18:20:06 2013
> > > New Revision: 190636
> > > 
> > > URL: http://llvm.org/viewvc/llvm-project?rev=190636&view=rev
> > > Log:
> > > Fix PPC ABI for ByVal structs with vector members
> > > 
> > > When a structure is passed by value, and that structure contains
> > > a
> > > vector
> > > member, according to the PPC ABI, the structure will receive
> > > enhanced
> > > alignment
> > > (so that the vector within the structure will always be aligned).
> > > 
> > > This should resolve PR16641.
> > > 
> > > 
> > > In an effort to simplify byval in a future world of opaque
> > > (untyped)
> > > pointers, I was considering changing "%struct.foo byval" to "ptr
> > > byval(num_bytes)" - which I think would work for every target
> > > except
> > > PPC (PPC, and the code in this commit, is the only case I could
> > > find
> > > of overriding getByValTypeAlignment that wasn't just the default
> > > of
> > > asking DataLayout for the type's alignment).
> > > 
> > > Is this code necessary? If I remove the entire PPCTargetLowering
> > > override of getByValTypeAlignment this test case (& everything
> > > else
> > > in check-llvm) appears to still pass.
> > > 
> > 
> > Interesting. I'll need to look at this in more detail. I'm fairly
> > certain that it was needed when it was added. It may also still be
> > necessary for QPX support (which is incomplete in tree at the
> > moment, but that should change in the next few days). In short,
> > please give me a few days, and then I should have a better idea.
> > Theoretically, however, I'm pretty sure it is necessary.
> > 
> > 
> > Ping - any further thoughts on this? Does Clang already put the
> > necessary alignment on byval attributes for PPC? Do we need to fix
> > Clang to do so? (once we have this in the frontend, if possible,
> > then I'll set about changing byval to just take a number of bytes
> > to
> > copy & ignore the type entirely - except when upgrading old
> > bitcode)
> > 
> > 
> 
> I think that Clang is okay; I'll confirm in a couple of hours.
> 
> 
> Might be easiest to just assert that the result from the
> target-specific handler produces the same as the default handler
> (which just consults the alignment of the type) - then do a
> bootstrap or whatever build would exercise this code well. (Since I
> don't have any PPC setup I'm not sure I can do this myself easily -
> but happy to take instructions if there's something I can do here)

Sure. Honestly, I don't have a great way to test this either (because I run only ppc64/Linux), but I'll demonstrate the issue...

Consider this file:
$ cat /tmp/tv.c 
typedef float __attribute__((ext_vector_type(4))) vfloat;
struct vfa {
  vfloat a[3];
};
struct vfs {
  vfloat x, y, z;
};

void fooa(vfa t);
void foos(vfs t);

void bara(vfa t) {
  fooa(t);
}

void bars(vfs t) {
  foos(t);
}

void test1() {
  vfa t = { (vfloat) { 0.0, 0.0, 0.0, 0.0 },
            (vfloat) { 1.0, 1.0, 1.0, 1.0 },
            (vfloat) { 2.0, 2.0, 2.0, 2.0 } };
  fooa(t);
}

For several reasons, the powerpc64 ABI, non-Darwin, passes these 'byval' structures, not as structures, but as casted arrays:

clang++ -target powerpc64-unknown-linux-gnu -O3 -S -emit-llvm -o - /tmp/tv.c

target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
entry:
  tail call void @_Z4fooa3vfa([3 x i128] %t.coerce)
  ret void
}

declare void @_Z4fooa3vfa([3 x i128]) #0

define void @_Z4bars3vfs([3 x i128] %t.coerce) #0 {
entry:
  tail call void @_Z4foos3vfs([3 x i128] %t.coerce)
  ret void
}

declare void @_Z4foos3vfs([3 x i128]) #0

define void @_Z5test1v() #0 {
entry:
  tail call void @_Z4fooa3vfa([3 x i128] [i128 0, i128 bitcast (<4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00> to i128), i128 bitcast (<4 x float> <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float 2.000000e+00> to i128)])
  ret void
}

and so we always get the alignment correct, because it is the alignment if 'i128' (which is 16). To digress slightly, I'm now somewhat concerned that this works more-or-less by accident (because DataLayout only defaults to giving natural alignment to vector types, for integers we normally just take the largest integer alignment specified, which would be 8 in this case, and for arrays we take the alignment of the element type) but all of the temporaries have 16 byte alignment, and so it happens to do the right thing (Uli implemented this, so I'm cc'ing him here).

For clarity, the first function, prior to optimization, looked like this:

define void @_Z4bara3vfa([3 x i128] %t.coerce) #0 {
entry:
  %t = alloca %struct.vfa, align 16
  %agg.tmp = alloca %struct.vfa, align 16
  %coerce.dive = getelementptr %struct.vfa, %struct.vfa* %t, i32 0, i32 0
  %0 = bitcast [3 x <4 x float>]* %coerce.dive to [3 x i128]*
  store [3 x i128] %t.coerce, [3 x i128]* %0, align 1
  %1 = bitcast %struct.vfa* %agg.tmp to i8*
  %2 = bitcast %struct.vfa* %t to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 48, i32 16, i1 false), !tbaa.struct !1
  %coerce.dive1 = getelementptr %struct.vfa, %struct.vfa* %agg.tmp, i32 0, i32 0
  %3 = bitcast [3 x <4 x float>]* %coerce.dive1 to [3 x i128]*
  %4 = load [3 x i128]* %3, align 1
  call void @_Z4fooa3vfa([3 x i128] %4)
  ret void
}

In any case, for non-powerpc64/Linux systems, we don't use this array coercion, and we end up with something like this:

clang++ -target powerpc-unknown-linux-gnu -O3 -S -o - -emit-llvm /tmp/tv.c

target datalayout = "E-m:e-p:32:32-i64:64-n32"
target triple = "powerpc-unknown-linux-gnu"

%struct.vfa = type { [3 x <4 x float>] }
%struct.vfs = type { <4 x float>, <4 x float>, <4 x float> }

@_ZZ5test1vE1t = private unnamed_addr constant %struct.vfa { [3 x <4 x float>] [<4 x float> zeroinitializer, <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <4 x float> <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float 2.000000e+00>] }, align 16

define void @_Z4bara3vfa(%struct.vfa* byval nocapture readonly %t) #0 {
entry:
  %agg.tmp = alloca %struct.vfa, align 16
  %0 = bitcast %struct.vfa* %agg.tmp to i8*
  %1 = bitcast %struct.vfa* %t to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 48, i32 16, i1 false), !tbaa.struct !1
  call void @_Z4fooa3vfa(%struct.vfa* byval %agg.tmp)
  ret void
}

...

and, as you can see, we don't explicitly specify an alignment on the byval argument (but nevertheless depend on it having 16-byte alignment -- as that is assumed by the alignment specified by the memcpy -- for correctness).

And, so, I'm not sure what the best plan is here. I'm certain that Clang (or any other FE) *can* put the correct alignment on the byval arguments. Furthermore, I'm fine with that solution.  We'd obviously need to update Clang to do so (and I assume you'd be doing that anyway). This does not currently affect powerpc64/Linux directly, so I don't have a great mechanism for testing, but we can certainly make sure that the outputs seem reasonable (and use code like this to construct some reasonable regression tests).

Thanks again,
Hal

> 
> - David
> 
> 
> 
> -Hal
> 
> 
> 
> > 
> > -Hal
> > 
> > 
> > 
> > > 
> > > 
> > > Added:
> > > llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
> > > Modified:
> > > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > > 
> > > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > > URL:
> > > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=190636&r1=190635&r2=190636&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (original)
> > > +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Sep 12
> > > 18:20:06 2013
> > > @@ -578,24 +578,48 @@ PPCTargetLowering::PPCTargetLowering(PPC
> > > }
> > > }
> > > 
> > > +/// getMaxByValAlign - Helper for getByValTypeAlignment to
> > > determine
> > > +/// the desired ByVal argument alignment.
> > > +static void getMaxByValAlign(Type *Ty, unsigned &MaxAlign,
> > > + unsigned MaxMaxAlign) {
> > > + if (MaxAlign == MaxMaxAlign)
> > > + return;
> > > + if (VectorType *VTy = dyn_cast<VectorType>(Ty)) {
> > > + if (MaxMaxAlign >= 32 && VTy->getBitWidth() >= 256)
> > > + MaxAlign = 32;
> > > + else if (VTy->getBitWidth() >= 128 && MaxAlign < 16)
> > > + MaxAlign = 16;
> > > + } else if (ArrayType *ATy = dyn_cast<ArrayType>(Ty)) {
> > > + unsigned EltAlign = 0;
> > > + getMaxByValAlign(ATy->getElementType(), EltAlign, MaxMaxAlign);
> > > + if (EltAlign > MaxAlign)
> > > + MaxAlign = EltAlign;
> > > + } else if (StructType *STy = dyn_cast<StructType>(Ty)) {
> > > + for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
> > > + unsigned EltAlign = 0;
> > > + getMaxByValAlign(STy->getElementType(i), EltAlign,
> > > MaxMaxAlign);
> > > + if (EltAlign > MaxAlign)
> > > + MaxAlign = EltAlign;
> > > + if (MaxAlign == MaxMaxAlign)
> > > + break;
> > > + }
> > > + }
> > > +}
> > > +
> > > /// getByValTypeAlignment - Return the desired alignment for
> > > ByVal
> > > aggregate
> > > /// function arguments in the caller parameter area.
> > > unsigned PPCTargetLowering::getByValTypeAlignment(Type *Ty) const
> > > {
> > > const TargetMachine &TM = getTargetMachine();
> > > // Darwin passes everything on 4 byte boundary.
> > > - if (TM.getSubtarget<PPCSubtarget>().isDarwin())
> > > + if (PPCSubTarget.isDarwin())
> > > return 4;
> > > 
> > > // 16byte and wider vectors are passed on 16byte boundary.
> > > - if (VectorType *VTy = dyn_cast<VectorType>(Ty))
> > > - if (VTy->getBitWidth() >= 128)
> > > - return 16;
> > > -
> > > // The rest is 8 on PPC64 and 4 on PPC32 boundary.
> > > - if (PPCSubTarget.isPPC64())
> > > - return 8;
> > > -
> > > - return 4;
> > > + unsigned Align = PPCSubTarget.isPPC64() ? 8 : 4;
> > > + if (PPCSubTarget.hasAltivec() || PPCSubTarget.hasQPX())
> > > + getMaxByValAlign(Ty, Align, PPCSubTarget.hasQPX() ? 32 : 16);
> > > + return Align;
> > > }
> > > 
> > > const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode)
> > > const {
> > > @@ -2281,6 +2305,13 @@ PPCTargetLowering::LowerFormalArguments_
> > > InVals.push_back(FIN);
> > > continue;
> > > }
> > > +
> > > + unsigned BVAlign = Flags.getByValAlign();
> > > + if (BVAlign > 8) {
> > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
> > > + CurArgOffset = ArgOffset;
> > > + }
> > > +
> > > // All aggregates smaller than 8 bytes must be passed
> > > right-justified.
> > > if (ObjSize < PtrByteSize)
> > > CurArgOffset = CurArgOffset + (PtrByteSize - ObjSize);
> > > @@ -3870,6 +3901,15 @@ PPCTargetLowering::LowerCall_64SVR4(SDVa
> > > if (Size == 0)
> > > continue;
> > > 
> > > + unsigned BVAlign = Flags.getByValAlign();
> > > + if (BVAlign > 8) {
> > > + if (BVAlign % PtrByteSize != 0)
> > > + llvm_unreachable(
> > > + "ByVal alignment is not a multiple of the pointer size");
> > > +
> > > + ArgOffset = ((ArgOffset+BVAlign-1)/BVAlign)*BVAlign;
> > > + }
> > > +
> > > // All aggregates smaller than 8 bytes must be passed
> > > right-justified.
> > > if (Size==1 || Size==2 || Size==4) {
> > > EVT VT = (Size==1) ? MVT::i8 : ((Size==2) ? MVT::i16 : MVT::i32);
> > > 
> > > Added: llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll
> > > URL:
> > > http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll?rev=190636&view=auto
> > > ==============================================================================
> > > --- llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll (added)
> > > +++ llvm/trunk/test/CodeGen/PowerPC/vec-abi-align.ll Thu Sep 12
> > > 18:20:06 2013
> > > @@ -0,0 +1,64 @@
> > > +; RUN: llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 < %s
> > > |
> > > FileCheck %s
> > > +target datalayout =
> > > "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
> > > +target triple = "powerpc64-unknown-linux-gnu"
> > > +
> > > +%struct.s2 = type { i64, <4 x float> }
> > > +
> > > + at ve = external global <4 x float>
> > > + at n = external global i64
> > > +
> > > +; Function Attrs: nounwind
> > > +define void @test1(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
> > > i64
> > > %d6, i64 %d7, i64 %d8, i64 %d9, <4 x float> inreg %vs.coerce) #0
> > > {
> > > +entry:
> > > + store <4 x float> %vs.coerce, <4 x float>* @ve, align 16, !tbaa
> > > !0
> > > + ret void
> > > +
> > > +; CHECK-LABEL: @test1
> > > +; CHECK: stvx 2,
> > > +; CHECK: blr
> > > +}
> > > +
> > > +; Function Attrs: nounwind
> > > +define void @test2(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
> > > i64
> > > %d6, i64 %d7, i64 %d8, %struct.s2* byval nocapture readonly %vs)
> > > #0
> > > {
> > > +entry:
> > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
> > > + %0 = load i64* %m, align 8, !tbaa !2
> > > + store i64 %0, i64* @n, align 8, !tbaa !2
> > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
> > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
> > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
> > > + ret void
> > > +
> > > +; CHECK-LABEL: @test2
> > > +; CHECK: ld {{[0-9]+}}, 112(1)
> > > +; CHECK: li [[REG16:[0-9]+]], 16
> > > +; CHECK: addi [[REGB:[0-9]+]], 1, 112
> > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
> > > +; CHECK: blr
> > > +}
> > > +
> > > +; Function Attrs: nounwind
> > > +define void @test3(i64 %d1, i64 %d2, i64 %d3, i64 %d4, i64 %d5,
> > > i64
> > > %d6, i64 %d7, i64 %d8, i64 %d9, %struct.s2* byval nocapture
> > > readonly
> > > %vs) #0 {
> > > +entry:
> > > + %m = getelementptr inbounds %struct.s2* %vs, i64 0, i32 0
> > > + %0 = load i64* %m, align 8, !tbaa !2
> > > + store i64 %0, i64* @n, align 8, !tbaa !2
> > > + %v = getelementptr inbounds %struct.s2* %vs, i64 0, i32 1
> > > + %1 = load <4 x float>* %v, align 16, !tbaa !0
> > > + store <4 x float> %1, <4 x float>* @ve, align 16, !tbaa !0
> > > + ret void
> > > +
> > > +; CHECK-LABEL: @test3
> > > +; CHECK: ld {{[0-9]+}}, 128(1)
> > > +; CHECK: li [[REG16:[0-9]+]], 16
> > > +; CHECK: addi [[REGB:[0-9]+]], 1, 128
> > > +; CHECK: lvx 2, [[REGB]], [[REG16]]
> > > +; CHECK: blr
> > > +}
> > > +
> > > +attributes #0 = { nounwind }
> > > +
> > > +!0 = metadata !{metadata !"omnipotent char", metadata !1}
> > > +!1 = metadata !{metadata !"Simple C/C++ TBAA"}
> > > +!2 = metadata !{metadata !"long", metadata !0}
> > > +
> > > 
> > > 
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > 
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory