Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
Tom Stellard
tom at stellard.net
Fri Oct 3 10:06:30 PDT 2014
On Fri, Oct 03, 2014 at 12:32:03PM -0400, Jan Vesely wrote:
> Hi Tom, Matt,
>
> I'm running into strange issues with the cos test (piglit
> generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
>
> I have been seeing random failures (incorrect results) for some time and
> tried to investigate. The weird part is that the failures are not 100%
> reproducible: sometimes the tests pass, or partly pass
> (it's usually the float8 and float16 subtests that fail).
> The failure is always the same:
> "Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
> although the position may vary, even when the same value was computed earlier in the results array.
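For reference, the two bit patterns in the failure message can be decoded on the host. The C++ sketch below is an illustration only: `bits_to_float`, `ordered` and `within_ulps` are hypothetical helpers that assume the standard IEEE-754 single-precision layout, and they are not necessarily how piglit implements its comparison. They show that 0xbf6d0668 is approximately -0.925879 and that 0x7fc00000 is a quiet NaN, which can never fall within any ULP tolerance.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Map a float onto an integer line on which adjacent representable floats
// differ by exactly 1 (+0.0 and -0.0 both map to 0).
std::int64_t ordered(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7fffffffu);
  return (u & 0x80000000u) ? -magnitude : magnitude;
}

// True if a and b are at most max_ulps representable floats apart.
// A NaN on either side is never within tolerance.
bool within_ulps(float a, float b, std::int64_t max_ulps) {
  assert(max_ulps >= 0);
  if (std::isnan(a) || std::isnan(b))
    return false;
  const std::int64_t d = ordered(a) - ordered(b);
  return (d < 0 ? -d : d) <= max_ulps;
}
```

For example, `within_ulps(bits_to_float(0xbf6d0668u), bits_to_float(0x7fc00000u), 2)` is false for any tolerance, because the second operand is a NaN.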
>
> The first patch of this series does not change the behavior (or the instruction dump).
> However, using the ADDC instruction results in a hang on every cos test:
> "ring 0 stalled for more than 10000msec"
> "GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"
>
> The actual test results, however, follow the same pattern as before (random failures, mostly in the float8/16 tests).
> I can even get a test to pass with a hang on every subtest.
>
> Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
> and makes the failures fully reproducible in every subtest, triggered on the first
> occurrence of what should have been -0.925879.
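Both variants turn the 0/1 flag produced by ADDC_UINT into 0 or 0xFFFFFFFF (the patch's `//negate sign` step), so they should be interchangeable at the value level. A small C++ sketch of the two computations; the function names are hypothetical, and the sign extension is written as a conditional rather than as the instruction LLVM would actually select:

```cpp
#include <cassert>
#include <cstdint>

// "SUB 0" form used by the patch: OVF = 0 - carry.
std::uint32_t negate_carry(std::uint32_t carry) {
  assert(carry <= 1u && "ADDC_UINT/SUBB_UINT only produce 0 or 1");
  return 0u - carry;  // 0 -> 0x00000000, 1 -> 0xFFFFFFFF (unsigned wraparound)
}

// SIGN_EXTEND_INREG from i1: replicate bit 0 into all 32 bits.
std::uint32_t sext_inreg_i1(std::uint32_t x) {
  return (x & 1u) ? 0xFFFFFFFFu : 0u;
}
```

Because the two agree for every possible carry value, a change in hang or failure behaviour when swapping them suggests a difference in the code generated around the flag (selection, scheduling or bundling) rather than in the value itself; this is only an inference.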
>
> The GPU is AMD TURKS (HD 7570, 1002:675d).
>
> I tried digging through the manual, but it only mentions that ADDC is a vec and trans instruction.
> Is there any errata document that might give a hint?
It's possible the bug is somewhere else and adding the ADDC
instruction changed the program enough to uncover it. Try modifying the
packetizer to only allow one instruction per group. I think modifying
R600Packetizer::isLegalToPacketizeTogether() to always return false will
do this.
If you still get lockups even with this, then we can rule out some kind of packetizer bug.
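To see why an always-false legality check yields one instruction per group, here is a deliberately simplified C++ model of greedy VLIW bundling. It is not LLVM's `VLIWPacketizerList` code: `packetize`, the string "instructions" and the five-slot default (four vector slots plus one transcendental slot on VLIW5 parts) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model: an instruction joins the current ALU group
// only if a slot is free and it is legal alongside every member already there.
using Group = std::vector<std::string>;
using LegalFn = std::function<bool(const std::string &, const std::string &)>;

std::vector<Group> packetize(const std::vector<std::string> &insts,
                             const LegalFn &legal, std::size_t slots = 5) {
  assert(slots > 0);
  std::vector<Group> groups;
  for (const std::string &inst : insts) {
    bool fits = !groups.empty() && groups.back().size() < slots;
    if (fits) {
      for (const std::string &other : groups.back()) {
        if (!legal(other, inst)) {
          fits = false;
          break;
        }
      }
    }
    if (fits)
      groups.back().push_back(inst);
    else
      groups.push_back(Group{inst});  // start a new group
  }
  return groups;
}
```

With a predicate that always returns false, every group holds exactly one instruction, so ADDC_UINT can no longer interact with neighbours in the same bundle; if the lockup persists under that configuration, the packetizer is an unlikely cause.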
-Tom
>
> thanks,
> jan
>
> PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch.
>
>
> On Wed, 2014-09-24 at 20:27 -0400, Jan Vesely wrote:
> > v2: tighten the sub64 tests
> > v3: rename to CARRY/BORROW
> >
> > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> >
> > ---
> > lib/Target/R600/AMDGPUISelLowering.h | 2 +
> > lib/Target/R600/AMDGPUInstrInfo.td | 6 ++
> > lib/Target/R600/AMDGPUSubtarget.h | 8 ++
> > lib/Target/R600/EvergreenInstructions.td | 3 +
> > lib/Target/R600/R600ISelLowering.cpp | 39 +++++++-
> > test/CodeGen/R600/add.ll | 154 +++++++++++++++++--------------
> > test/CodeGen/R600/sub.ll | 18 ++--
> > test/CodeGen/R600/uaddo.ll | 17 +++-
> > test/CodeGen/R600/usubo.ll | 23 ++++-
> > 9 files changed, 189 insertions(+), 81 deletions(-)
> >
> > diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
> > index 911576b..6eaf001 100644
> > --- a/lib/Target/R600/AMDGPUISelLowering.h
> > +++ b/lib/Target/R600/AMDGPUISelLowering.h
> > @@ -205,6 +205,8 @@ enum {
> > RSQ_CLAMPED,
> > LDEXP,
> > DOT4,
> > + CARRY,
> > + BORROW,
> > BFE_U32, // Extract range of bits with zero extension to 32-bits.
> > BFE_I32, // Extract range of bits with sign extension to 32-bits.
> > BFI, // (src0 & src1) | (~src0 & src2)
> > diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
> > index 3d70791..1600c4a 100644
> > --- a/lib/Target/R600/AMDGPUInstrInfo.td
> > +++ b/lib/Target/R600/AMDGPUInstrInfo.td
> > @@ -91,6 +91,12 @@ def AMDGPUumin : SDNode<"AMDGPUISD::UMIN", SDTIntBinOp,
> > [SDNPCommutative, SDNPAssociative]
> > >;
> >
> > +// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0
> > +def AMDGPUcarry : SDNode<"AMDGPUISD::CARRY", SDTIntBinOp, []>;
> > +
> > +// out = (src1 > src0) ? 1 : 0
> > +def AMDGPUborrow : SDNode<"AMDGPUISD::BORROW", SDTIntBinOp, []>;
> > +
> >
> > def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0",
> > SDTIntToFPOp, []>;
> > diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
> > index 6797972..9f2ba61 100644
> > --- a/lib/Target/R600/AMDGPUSubtarget.h
> > +++ b/lib/Target/R600/AMDGPUSubtarget.h
> > @@ -168,6 +168,14 @@ public:
> > return (getGeneration() >= EVERGREEN);
> > }
> >
> > + bool hasCARRY() const {
> > + return (getGeneration() >= EVERGREEN);
> > + }
> > +
> > + bool hasBORROW() const {
> > + return (getGeneration() >= EVERGREEN);
> > + }
> > +
> > bool IsIRStructurizerEnabled() const {
> > return EnableIRStructurizer;
> > }
> > diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
> > index 8117b60..d3822ef 100644
> > --- a/lib/Target/R600/EvergreenInstructions.td
> > +++ b/lib/Target/R600/EvergreenInstructions.td
> > @@ -336,6 +336,9 @@ defm CUBE_eg : CUBE_Common<0xC0>;
> >
> > def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
> >
> > +def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
> > +def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;
> > +
> > def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", ctlz_zero_undef, VecALU>;
> > def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;
> >
> > diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
> > index 9b2b689..a28b76a 100644
> > --- a/lib/Target/R600/R600ISelLowering.cpp
> > +++ b/lib/Target/R600/R600ISelLowering.cpp
> > @@ -89,6 +89,15 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
> > setOperationAction(ISD::SELECT, MVT::v2i32, Expand);
> > setOperationAction(ISD::SELECT, MVT::v4i32, Expand);
> >
> > + // ADD, SUB overflow. These need to be Custom because
> > + // SelectionDAGLegalize::LegalizeOp (LegalizeDAG.cpp)
> > + // turns Legal into expand
> > + if (Subtarget->hasCARRY())
> > + setOperationAction(ISD::UADDO, MVT::i32, Custom);
> > +
> > + if (Subtarget->hasBORROW())
> > + setOperationAction(ISD::USUBO, MVT::i32, Custom);
> > +
> > // Expand sign extension of vectors
> > if (!Subtarget->hasBFE())
> > setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
> > @@ -154,8 +163,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
> > setTargetDAGCombine(ISD::SELECT_CC);
> > setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);
> >
> > - setOperationAction(ISD::SUB, MVT::i64, Expand);
> > -
> > // These should be replaced by UDVIREM, but it does not happen automatically
> > // during Type Legalization
> > setOperationAction(ISD::UDIV, MVT::i64, Custom);
> > @@ -578,6 +585,34 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
> > case ISD::SHL_PARTS: return LowerSHLParts(Op, DAG);
> > case ISD::SRA_PARTS:
> > case ISD::SRL_PARTS: return LowerSRXParts(Op, DAG);
> > + case ISD::UADDO: {
> > + SDLoc DL(Op);
> > + EVT VT = Op.getValueType();
> > +
> > + SDValue Lo = Op.getOperand(0);
> > + SDValue Hi = Op.getOperand(1);
> > +
> > + SDValue OVF = DAG.getNode(AMDGPUISD::CARRY, DL, VT, Lo, Hi);
> > + //negate sign
> > + OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
> > + SDValue Res = DAG.getNode(ISD::ADD, DL, VT, Lo, Hi);
> > +
> > + return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
> > + }
> > + case ISD::USUBO: {
> > + SDLoc DL(Op);
> > + EVT VT = Op.getValueType();
> > +
> > + SDValue Arg0 = Op.getOperand(0);
> > + SDValue Arg1 = Op.getOperand(1);
> > +
> > + SDValue OVF = DAG.getNode(AMDGPUISD::BORROW, DL, VT, Arg0, Arg1);
> > + //negate sign
> > + OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
> > + SDValue Res = DAG.getNode(ISD::SUB, DL, VT, Arg0, Arg1);
> > +
> > + return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
> > + }
> > case ISD::FCOS:
> > case ISD::FSIN: return LowerTrig(Op, DAG);
> > case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
> > diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
> > index 8cf43d1..fddb951 100644
> > --- a/test/CodeGen/R600/add.ll
> > +++ b/test/CodeGen/R600/add.ll
> > @@ -1,12 +1,12 @@
> > -; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK --check-prefix=FUNC %s
> > -; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK --check-prefix=FUNC %s
> > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG --check-prefix=FUNC %s
> > +; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI --check-prefix=FUNC %s
> >
> > ;FUNC-LABEL: @test1:
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> >
> > -;SI-CHECK: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}}
> > -;SI-CHECK-NOT: [[REG]]
> > -;SI-CHECK: BUFFER_STORE_DWORD [[REG]],
> > +;SI: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}}
> > +;SI-NOT: [[REG]]
> > +;SI: BUFFER_STORE_DWORD [[REG]],
> > define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
> > %b_ptr = getelementptr i32 addrspace(1)* %in, i32 1
> > %a = load i32 addrspace(1)* %in
> > @@ -17,11 +17,11 @@ define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
> > }
> >
> > ;FUNC-LABEL: @test2:
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> >
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> >
> > define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
> > %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
> > @@ -33,15 +33,15 @@ define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
> > }
> >
> > ;FUNC-LABEL: @test4:
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> >
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> >
> > define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
> > %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
> > @@ -53,22 +53,22 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
> > }
> >
> > ; FUNC-LABEL: @test8
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > define void @test8(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) {
> > entry:
> > %0 = add <8 x i32> %a, %b
> > @@ -77,38 +77,38 @@ entry:
> > }
> >
> > ; FUNC-LABEL: @test16
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; EG-CHECK: ADD_INT
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > -; SI-CHECK: S_ADD_I32
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; EG: ADD_INT
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > +; SI: S_ADD_I32
> > define void @test16(<16 x i32> addrspace(1)* %out, <16 x i32> %a, <16 x i32> %b) {
> > entry:
> > %0 = add <16 x i32> %a, %b
> > @@ -117,8 +117,12 @@ entry:
> > }
> >
> > ; FUNC-LABEL: @add64
> > -; SI-CHECK: S_ADD_U32
> > -; SI-CHECK: S_ADDC_U32
> > +; SI: S_ADD_U32
> > +; SI: S_ADDC_U32
> > +
> > +; EG-DAG: ADD_INT
> > +; EG-DAG: ADDC_UINT
> > +; EG-DAG: ADD_INT
> > define void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
> > entry:
> > %0 = add i64 %a, %b
> > @@ -132,7 +136,11 @@ entry:
> > ; to a VGPR before doing the add.
> >
> > ; FUNC-LABEL: @add64_sgpr_vgpr
> > -; SI-CHECK-NOT: V_ADDC_U32_e32 s
> > +; SI-NOT: V_ADDC_U32_e32 s
> > +
> > +; EG-DAG: ADD_INT
> > +; EG-DAG: ADDC_UINT
> > +; EG-DAG: ADD_INT
> > define void @add64_sgpr_vgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) {
> > entry:
> > %0 = load i64 addrspace(1)* %in
> > @@ -143,8 +151,12 @@ entry:
> >
> > ; Test i64 add inside a branch.
> > ; FUNC-LABEL: @add64_in_branch
> > -; SI-CHECK: S_ADD_U32
> > -; SI-CHECK: S_ADDC_U32
> > +; SI: S_ADD_U32
> > +; SI: S_ADDC_U32
> > +
> > +; EG-DAG: ADD_INT
> > +; EG-DAG: ADDC_UINT
> > +; EG-DAG: ADD_INT
> > define void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
> > entry:
> > %0 = icmp eq i64 %a, 0
> > diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll
> > index 8678e2b..1225ebd 100644
> > --- a/test/CodeGen/R600/sub.ll
> > +++ b/test/CodeGen/R600/sub.ll
> > @@ -43,10 +43,13 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
> > ; SI: S_SUB_U32
> > ; SI: S_SUBB_U32
> >
> > +; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]]
> > +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]]
> > +; EG-DAG: SUB_INT {{[* ]*}}[[LO]]
> > +; EG-DAG: SUBB_UINT
> > ; EG-DAG: SUB_INT
> > -; EG-DAG: SETGT_UINT
> > -; EG-DAG: SUB_INT
> > -; EG-DAG: ADD_INT
> > +; EG-DAG: SUB_INT {{[* ]*}}[[HI]]
> > +; EG-NOT: SUB
> > define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind {
> > %result = sub i64 %a, %b
> > store i64 %result, i64 addrspace(1)* %out, align 8
> > @@ -57,10 +60,13 @@ define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind
> > ; SI: V_SUB_I32_e32
> > ; SI: V_SUBB_U32_e32
> >
> > +; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]]
> > +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]]
> > +; EG-DAG: SUB_INT {{[* ]*}}[[LO]]
> > +; EG-DAG: SUBB_UINT
> > ; EG-DAG: SUB_INT
> > -; EG-DAG: SETGT_UINT
> > -; EG-DAG: SUB_INT
> > -; EG-DAG: ADD_INT
> > +; EG-DAG: SUB_INT {{[* ]*}}[[HI]]
> > +; EG-NOT: SUB
> > define void @v_sub_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %inA, i64 addrspace(1)* noalias %inB) nounwind {
> > %tid = call i32 @llvm.r600.read.tidig.x() readnone
> > %a_ptr = getelementptr i64 addrspace(1)* %inA, i32 %tid
> > diff --git a/test/CodeGen/R600/uaddo.ll b/test/CodeGen/R600/uaddo.ll
> > index 0b854b5..ce30bbc 100644
> > --- a/test/CodeGen/R600/uaddo.ll
> > +++ b/test/CodeGen/R600/uaddo.ll
> > @@ -1,5 +1,5 @@
> > ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> > -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s
> > +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
> >
> > declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
> > declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone
> > @@ -8,6 +8,9 @@ declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone
> > ; SI: ADD
> > ; SI: ADDC
> > ; SI: ADDC
> > +
> > +; EG: ADDC_UINT
> > +; EG: ADDC_UINT
> > define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
> > %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind
> > %val = extractvalue { i64, i1 } %uadd, 0
> > @@ -20,6 +23,9 @@ define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
> >
> > ; FUNC-LABEL: @s_uaddo_i32
> > ; SI: S_ADD_I32
> > +
> > +; EG: ADDC_UINT
> > +; EG: ADD_INT
> > define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind {
> > %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) nounwind
> > %val = extractvalue { i32, i1 } %uadd, 0
> > @@ -31,6 +37,9 @@ define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
> >
> > ; FUNC-LABEL: @v_uaddo_i32
> > ; SI: V_ADD_I32
> > +
> > +; EG: ADDC_UINT
> > +; EG: ADD_INT
> > define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
> > %a = load i32 addrspace(1)* %aptr, align 4
> > %b = load i32 addrspace(1)* %bptr, align 4
> > @@ -45,6 +54,9 @@ define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
> > ; FUNC-LABEL: @s_uaddo_i64
> > ; SI: S_ADD_U32
> > ; SI: S_ADDC_U32
> > +
> > +; EG: ADDC_UINT
> > +; EG: ADD_INT
> > define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
> > %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind
> > %val = extractvalue { i64, i1 } %uadd, 0
> > @@ -57,6 +69,9 @@ define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
> > ; FUNC-LABEL: @v_uaddo_i64
> > ; SI: V_ADD_I32
> > ; SI: V_ADDC_U32
> > +
> > +; EG: ADDC_UINT
> > +; EG: ADD_INT
> > define void @v_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
> > %a = load i64 addrspace(1)* %aptr, align 4
> > %b = load i64 addrspace(1)* %bptr, align 4
> > diff --git a/test/CodeGen/R600/usubo.ll b/test/CodeGen/R600/usubo.ll
> > index c293ad7..d7718e2 100644
> > --- a/test/CodeGen/R600/usubo.ll
> > +++ b/test/CodeGen/R600/usubo.ll
> > @@ -1,10 +1,13 @@
> > ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> > -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s
> > +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
> >
> > declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone
> > declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) nounwind readnone
> >
> > ; FUNC-LABEL: @usubo_i64_zext
> > +
> > +; EG: SUBB_UINT
> > +; EG: ADDC_UINT
> > define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
> > %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
> > %val = extractvalue { i64, i1 } %usub, 0
> > @@ -17,6 +20,10 @@ define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
> >
> > ; FUNC-LABEL: @s_usubo_i32
> > ; SI: S_SUB_I32
> > +
> > +; EG-DAG: SUBB_UINT
> > +; EG-DAG: SUB_INT
> > +; EG: SUB_INT
> > define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind {
> > %usub = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b) nounwind
> > %val = extractvalue { i32, i1 } %usub, 0
> > @@ -28,6 +35,10 @@ define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
> >
> > ; FUNC-LABEL: @v_usubo_i32
> > ; SI: V_SUBREV_I32_e32
> > +
> > +; EG-DAG: SUBB_UINT
> > +; EG-DAG: SUB_INT
> > +; EG: SUB_INT
> > define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
> > %a = load i32 addrspace(1)* %aptr, align 4
> > %b = load i32 addrspace(1)* %bptr, align 4
> > @@ -42,6 +53,11 @@ define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
> > ; FUNC-LABEL: @s_usubo_i64
> > ; SI: S_SUB_U32
> > ; SI: S_SUBB_U32
> > +
> > +; EG-DAG: SUBB_UINT
> > +; EG-DAG: SUB_INT
> > +; EG-DAG: SUB_INT
> > +; EG: SUB_INT
> > define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
> > %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
> > %val = extractvalue { i64, i1 } %usub, 0
> > @@ -54,6 +70,11 @@ define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
> > ; FUNC-LABEL: @v_usubo_i64
> > ; SI: V_SUB_I32
> > ; SI: V_SUBB_U32
> > +
> > +; EG-DAG: SUBB_UINT
> > +; EG-DAG: SUB_INT
> > +; EG-DAG: SUB_INT
> > +; EG: SUB_INT
> > define void @v_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
> > %a = load i64 addrspace(1)* %aptr, align 4
> > %b = load i64 addrspace(1)* %bptr, align 4
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>
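The patch's comments define CARRY and BORROW as 0/1 results, and the custom UADDO/USUBO lowering pairs the plain ADD/SUB result with the negated flag. The C++ reference model below mirrors those semantics on the host, which may help when checking expected values while debugging. It is only a sketch: the function names are hypothetical, and it models the DAG nodes' arithmetic, not the hardware behaviour of ADDC_UINT (0x52) or SUBB_UINT (0x53).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// CARRY:  out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0
std::uint32_t carry(std::uint32_t src0, std::uint32_t src1) {
  return (static_cast<std::uint64_t>(src0) + src1 > 0xFFFFFFFFull) ? 1u : 0u;
}

// BORROW: out = (src1 > src0) ? 1 : 0
std::uint32_t borrow(std::uint32_t src0, std::uint32_t src1) {
  return (src1 > src0) ? 1u : 0u;
}

// Model of the UADDO lowering: {Res, OVF} with OVF = 0 - CARRY.
std::pair<std::uint32_t, std::uint32_t> uaddo(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t res = a + b;             // ISD::ADD, wraps modulo 2^32
  const std::uint32_t ovf = 0u - carry(a, b);  // "negate sign": 0 or 0xFFFFFFFF
  assert((ovf != 0u) == (res < a));            // agrees with the wraparound test
  return {res, ovf};
}

// Model of the USUBO lowering: {Res, OVF} with OVF = 0 - BORROW.
std::pair<std::uint32_t, std::uint32_t> usubo(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t res = a - b;              // ISD::SUB, wraps modulo 2^32
  const std::uint32_t ovf = 0u - borrow(a, b);  // 0 or 0xFFFFFFFF
  assert((ovf != 0u) == (res > a));             // agrees with the wraparound test
  return {res, ovf};
}
```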
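The EG checks added to add.ll and sub.ll expect 64-bit add and subtract to be built from 32-bit pieces (ADD_INT with ADDC_UINT, SUB_INT with SUBB_UINT). The following C++ sketch shows that split-word arithmetic; the helper names are hypothetical, and the exact node sequence LLVM's type legalizer emits may differ from this straight-line form.

```cpp
#include <cassert>
#include <cstdint>

std::uint64_t add64_via_carry(std::uint64_t a, std::uint64_t b) {
  const std::uint32_t alo = static_cast<std::uint32_t>(a);
  const std::uint32_t ahi = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t blo = static_cast<std::uint32_t>(b);
  const std::uint32_t bhi = static_cast<std::uint32_t>(b >> 32);

  const std::uint32_t lo = alo + blo;  // ADD_INT on the low words
  const std::uint32_t c =              // ADDC_UINT: carry out of the low add
      (static_cast<std::uint64_t>(alo) + blo > 0xFFFFFFFFull) ? 1u : 0u;
  const std::uint32_t hi = ahi + bhi + c;  // ADD_INT(s) on the high words

  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

std::uint64_t sub64_via_borrow(std::uint64_t a, std::uint64_t b) {
  const std::uint32_t alo = static_cast<std::uint32_t>(a);
  const std::uint32_t ahi = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t blo = static_cast<std::uint32_t>(b);
  const std::uint32_t bhi = static_cast<std::uint32_t>(b >> 32);

  const std::uint32_t lo = alo - blo;              // SUB_INT on the low words
  const std::uint32_t bw = (blo > alo) ? 1u : 0u;  // SUBB_UINT: borrow from the low sub
  const std::uint32_t hi = ahi - bhi - bw;         // SUB_INT(s) on the high words

  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Spot-check the split-word versions against native 64-bit arithmetic.
void check_split_arith(std::uint64_t a, std::uint64_t b) {
  assert(add64_via_carry(a, b) == a + b);
  assert(sub64_via_borrow(a, b) == a - b);
}
```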