[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

Jan Vesely jan.vesely at rutgers.edu
Fri Oct 3 09:32:03 PDT 2014


Hi Tom, Matt,

I'm running into strange issues with the cos test (piglit
generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c).

I have been seeing random failures (incorrect results) for some time and
tried to investigate. The weird part is that the failures are not 100%
reproducible: sometimes the tests pass, or partly pass
(it's usually the float8 and float16 subtests that fail).
The failure is always the same:
"Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
although its position may vary, even if the same value was already computed earlier in the results array.
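
For reference, the two bit patterns in that message can be decoded on the
host. This is a minimal sketch in plain C++, assuming IEEE-754 single
precision; the helper names (bitsToFloat, isQuietNaNBits,
checkFailurePatterns) are made up for illustration and are not part of
piglit or LLVM.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
inline float bitsToFloat(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Quiet NaN: exponent bits all ones and the top mantissa bit set.
inline bool isQuietNaNBits(std::uint32_t bits) {
  return (bits & 0x7fc00000u) == 0x7fc00000u;
}

// Check the two patterns quoted in the failure message.
inline void checkFailurePatterns() {
  assert(std::fabs(bitsToFloat(0xbf6d0668u) - (-0.925879f)) < 1e-6f);
  assert(std::isnan(bitsToFloat(0x7fc00000u)));
  assert(isQuietNaNBits(0x7fc00000u));
  assert(!isQuietNaNBits(0xbf6d0668u));
}
```

So the bad value is a quiet NaN rather than a result that is off by a few ulps.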

The first patch of this series does not change the behavior (or the instruction dump).
However, using the ADDC instruction results in a hang on every cos test:
"ring 0 stalled for more than 10000msec"
"GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"

The actual test results still follow the same pattern as before (random failures, mostly in the float8/16 tests).
I can even get a test to pass with a hang on every subtest.

Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
and makes the failures fully reproducible in every subtest, triggered on the first
occurrence of what should have been -0.925879.
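
For reference, both forms are meant to turn a 0/1 carry flag into 0/-1.
A minimal host-side sketch of that equivalence follows, assuming that
SIGN_EXTEND_INREG from i1 sign-extends bit 0 and that the carry flag
follows the "out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0" comment in the
patch. The helper names are hypothetical, and this is plain C++, not the
R600 lowering.

```cpp
#include <cassert>
#include <cstdint>

// Carry-out of a 32-bit unsigned add as 0 or 1, per the patch comment.
inline std::uint32_t carry32(std::uint32_t a, std::uint32_t b) {
  return (static_cast<std::uint64_t>(a) + b) > 0xFFFFFFFFull ? 1u : 0u;
}

// The "SUB 0" form: 0 - flag, computed modulo 2^32.
inline std::uint32_t negateFlag(std::uint32_t flag) { return 0u - flag; }

// Sign extension from bit 0: only the lowest bit of the input matters.
inline std::uint32_t signExtendBit0(std::uint32_t flag) {
  return (flag & 1u) ? 0xFFFFFFFFu : 0u;
}

inline void checkFlagForms() {
  // For a clean 0/1 flag both forms agree.
  assert(negateFlag(carry32(0xFFFFFFFFu, 1u)) == 0xFFFFFFFFu);
  assert(signExtendBit0(carry32(0xFFFFFFFFu, 1u)) == 0xFFFFFFFFu);
  assert(negateFlag(carry32(1u, 2u)) == 0u);
  assert(signExtendBit0(carry32(1u, 2u)) == 0u);
  // Made-up input outside {0, 1}: the two forms diverge.
  assert(negateFlag(2u) == 0xFFFFFFFEu);
  assert(signExtendBit0(2u) == 0u);
}
```

The last two assertions use a made-up input of 2 only to show where the
two forms would differ; nothing here shows that the hardware produces
such a value.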

The GPU is an AMD TURKS (HD 7570, 1002:675d).

I tried digging through the manual, but it only mentions that ADDC is a vec and trans instruction.
Is there any errata document that might give a hint?

thanks,
jan

PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch


On Wed, 2014-09-24 at 20:27 -0400, Jan Vesely wrote:
> v2: tighten the sub64 tests
> v3: rename to CARRY/BORROW
> 
> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> 
> ---
>  lib/Target/R600/AMDGPUISelLowering.h     |   2 +
>  lib/Target/R600/AMDGPUInstrInfo.td       |   6 ++
>  lib/Target/R600/AMDGPUSubtarget.h        |   8 ++
>  lib/Target/R600/EvergreenInstructions.td |   3 +
>  lib/Target/R600/R600ISelLowering.cpp     |  39 +++++++-
>  test/CodeGen/R600/add.ll                 | 154 +++++++++++++++++--------------
>  test/CodeGen/R600/sub.ll                 |  18 ++--
>  test/CodeGen/R600/uaddo.ll               |  17 +++-
>  test/CodeGen/R600/usubo.ll               |  23 ++++-
>  9 files changed, 189 insertions(+), 81 deletions(-)
> 
> diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
> index 911576b..6eaf001 100644
> --- a/lib/Target/R600/AMDGPUISelLowering.h
> +++ b/lib/Target/R600/AMDGPUISelLowering.h
> @@ -205,6 +205,8 @@ enum {
>    RSQ_CLAMPED,
>    LDEXP,
>    DOT4,
> +  CARRY,
> +  BORROW,
>    BFE_U32, // Extract range of bits with zero extension to 32-bits.
>    BFE_I32, // Extract range of bits with sign extension to 32-bits.
>    BFI, // (src0 & src1) | (~src0 & src2)
> diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
> index 3d70791..1600c4a 100644
> --- a/lib/Target/R600/AMDGPUInstrInfo.td
> +++ b/lib/Target/R600/AMDGPUInstrInfo.td
> @@ -91,6 +91,12 @@ def AMDGPUumin : SDNode<"AMDGPUISD::UMIN", SDTIntBinOp,
>    [SDNPCommutative, SDNPAssociative]
>  >;
>  
> +// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0
> +def AMDGPUcarry : SDNode<"AMDGPUISD::CARRY", SDTIntBinOp, []>;
> +
> +// out = (src1 > src0) ? 1 : 0
> +def AMDGPUborrow : SDNode<"AMDGPUISD::BORROW", SDTIntBinOp, []>;
> +
>  
>  def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0",
>    SDTIntToFPOp, []>;
> diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
> index 6797972..9f2ba61 100644
> --- a/lib/Target/R600/AMDGPUSubtarget.h
> +++ b/lib/Target/R600/AMDGPUSubtarget.h
> @@ -168,6 +168,14 @@ public:
>      return (getGeneration() >= EVERGREEN);
>    }
>  
> +  bool hasCARRY() const {
> +    return (getGeneration() >= EVERGREEN);
> +  }
> +
> +  bool hasBORROW() const {
> +    return (getGeneration() >= EVERGREEN);
> +  }
> +
>    bool IsIRStructurizerEnabled() const {
>      return EnableIRStructurizer;
>    }
> diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
> index 8117b60..d3822ef 100644
> --- a/lib/Target/R600/EvergreenInstructions.td
> +++ b/lib/Target/R600/EvergreenInstructions.td
> @@ -336,6 +336,9 @@ defm CUBE_eg : CUBE_Common<0xC0>;
>  
>  def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
>  
> +def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
> +def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;
> +
>  def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", ctlz_zero_undef, VecALU>;
>  def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;
>  
> diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
> index 9b2b689..a28b76a 100644
> --- a/lib/Target/R600/R600ISelLowering.cpp
> +++ b/lib/Target/R600/R600ISelLowering.cpp
> @@ -89,6 +89,15 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
>    setOperationAction(ISD::SELECT, MVT::v2i32, Expand);
>    setOperationAction(ISD::SELECT, MVT::v4i32, Expand);
>  
> +  // ADD, SUB overflow. These need to be Custom because
> +  // SelectionDAGLegalize::LegalizeOp (LegalizeDAG.cpp)
> +  // turns Legal into expand
> +  if (Subtarget->hasCARRY())
> +    setOperationAction(ISD::UADDO, MVT::i32, Custom);
> +
> +  if (Subtarget->hasBORROW())
> +    setOperationAction(ISD::USUBO, MVT::i32, Custom);
> +
>    // Expand sign extension of vectors
>    if (!Subtarget->hasBFE())
>      setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
> @@ -154,8 +163,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
>    setTargetDAGCombine(ISD::SELECT_CC);
>    setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);
>  
> -  setOperationAction(ISD::SUB, MVT::i64, Expand);
> -
>    // These should be replaced by UDVIREM, but it does not happen automatically
>    // during Type Legalization
>    setOperationAction(ISD::UDIV, MVT::i64, Custom);
> @@ -578,6 +585,34 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
>    case ISD::SHL_PARTS: return LowerSHLParts(Op, DAG);
>    case ISD::SRA_PARTS:
>    case ISD::SRL_PARTS: return LowerSRXParts(Op, DAG);
> +  case ISD::UADDO: {
> +    SDLoc DL(Op);
> +    EVT VT = Op.getValueType();
> +
> +    SDValue Lo = Op.getOperand(0);
> +    SDValue Hi = Op.getOperand(1);
> +
> +    SDValue OVF = DAG.getNode(AMDGPUISD::CARRY, DL, VT, Lo, Hi);
> +    //negate sign
> +    OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
> +    SDValue Res = DAG.getNode(ISD::ADD, DL, VT, Lo, Hi);
> +
> +    return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
> +  }
> +  case ISD::USUBO: {
> +    SDLoc DL(Op);
> +    EVT VT = Op.getValueType();
> +
> +    SDValue Arg0 = Op.getOperand(0);
> +    SDValue Arg1 = Op.getOperand(1);
> +
> +    SDValue OVF = DAG.getNode(AMDGPUISD::BORROW, DL, VT, Arg0, Arg1);
> +    //negate sign
> +    OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
> +    SDValue Res = DAG.getNode(ISD::SUB, DL, VT, Arg0, Arg1);
> +
> +    return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
> +  }
>    case ISD::FCOS:
>    case ISD::FSIN: return LowerTrig(Op, DAG);
>    case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
> diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
> index 8cf43d1..fddb951 100644
> --- a/test/CodeGen/R600/add.ll
> +++ b/test/CodeGen/R600/add.ll
> @@ -1,12 +1,12 @@
> -; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK --check-prefix=FUNC %s
> -; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK --check-prefix=FUNC %s
> +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG --check-prefix=FUNC %s
> +; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI --check-prefix=FUNC %s
>  
>  ;FUNC-LABEL: @test1:
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  
> -;SI-CHECK: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}}
> -;SI-CHECK-NOT: [[REG]]
> -;SI-CHECK: BUFFER_STORE_DWORD [[REG]],
> +;SI: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}}
> +;SI-NOT: [[REG]]
> +;SI: BUFFER_STORE_DWORD [[REG]],
>  define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
>    %b_ptr = getelementptr i32 addrspace(1)* %in, i32 1
>    %a = load i32 addrspace(1)* %in
> @@ -17,11 +17,11 @@ define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
>  }
>  
>  ;FUNC-LABEL: @test2:
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  
>  define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
> @@ -33,15 +33,15 @@ define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>  }
>  
>  ;FUNC-LABEL: @test4:
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  
>  define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
> @@ -53,22 +53,22 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>  }
>  
>  ; FUNC-LABEL: @test8
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
>  define void @test8(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) {
>  entry:
>    %0 = add <8 x i32> %a, %b
> @@ -77,38 +77,38 @@ entry:
>  }
>  
>  ; FUNC-LABEL: @test16
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; EG-CHECK: ADD_INT
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> -; SI-CHECK: S_ADD_I32
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; EG: ADD_INT
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
> +; SI: S_ADD_I32
>  define void @test16(<16 x i32> addrspace(1)* %out, <16 x i32> %a, <16 x i32> %b) {
>  entry:
>    %0 = add <16 x i32> %a, %b
> @@ -117,8 +117,12 @@ entry:
>  }
>  
>  ; FUNC-LABEL: @add64
> -; SI-CHECK: S_ADD_U32
> -; SI-CHECK: S_ADDC_U32
> +; SI: S_ADD_U32
> +; SI: S_ADDC_U32
> +
> +; EG-DAG: ADD_INT
> +; EG-DAG: ADDC_UINT
> +; EG-DAG: ADD_INT
>  define void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
>  entry:
>    %0 = add i64 %a, %b
> @@ -132,7 +136,11 @@ entry:
>  ; to a VGPR before doing the add.
>  
>  ; FUNC-LABEL: @add64_sgpr_vgpr
> -; SI-CHECK-NOT: V_ADDC_U32_e32 s
> +; SI-NOT: V_ADDC_U32_e32 s
> +
> +; EG-DAG: ADD_INT
> +; EG-DAG: ADDC_UINT
> +; EG-DAG: ADD_INT
>  define void @add64_sgpr_vgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) {
>  entry:
>    %0 = load i64 addrspace(1)* %in
> @@ -143,8 +151,12 @@ entry:
>  
>  ; Test i64 add inside a branch.
>  ; FUNC-LABEL: @add64_in_branch
> -; SI-CHECK: S_ADD_U32
> -; SI-CHECK: S_ADDC_U32
> +; SI: S_ADD_U32
> +; SI: S_ADDC_U32
> +
> +; EG-DAG: ADD_INT
> +; EG-DAG: ADDC_UINT
> +; EG-DAG: ADD_INT
>  define void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
>  entry:
>    %0 = icmp eq i64 %a, 0
> diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll
> index 8678e2b..1225ebd 100644
> --- a/test/CodeGen/R600/sub.ll
> +++ b/test/CodeGen/R600/sub.ll
> @@ -43,10 +43,13 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>  ; SI: S_SUB_U32
>  ; SI: S_SUBB_U32
>  
> +; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]]
> +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]]
> +; EG-DAG: SUB_INT {{[* ]*}}[[LO]]
> +; EG-DAG: SUBB_UINT
>  ; EG-DAG: SUB_INT
> -; EG-DAG: SETGT_UINT
> -; EG-DAG: SUB_INT
> -; EG-DAG: ADD_INT
> +; EG-DAG: SUB_INT {{[* ]*}}[[HI]]
> +; EG-NOT: SUB
>  define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind {
>    %result = sub i64 %a, %b
>    store i64 %result, i64 addrspace(1)* %out, align 8
> @@ -57,10 +60,13 @@ define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind
>  ; SI: V_SUB_I32_e32
>  ; SI: V_SUBB_U32_e32
>  
> +; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]]
> +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]]
> +; EG-DAG: SUB_INT {{[* ]*}}[[LO]]
> +; EG-DAG: SUBB_UINT
>  ; EG-DAG: SUB_INT
> -; EG-DAG: SETGT_UINT
> -; EG-DAG: SUB_INT
> -; EG-DAG: ADD_INT
> +; EG-DAG: SUB_INT {{[* ]*}}[[HI]]
> +; EG-NOT: SUB
>  define void @v_sub_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %inA, i64 addrspace(1)* noalias %inB) nounwind {
>    %tid = call i32 @llvm.r600.read.tidig.x() readnone
>    %a_ptr = getelementptr i64 addrspace(1)* %inA, i32 %tid
> diff --git a/test/CodeGen/R600/uaddo.ll b/test/CodeGen/R600/uaddo.ll
> index 0b854b5..ce30bbc 100644
> --- a/test/CodeGen/R600/uaddo.ll
> +++ b/test/CodeGen/R600/uaddo.ll
> @@ -1,5 +1,5 @@
>  ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s
> +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
>  
>  declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
>  declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone
> @@ -8,6 +8,9 @@ declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone
>  ; SI: ADD
>  ; SI: ADDC
>  ; SI: ADDC
> +
> +; EG: ADDC_UINT
> +; EG: ADDC_UINT
>  define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>    %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %uadd, 0
> @@ -20,6 +23,9 @@ define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>  
>  ; FUNC-LABEL: @s_uaddo_i32
>  ; SI: S_ADD_I32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind {
>    %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) nounwind
>    %val = extractvalue { i32, i1 } %uadd, 0
> @@ -31,6 +37,9 @@ define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  
>  ; FUNC-LABEL: @v_uaddo_i32
>  ; SI: V_ADD_I32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
>    %a = load i32 addrspace(1)* %aptr, align 4
>    %b = load i32 addrspace(1)* %bptr, align 4
> @@ -45,6 +54,9 @@ define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  ; FUNC-LABEL: @s_uaddo_i64
>  ; SI: S_ADD_U32
>  ; SI: S_ADDC_U32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
>    %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %uadd, 0
> @@ -57,6 +69,9 @@ define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
>  ; FUNC-LABEL: @v_uaddo_i64
>  ; SI: V_ADD_I32
>  ; SI: V_ADDC_U32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @v_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
>    %a = load i64 addrspace(1)* %aptr, align 4
>    %b = load i64 addrspace(1)* %bptr, align 4
> diff --git a/test/CodeGen/R600/usubo.ll b/test/CodeGen/R600/usubo.ll
> index c293ad7..d7718e2 100644
> --- a/test/CodeGen/R600/usubo.ll
> +++ b/test/CodeGen/R600/usubo.ll
> @@ -1,10 +1,13 @@
>  ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s
> +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
>  
>  declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone
>  declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) nounwind readnone
>  
>  ; FUNC-LABEL: @usubo_i64_zext
> +
> +; EG: SUBB_UINT
> +; EG: ADDC_UINT
>  define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>    %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %usub, 0
> @@ -17,6 +20,10 @@ define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>  
>  ; FUNC-LABEL: @s_usubo_i32
>  ; SI: S_SUB_I32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind {
>    %usub = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b) nounwind
>    %val = extractvalue { i32, i1 } %usub, 0
> @@ -28,6 +35,10 @@ define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  
>  ; FUNC-LABEL: @v_usubo_i32
>  ; SI: V_SUBREV_I32_e32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
>    %a = load i32 addrspace(1)* %aptr, align 4
>    %b = load i32 addrspace(1)* %bptr, align 4
> @@ -42,6 +53,11 @@ define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  ; FUNC-LABEL: @s_usubo_i64
>  ; SI: S_SUB_U32
>  ; SI: S_SUBB_U32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
>    %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %usub, 0
> @@ -54,6 +70,11 @@ define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
>  ; FUNC-LABEL: @v_usubo_i64
>  ; SI: V_SUB_I32
>  ; SI: V_SUBB_U32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @v_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
>    %a = load i64 addrspace(1)* %aptr, align 4
>    %b = load i64 addrspace(1)* %bptr, align 4
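
For reference, the arithmetic that the add64 and s_sub_i64 checks in the
patch exercise can be sketched on the host as below. This is plain C++
with hypothetical helper names, assuming the carry and borrow flags follow
the comments in AMDGPUInstrInfo.td; it does not model how the instructions
are selected or scheduled.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit add built from 32-bit halves plus an explicit carry flag.
inline std::uint64_t add64ViaCarry(std::uint64_t a, std::uint64_t b) {
  const std::uint32_t aLo = static_cast<std::uint32_t>(a);
  const std::uint32_t aHi = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t bLo = static_cast<std::uint32_t>(b);
  const std::uint32_t bHi = static_cast<std::uint32_t>(b >> 32);
  const std::uint32_t lo = aLo + bLo;              // low half, wraps modulo 2^32
  const std::uint32_t carry = lo < aLo ? 1u : 0u;  // 1 exactly when the low add wrapped
  const std::uint32_t hi = aHi + bHi + carry;      // high half absorbs the carry
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// 64-bit subtract built from 32-bit halves plus an explicit borrow flag.
inline std::uint64_t sub64ViaBorrow(std::uint64_t a, std::uint64_t b) {
  const std::uint32_t aLo = static_cast<std::uint32_t>(a);
  const std::uint32_t aHi = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t bLo = static_cast<std::uint32_t>(b);
  const std::uint32_t bHi = static_cast<std::uint32_t>(b >> 32);
  const std::uint32_t lo = aLo - bLo;               // low half, wraps modulo 2^32
  const std::uint32_t borrow = bLo > aLo ? 1u : 0u; // "out = (src1 > src0) ? 1 : 0"
  const std::uint32_t hi = aHi - bHi - borrow;      // high half pays the borrow
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Compare both helpers against native 64-bit arithmetic on edge cases.
inline void checkSplitArithmetic() {
  assert(add64ViaCarry(0xFFFFFFFFull, 1ull) == 0x100000000ull);
  assert(sub64ViaBorrow(0x100000000ull, 1ull) == 0xFFFFFFFFull);
  assert(sub64ViaBorrow(0ull, 1ull) == 0xFFFFFFFFFFFFFFFFull);
}
```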

-- 
Jan Vesely <jan.vesely at rutgers.edu>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dumps.tgz
Type: application/x-compressed-tar
Size: 1166570 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141003/d4c2ded0/attachment.bin>