[llvm] r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

Mon Jul 20 12:23:00 PDT 2015

[+ Asghar, whose patch this was]

On 20 Jul 2015, at 19:32, Mikhail Zolotukhin <mzolotukhin at apple.com> wrote:


On Jul 16, 2015, at 8:22 AM, James Molloy <James.Molloy at arm.com> wrote:

Author: jamesm
Date: Thu Jul 16 10:22:46 2015
New Revision: 242409

URL: http://llvm.org/viewvc/llvm-project?rev=242409&view=rev
Log:
[Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

This adds new intrinsics "*absdiff" for absolute difference ops to facilitate efficient code generation for "sum of absolute differences" operation.
The patch also introduces the corresponding SDNodes and basic legalization support. Sanity of the generated code is tested on X86.

This is the first of three patches.

Patch by Shahid Asghar-ahmad!

Added:
  llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
Modified:
  llvm/trunk/docs/LangRef.rst
  llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
  llvm/trunk/include/llvm/IR/Intrinsics.td
  llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
  llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
  llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
  llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp

Modified: llvm/trunk/docs/LangRef.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LangRef.rst?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================

--- llvm/trunk/docs/LangRef.rst (original)
+++ llvm/trunk/docs/LangRef.rst Thu Jul 16 10:22:46 2015
@@ -10328,6 +10328,65 @@ Examples:

     %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c

+
+'``llvm.uabsdiff.*``' and '``llvm.sabsdiff.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. The loaded data is a vector of any integer bit width.
+
+.. code-block:: llvm
+
+      declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4 x integer> %b)
+
+
+Overview:
+"""""""""
+
+The ``llvm.uabsdiff`` intrinsic returns a vector result of the absolute difference of the two operands,
+treating them both as unsigned integers.
+
+The ``llvm.sabsdiff`` intrinsic returns  a vector result of the absolute difference of the two operands,
+treating them both as signed integers.
+
+.. note::
+
+    These intrinsics are primarily used during the code generation stage of compilation.
+    They are generated by compiler passes such as the Loop and SLP vectorizers. It is not
+    recommended for users to create them manually.
+
+Arguments:
+""""""""""
+
+Both intrinsics take two integer of the same bitwidth.
+
+Semantics:
+""""""""""
+
+The expression::
+
+    call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+is equivalent to::
+
+    %sub = sub <4 x i32> %a, %b
+    %ispos = icmp ugt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
Isn't this always 'false'? In an unsigned comparison, -1 is the all-ones value, so nothing can be greater than it.

+    %neg = sub <4 x i32> zeroinitializer, %sub
+    %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
+
+Similarly the expression::
+
+    call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+is equivalent to::
+
+    %sub = sub nsw <4 x i32> %a, %b
+    %ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
Wouldn't it be more readable if we used "icmp sge <4 x i32> %sub, zeroinitializer"?
+    %neg = sub nsw <4 x i32> zeroinitializer, %sub
+    %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
+
+
Half Precision Floating Point Intrinsics
----------------------------------------
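
To make the two questions on the LangRef text concrete, here is a minimal scalar sketch in C++ that models one 32-bit lane of the documented expansions. The function names are hypothetical and exist only for this illustration; nothing here is part of the patch. It assumes the signed difference does not overflow, which is what the `nsw` flag in the documented sequence asserts.

```cpp
#include <cassert>
#include <cstdint>

// One lane of the documented llvm.uabsdiff expansion:
//   %sub   = sub %a, %b
//   %ispos = icmp ugt %sub, -1
//   %neg   = sub 0, %sub
//   select %ispos, %sub, %neg
uint32_t uabsdiff_as_documented(uint32_t a, uint32_t b) {
  uint32_t sub = a - b;           // wraps modulo 2^32
  bool ispos = sub > UINT32_MAX;  // unsigned "greater than all-ones": never true
  uint32_t neg = 0u - sub;
  return ispos ? sub : neg;       // therefore always returns neg, i.e. b - a
}

// Reference unsigned absolute difference: compare the operands first.
uint32_t uabsdiff_reference(uint32_t a, uint32_t b) {
  return a > b ? a - b : b - a;
}

// One lane of the documented llvm.sabsdiff expansion (icmp sgt %sub, -1).
int32_t sabsdiff_sgt_minus_one(int32_t a, int32_t b) {
  int32_t sub = a - b;            // assumed not to overflow (nsw)
  return sub > -1 ? sub : -sub;
}

// The suggested spelling (icmp sge %sub, 0).
int32_t sabsdiff_sge_zero(int32_t a, int32_t b) {
  int32_t sub = a - b;            // assumed not to overflow (nsw)
  return sub >= 0 ? sub : -sub;
}

// Checks that the two signed spellings agree on a small range of inputs.
void check_signed_spellings_agree() {
  for (int32_t a = -8; a <= 8; ++a)
    for (int32_t b = -8; b <= 8; ++b)
      assert(sabsdiff_sgt_minus_one(a, b) == sabsdiff_sge_zero(a, b));
}
```

Under this model the two signed forms agree, so the `sge`/`zeroinitializer` spelling is purely a readability change, while the unsigned form as documented returns `b - a` even when `a > b`.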


Modified: llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h (original)
+++ llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h Thu Jul 16 10:22:46 2015
@@ -334,6 +334,10 @@ namespace ISD {
   /// Byte Swap and Counting operators.
   BSWAP, CTTZ, CTLZ, CTPOP,

+    /// [SU]ABSDIFF - Signed/Unsigned absolute difference of two input integer
+    /// vector. These nodes are generated from llvm.*absdiff* intrinsics.
+    SABSDIFF, UABSDIFF,
+
   /// Bit counting operators with an undefined result for zero inputs.
   CTTZ_ZERO_UNDEF, CTLZ_ZERO_UNDEF,


Modified: llvm/trunk/include/llvm/IR/Intrinsics.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Intrinsics.td?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/include/llvm/IR/Intrinsics.td (original)
+++ llvm/trunk/include/llvm/IR/Intrinsics.td Thu Jul 16 10:22:46 2015
@@ -605,6 +605,12 @@ def int_convertuu  : Intrinsic<[llvm_any
def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
                               [], "llvm.clear_cache">;

+// Calculate the Absolute Differences of the two input vectors.
+def int_sabsdiff : Intrinsic<[llvm_anyvector_ty],
+                        [ LLVMMatchType<0>, LLVMMatchType<0> ], [IntrNoMem]>;
+def int_uabsdiff : Intrinsic<[llvm_anyvector_ty],
+                        [ LLVMMatchType<0>, LLVMMatchType<0> ], [IntrNoMem]>;
+
//===-------------------------- Masked Intrinsics -------------------------===//
//
def int_masked_store : Intrinsic<[], [llvm_anyvector_ty, LLVMPointerTo<0>,

Modified: llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetSelectionDAG.td?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/include/llvm/Target/TargetSelectionDAG.td (original)
+++ llvm/trunk/include/llvm/Target/TargetSelectionDAG.td Thu Jul 16 10:22:46 2015
@@ -386,6 +386,8 @@ def smax       : SDNode<"ISD::SMAX"
def umin       : SDNode<"ISD::UMIN"      , SDTIntBinOp>;
def umax       : SDNode<"ISD::UMAX"      , SDTIntBinOp>;

+def sabsdiff   : SDNode<"ISD::SABSDIFF"   , SDTIntBinOp>;
+def uabsdiff   : SDNode<"ISD::UABSDIFF"   , SDTIntBinOp>;
def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
def bswap      : SDNode<"ISD::BSWAP"      , SDTIntUnaryOp>;
def ctlz       : SDNode<"ISD::CTLZ"       , SDTIntUnaryOp>;

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Thu Jul 16 10:22:46 2015
@@ -146,6 +146,10 @@ void DAGTypeLegalizer::PromoteIntegerRes
 case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:
   Res = PromoteIntRes_AtomicCmpSwap(cast<AtomicSDNode>(N), ResNo);
   break;
+  case ISD::UABSDIFF:
+  case ISD::SABSDIFF:
+    Res = PromoteIntRes_SimpleIntBinOp(N);
+    break;
 }

 // If the result is null then the sub-method took care of registering it.

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp Thu Jul 16 10:22:46 2015
@@ -105,6 +105,7 @@ class VectorLegalizer {
 SDValue ExpandLoad(SDValue Op);
 SDValue ExpandStore(SDValue Op);
 SDValue ExpandFNEG(SDValue Op);
+  SDValue ExpandABSDIFF(SDValue Op);

 /// \brief Implements vector promotion.
 ///
@@ -326,6 +327,8 @@ SDValue VectorLegalizer::LegalizeOp(SDVa
 case ISD::SMAX:
 case ISD::UMIN:
 case ISD::UMAX:
+  case ISD::UABSDIFF:
+  case ISD::SABSDIFF:
   QueryType = Node->getValueType(0);
   break;
 case ISD::FP_ROUND_INREG:
@@ -708,11 +711,36 @@ SDValue VectorLegalizer::Expand(SDValue
   return ExpandFNEG(Op);
 case ISD::SETCC:
   return UnrollVSETCC(Op);
+  case ISD::UABSDIFF:
+  case ISD::SABSDIFF:
+    return ExpandABSDIFF(Op);
 default:
   return DAG.UnrollVectorOp(Op.getNode());
 }
}

+SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {
+  SDLoc dl(Op);
+  SDValue Tmp1, Tmp2, Tmp3, Tmp4;
+  EVT VT = Op.getValueType();
+  SDNodeFlags Flags;
+  Flags.setNoSignedWrap(Op->getOpcode() == ISD::SABSDIFF);
+
+  Tmp2 = Op.getOperand(0);
+  Tmp3 = Op.getOperand(1);
+  Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp3, &Flags);
+  Tmp2 =
+      DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Tmp1, &Flags);
+  Tmp4 = DAG.getNode(
+      ISD::SETCC, dl,
+      TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), Tmp2,
+      DAG.getConstant(0, dl, VT),
+      DAG.getCondCode(Op->getOpcode() == ISD::SABSDIFF ? ISD::SETLT
+                                                       : ISD::SETULT));
+  Tmp1 = DAG.getNode(ISD::VSELECT, dl, VT, Tmp4, Tmp1, Tmp2);
+  return Tmp1;
+}
+
SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {
 // Lower a select instruction where the condition is a scalar and the
 // operands are vectors. Lower this select to VSELECT and implement it
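
The same question seems to apply to `ExpandABSDIFF` itself. The following is a scalar C++ sketch of how I read one lane of that expansion; the function names are hypothetical, the model ignores vector types and `SDNodeFlags`, and it assumes the signed subtraction does not overflow. It is not the patch's code, only a way to reason about it.

```cpp
#include <cstdint>

// One lane of the expansion for ISD::UABSDIFF, as I read it:
//   Tmp1 = SUB a, b
//   Tmp2 = SUB 0, Tmp1
//   Tmp4 = SETCC Tmp2, 0, SETULT
//   VSELECT Tmp4, Tmp1, Tmp2
uint32_t expand_uabsdiff_lane(uint32_t a, uint32_t b) {
  uint32_t tmp1 = a - b;      // wraps modulo 2^32
  uint32_t tmp2 = 0u - tmp1;
  bool cond = tmp2 < 0u;      // unsigned "less than zero": never true
  return cond ? tmp1 : tmp2;  // always selects tmp2, i.e. b - a
}

// One lane of the expansion for ISD::SABSDIFF (condition code SETLT).
int32_t expand_sabsdiff_lane(int32_t a, int32_t b) {
  int32_t tmp1 = a - b;       // assumed not to overflow (nsw)
  int32_t tmp2 = 0 - tmp1;
  bool cond = tmp2 < 0;       // true exactly when a - b is positive
  return cond ? tmp1 : tmp2;
}
```

If this reading is right, the signed path yields the absolute difference, but the unsigned path yields `b - a` for every input, since an unsigned value is never less than zero.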

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Thu Jul 16 10:22:46 2015
@@ -678,6 +678,8 @@ void DAGTypeLegalizer::SplitVectorResult
 case ISD::SMAX:
 case ISD::UMIN:
 case ISD::UMAX:
+  case ISD::UABSDIFF:
+  case ISD::SABSDIFF:
   SplitVecRes_BinOp(N, Lo, Hi);
   break;
 case ISD::FMA:

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Thu Jul 16 10:22:46 2015
@@ -4646,6 +4646,18 @@ SelectionDAGBuilder::visitIntrinsicCall(
                            getValue(I.getArgOperand(0)).getValueType(),
                            getValue(I.getArgOperand(0))));
   return nullptr;
+  case Intrinsic::uabsdiff:
+    setValue(&I, DAG.getNode(ISD::UABSDIFF, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1))));
+    return nullptr;
+  case Intrinsic::sabsdiff:
+    setValue(&I, DAG.getNode(ISD::SABSDIFF, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1))));
+    return nullptr;
 case Intrinsic::cttz: {
   SDValue Arg = getValue(I.getArgOperand(0));
   ConstantInt *CI = cast<ConstantInt>(I.getArgOperand(1));

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp Thu Jul 16 10:22:46 2015
@@ -225,6 +225,8 @@ std::string SDNode::getOperationName(con
 case ISD::SHL_PARTS:                  return "shl_parts";
 case ISD::SRA_PARTS:                  return "sra_parts";
 case ISD::SRL_PARTS:                  return "srl_parts";
+  case ISD::UABSDIFF:                   return "uabsdiff";
+  case ISD::SABSDIFF:                   return "sabsdiff";

 // Conversion operators.
 case ISD::SIGN_EXTEND:                return "sign_extend";

Modified: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=242409&r1=242408&r2=242409&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp (original)
+++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp Thu Jul 16 10:22:46 2015
@@ -827,6 +827,8 @@ void TargetLoweringBase::initActions() {
   setOperationAction(ISD::USUBO, VT, Expand);
   setOperationAction(ISD::SMULO, VT, Expand);
   setOperationAction(ISD::UMULO, VT, Expand);
+    setOperationAction(ISD::UABSDIFF, VT, Expand);
+    setOperationAction(ISD::SABSDIFF, VT, Expand);

   // These library functions default to expand.
   setOperationAction(ISD::FROUND, VT, Expand);

Added: llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/absdiff_expand.ll?rev=242409&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/X86/absdiff_expand.ll (added)
+++ llvm/trunk/test/CodeGen/X86/absdiff_expand.ll Thu Jul 16 10:22:46 2015
@@ -0,0 +1,242 @@
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu  < %s | FileCheck %s -check-prefix=CHECK
+
+declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)
+
+define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
+; CHECK-LABEL: test_uabsdiff_v4i8_expand
+; CHECK:             psubd  %xmm1, %xmm0
+; CHECK-NEXT:        pxor   %xmm1, %xmm1
+; CHECK-NEXT:        psubd  %xmm0, %xmm1
+; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:        movdqa  %xmm1, %xmm3
+; CHECK-NEXT:        pxor   %xmm2, %xmm3
+; CHECK-NEXT:        pcmpgtd        %xmm3, %xmm2
+; CHECK-NEXT:        pand    %xmm2, %xmm0
+; CHECK-NEXT:        pandn   %xmm1, %xmm2
+; CHECK-NEXT:        por     %xmm2, %xmm0
+; CHECK-NEXT:        retq
+
+  %1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
+  ret <4 x i8> %1
+}
+
+declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)
+
+define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
+; CHECK-LABEL: test_sabsdiff_v4i8_expand
+; CHECK:      psubd  %xmm1, %xmm0
+; CHECK-NEXT: pxor   %xmm1, %xmm1
+; CHECK-NEXT: pxor    %xmm2, %xmm2
+; CHECK-NEXT: psubd  %xmm0, %xmm2
+; CHECK-NEXT: pcmpgtd  %xmm2, %xmm1
+; CHECK-NEXT: pand    %xmm1, %xmm0
+; CHECK-NEXT: pandn   %xmm2, %xmm1
+; CHECK-NEXT: por     %xmm1, %xmm0
+; CHECK-NEXT: retq
+
+  %1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
+  ret <4 x i8> %1
+}
+
+
+declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)
+
+define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8> %a2) {
+; CHECK-LABEL: test_sabsdiff_v8i8_expand
+; CHECK:      psubw  %xmm1, %xmm0
+; CHECK-NEXT: pxor   %xmm1, %xmm1
+; CHECK-NEXT: pxor   %xmm2, %xmm2
+; CHECK-NEXT: psubw  %xmm0, %xmm2
+; CHECK-NEXT: pcmpgtw        %xmm2, %xmm1
+; CHECK-NEXT: pand  %xmm1, %xmm0
+; CHECK-NEXT: pandn %xmm2, %xmm1
+; CHECK-NEXT: por  %xmm1, %xmm0
+; CHECK-NEXT: retq
+  %1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
+  ret <8 x i8> %1
+}
+
+declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)
+
+define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8> %a2) {
+; CHECK-LABEL: test_uabsdiff_v16i8_expand
+; CHECK:             psubb  %xmm1, %xmm0
+; CHECK-NEXT:        pxor   %xmm1, %xmm1
+; CHECK-NEXT:        psubb  %xmm0, %xmm1
+; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:        movdqa  %xmm1, %xmm3
+; CHECK-NEXT:        pxor   %xmm2, %xmm3
+; CHECK-NEXT:        pcmpgtb        %xmm3, %xmm2
+; CHECK-NEXT:        pand    %xmm2, %xmm0
+; CHECK-NEXT:        pandn   %xmm1, %xmm2
+; CHECK-NEXT:        por     %xmm2, %xmm0
+; CHECK-NEXT:        retq
+  %1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8> %a2)
+  ret <16 x i8> %1
+}
+
+declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)
+
+define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
+; CHECK-LABEL: test_uabsdiff_v8i16_expand
+; CHECK:             psubw  %xmm1, %xmm0
+; CHECK-NEXT:        pxor   %xmm1, %xmm1
+; CHECK-NEXT:        psubw  %xmm0, %xmm1
+; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:        movdqa  %xmm1, %xmm3
+; CHECK-NEXT:        pxor   %xmm2, %xmm3
+; CHECK-NEXT:        pcmpgtw        %xmm3, %xmm2
+; CHECK-NEXT:        pand    %xmm2, %xmm0
+; CHECK-NEXT:        pandn   %xmm1, %xmm2
+; CHECK-NEXT:        por     %xmm2, %xmm0
+; CHECK-NEXT:        retq
+  %1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
+  ret <8 x i16> %1
+}
+
+declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)
+
+define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
+; CHECK-LABEL: test_sabsdiff_v8i16_expand
+; CHECK:      psubw  %xmm1, %xmm0
+; CHECK-NEXT: pxor   %xmm1, %xmm1
+; CHECK-NEXT: pxor   %xmm2, %xmm2
+; CHECK-NEXT: psubw  %xmm0, %xmm2
+; CHECK-NEXT: pcmpgtw        %xmm2, %xmm1
+; CHECK-NEXT: pand  %xmm1, %xmm0
+; CHECK-NEXT: pandn %xmm2, %xmm1
+; CHECK-NEXT: por  %xmm1, %xmm0
+; CHECK-NEXT: retq
+  %1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
+  ret <8 x i16> %1
+}
+
+declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)
+
+define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
+; CHECK-LABEL: test_sabsdiff_v4i32_expand
+; CHECK:             psubd  %xmm1, %xmm0
+; CHECK-NEXT:        pxor  %xmm1, %xmm1
+; CHECK-NEXT:        pxor  %xmm2, %xmm2
+; CHECK-NEXT:        psubd  %xmm0, %xmm2
+; CHECK-NEXT:        pcmpgtd        %xmm2, %xmm1
+; CHECK-NEXT:        pand    %xmm1, %xmm0
+; CHECK-NEXT:        pandn   %xmm2, %xmm1
+; CHECK-NEXT:        por    %xmm1, %xmm0
+; CHECK-NEXT:        retq
+  %1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
+  ret <4 x i32> %1
+}
+
+declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)
+
+define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
+; CHECK-LABEL: test_uabsdiff_v4i32_expand
+; CHECK:             psubd  %xmm1, %xmm0
+; CHECK-NEXT:        pxor   %xmm1, %xmm1
+; CHECK-NEXT:        psubd  %xmm0, %xmm1
+; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:        movdqa  %xmm1, %xmm3
+; CHECK-NEXT:        pxor   %xmm2, %xmm3
+; CHECK-NEXT:        pcmpgtd        %xmm3, %xmm2
+; CHECK-NEXT:        pand    %xmm2, %xmm0
+; CHECK-NEXT:        pandn   %xmm1, %xmm2
+; CHECK-NEXT:        por     %xmm2, %xmm0
+; CHECK-NEXT:        retq
+  %1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
+  ret <4 x i32> %1
+}
+
+declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)
+
+define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32> %a2) {
+; CHECK-LABEL: test_sabsdiff_v2i32_expand
+; CHECK:        psubq   %xmm1, %xmm0
+; CHECK-NEXT:   pxor    %xmm1, %xmm1
+; CHECK-NEXT:   psubq   %xmm0, %xmm1
+; CHECK-NEXT:   movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:   movdqa  %xmm1, %xmm3
+; CHECK-NEXT:   pxor    %xmm2, %xmm3
+; CHECK-NEXT:   movdqa  %xmm2, %xmm4
+; CHECK-NEXT:   pcmpgtd %xmm3, %xmm4
+; CHECK-NEXT:   pshufd  $160, %xmm4, %xmm5      # xmm5 = xmm4[0,0,2,2]
+; CHECK-NEXT:   pcmpeqd %xmm2, %xmm3
+; CHECK-NEXT:   pshufd  $245, %xmm3, %xmm2      # xmm2 = xmm3[1,1,3,3]
+; CHECK-NEXT:   pand    %xmm5, %xmm2
+; CHECK-NEXT:   pshufd  $245, %xmm4, %xmm3      # xmm3 = xmm4[1,1,3,3]
+; CHECK-NEXT:   por     %xmm2, %xmm3
+; CHECK-NEXT:   pand    %xmm3, %xmm0
+; CHECK-NEXT:   pandn   %xmm1, %xmm3
+; CHECK-NEXT:   por     %xmm3, %xmm0
+; CHECK-NEXT:   retq
+  %1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32> %a2)
+  ret <2 x i32> %1
+}
+
+declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)
+
+define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64> %a2) {
+; CHECK-LABEL: test_sabsdiff_v2i64_expand
+; CHECK:        psubq   %xmm1, %xmm0
+; CHECK-NEXT:   pxor    %xmm1, %xmm1
+; CHECK-NEXT:   psubq   %xmm0, %xmm1
+; CHECK-NEXT:   movdqa  .LCPI{{[0-9_]*}}
+; CHECK-NEXT:   movdqa  %xmm1, %xmm3
+; CHECK-NEXT:   pxor    %xmm2, %xmm3
+; CHECK-NEXT:   movdqa  %xmm2, %xmm4
+; CHECK-NEXT:   pcmpgtd %xmm3, %xmm4
+; CHECK-NEXT:   pshufd  $160, %xmm4, %xmm5      # xmm5 = xmm4[0,0,2,2]
+; CHECK-NEXT:   pcmpeqd %xmm2, %xmm3
+; CHECK-NEXT:   pshufd  $245, %xmm3, %xmm2      # xmm2 = xmm3[1,1,3,3]
+; CHECK-NEXT:   pand    %xmm5, %xmm2
+; CHECK-NEXT:   pshufd  $245, %xmm4, %xmm3      # xmm3 = xmm4[1,1,3,3]
+; CHECK-NEXT:   por     %xmm2, %xmm3
+; CHECK-NEXT:   pand    %xmm3, %xmm0
+; CHECK-NEXT:   pandn   %xmm1, %xmm3
+; CHECK-NEXT:   por     %xmm3, %xmm0
+; CHECK-NEXT:   retq
+  %1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64> %a2)
+  ret <2 x i64> %1
+}
+
+declare <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32>, <16 x i32>)
+
+define <16 x i32> @test_sabsdiff_v16i32_expand(<16 x i32> %a1, <16 x i32> %a2) {
+; CHECK-LABEL: test_sabsdiff_v16i32_expand
+; CHECK:             psubd  %xmm4, %xmm0
+; CHECK-NEXT:        pxor    %xmm8, %xmm8
+; CHECK-NEXT:        pxor    %xmm9, %xmm9
+; CHECK-NEXT:        psubd   %xmm0, %xmm9
+; CHECK-NEXT:        pxor    %xmm4, %xmm4
+; CHECK-NEXT:        pcmpgtd %xmm9, %xmm4
+; CHECK-NEXT:        pand    %xmm4, %xmm0
+; CHECK-NEXT:        pandn   %xmm9, %xmm4
+; CHECK-NEXT:        por     %xmm4, %xmm0
+; CHECK-NEXT:        psubd   %xmm5, %xmm1
+; CHECK-NEXT:        pxor    %xmm4, %xmm4
+; CHECK-NEXT:        psubd   %xmm1, %xmm4
+; CHECK-NEXT:        pxor    %xmm5, %xmm5
+; CHECK-NEXT:        pcmpgtd %xmm4, %xmm5
+; CHECK-NEXT:        pand    %xmm5, %xmm1
+; CHECK-NEXT:        pandn   %xmm4, %xmm5
+; CHECK-NEXT:        por     %xmm5, %xmm1
+; CHECK-NEXT:        psubd   %xmm6, %xmm2
+; CHECK-NEXT:        pxor    %xmm4, %xmm4
+; CHECK-NEXT:        psubd   %xmm2, %xmm4
+; CHECK-NEXT:        pxor    %xmm5, %xmm5
+; CHECK-NEXT:        pcmpgtd %xmm4, %xmm5
+; CHECK-NEXT:        pand    %xmm5, %xmm2
+; CHECK-NEXT:        pandn   %xmm4, %xmm5
+; CHECK-NEXT:        por     %xmm5, %xmm2
+; CHECK-NEXT:        psubd   %xmm7, %xmm3
+; CHECK-NEXT:        pxor    %xmm4, %xmm4
+; CHECK-NEXT:        psubd   %xmm3, %xmm4
+; CHECK-NEXT:        pcmpgtd %xmm4, %xmm8
+; CHECK-NEXT:        pand    %xmm8, %xmm3
+; CHECK-NEXT:        pandn   %xmm4, %xmm8
+; CHECK-NEXT:        por     %xmm8, %xmm3
+; CHECK-NEXT:        retq
The tests look very fragile. Should we relax them so that they don't depend on specific register names?
+  %1 = call <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32> %a1, <16 x i32> %a2)
+  ret <16 x i32> %1
+}
+


_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

