[llvm] r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

Wed Jul 29 16:48:33 PDT 2015

Hi Shahid,

Sorry for the delayed response; please see my answers inline.

> On Jul 22, 2015, at 1:54 AM, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:
> 
> Hi Mikhail,
> 
> Thanks for the comments. Please see my responses inline.
> 
> Regards,
> Shahid
> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Mikhail Zolotukhin
>> Sent: Tuesday, July 21, 2015 12:02 AM
>> To: James Molloy
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm] r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
>> 
>> 
>>> On Jul 16, 2015, at 8:22 AM, James Molloy <James.Molloy at arm.com> wrote:
>>> 
>>> Author: jamesm
>>> Date: Thu Jul 16 10:22:46 2015
>>> New Revision: 242409
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=242409&view=rev
>>> Log:
>>> [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for
>>> absolute difference operation
>>> 
>>> This adds new intrinsics "*absdiff" for absolute difference ops to facilitate efficient code generation for "sum of absolute differences" operation.
>>> The patch also contains the introduction of corresponding SDNodes and basic legalization support. Sanity of the generated code is tested on X86.
>>> 
>>> This is 1st of the three patches.
>>> 
>>> Patch by Shahid Asghar-ahmad!
>>> 
>>> Added:
>>>   llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
>>> Modified:
>>>   llvm/trunk/docs/LangRef.rst
>>>   llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
>>>   llvm/trunk/include/llvm/IR/Intrinsics.td
>>>   llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
>>>   llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>   llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
>>>   llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
>>>   llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>   llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>>>   llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>>> 
>>> Modified: llvm/trunk/docs/LangRef.rst
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LangRef.rst?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/docs/LangRef.rst (original)
>>> +++ llvm/trunk/docs/LangRef.rst Thu Jul 16 10:22:46 2015
>>> @@ -10328,6 +10328,65 @@ Examples:
>>> 
>>>      %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
>>> 
>>> +
>>> +'``llvm.uabsdiff.*``' and '``llvm.sabsdiff.*``' Intrinsics
>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>>> +Syntax:
>>> +"""""""
>>> +This is an overloaded intrinsic. The loaded data is a vector of any integer bit width.
>>> +
>>> +.. code-block:: llvm
>>> +
>>> +      declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4 x integer> %b)
>>> +
>>> +
>>> +Overview:
>>> +"""""""""
>>> +
>>> +The ``llvm.uabsdiff`` intrinsic returns a vector result of the
>>> +absolute difference of the two operands, treating them both as unsigned integers.
>>> +
>>> +The ``llvm.sabsdiff`` intrinsic returns a vector result of the
>>> +absolute difference of the two operands, treating them both as signed integers.
>>> +
>>> +.. note::
>>> +
>>> +    These intrinsics are primarily used during the code generation stage of compilation.
>>> +    They are generated by compiler passes such as the Loop and SLP vectorizers. It is not
>>> +    recommended for users to create them manually.
>>> +
>>> +Arguments:
>>> +""""""""""
>>> +
>>> +Both intrinsics take two integer vectors of the same bitwidth.
>>> +
>>> +Semantics:
>>> +""""""""""
>>> +
>>> +The expression::
>>> +
>>> +    call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
>>> +
>>> +is equivalent to::
>>> +
>>> +    %sub = sub <4 x i32> %a, %b
>>> +    %ispos = icmp ugt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
>> Isn't it always 'false'?
> Oh yes, this will always be false as the comparison is 'unsigned'.
> Since the difference of two unsigned numbers can be negative when interpreted as signed, how about changing this comparison to
> "icmp sge <4 x i32> %sub, zeroinitializer"?
Sounds good to me.
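For illustration only, here is a minimal scalar C++ sketch of a single lane. It is my own model, not part of the patch, and the function names are hypothetical. It assumes 32-bit lanes and that the true difference fits in the signed range of the lane. It shows why the `ugt -1` compare can never be true and what the sign-test form selects:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical one-lane model of `icmp ugt %sub, -1`.
// As an unsigned value, -1 is the maximum, so no value can exceed it.
constexpr bool isPosUgtAllOnes(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(a - b) >
         std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical one-lane model of the proposed expansion:
//   %sub = sub %a, %b
//   %ispos = icmp sge %sub, 0
//   %neg = sub 0, %sub
//   select %ispos, %sub, %neg
constexpr std::uint32_t absdiffSgeZero(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t sub = a - b;             // wraps modulo 2^32, like IR `sub`
  const std::uint32_t neg = 0u - sub;          // two's complement negation
  const bool isPos = (sub & 0x80000000u) == 0; // sign bit clear <=> sge 0
  return isPos ? sub : neg;
}

// The original compare is false regardless of operand order.
static_assert(!isPosUgtAllOnes(5u, 3u), "ugt all-ones is never true");
static_assert(!isPosUgtAllOnes(3u, 5u), "ugt all-ones is never true");
// The sign-test form yields |a - b| for these small inputs.
static_assert(absdiffSgeZero(5u, 3u) == 2u, "a > b");
static_assert(absdiffSgeZero(3u, 5u) == 2u, "a < b");
```

The static_asserts are checked at compile time, so the sketch needs no main function.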
> 
>>> +    %neg = sub <4 x i32> zeroinitializer, %sub
>>> +    %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
>>> +
>>> +Similarly the expression::
>>> +
>>> +    call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
>>> +
>>> +is equivalent to::
>>> +
>>> +    %sub = sub nsw <4 x i32> %a, %b
>>> +    %ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
>> Wouldn't it be more readable if we use "icmp sge <4 x i32> %sub,
>> zeroinitializer"?
> Yes, that will do.
> 
>>> +    %neg = sub nsw <4 x i32> zeroinitializer, %sub
>>> +    %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
>>> +
>>> +
>>> Half Precision Floating Point Intrinsics
>>> ----------------------------------------
>>> 
>>> 
>>> Modified: llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h (original)
>>> +++ llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h Thu Jul 16 10:22:46 2015
>>> @@ -334,6 +334,10 @@ namespace ISD {
>>>    /// Byte Swap and Counting operators.
>>>    BSWAP, CTTZ, CTLZ, CTPOP,
>>> 
>>> +    /// [SU]ABSDIFF - Signed/Unsigned absolute difference of two input integer
>>> +    /// vector. These nodes are generated from llvm.*absdiff* intrinsics.
>>> +    SABSDIFF, UABSDIFF,
>>> +
>>>    /// Bit counting operators with an undefined result for zero inputs.
>>>    CTTZ_ZERO_UNDEF, CTLZ_ZERO_UNDEF,
>>> 
>>> 
>>> Modified: llvm/trunk/include/llvm/IR/Intrinsics.td
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Intrinsics.td?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/IR/Intrinsics.td (original)
>>> +++ llvm/trunk/include/llvm/IR/Intrinsics.td Thu Jul 16 10:22:46 2015
>>> @@ -605,6 +605,12 @@ def int_convertuu  : Intrinsic<[llvm_any
>>> def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
>>>                                [], "llvm.clear_cache">;
>>> 
>>> +// Calculate the Absolute Differences of the two input vectors.
>>> +def int_sabsdiff : Intrinsic<[llvm_anyvector_ty],
>>> +                        [ LLVMMatchType<0>, LLVMMatchType<0> ], [IntrNoMem]>;
>>> +def int_uabsdiff : Intrinsic<[llvm_anyvector_ty],
>>> +                        [ LLVMMatchType<0>, LLVMMatchType<0> ], [IntrNoMem]>;
>>> +
>>> //===-------------------------- Masked Intrinsics -------------------------===//
>>> //
>>> def int_masked_store : Intrinsic<[], [llvm_anyvector_ty, LLVMPointerTo<0>,
>>> 
>>> Modified: llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetSelectionDAG.td?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Target/TargetSelectionDAG.td (original)
>>> +++ llvm/trunk/include/llvm/Target/TargetSelectionDAG.td Thu Jul 16 10:22:46 2015
>>> @@ -386,6 +386,8 @@ def smax       : SDNode<"ISD::SMAX"
>>> def umin       : SDNode<"ISD::UMIN"      , SDTIntBinOp>;
>>> def umax       : SDNode<"ISD::UMAX"      , SDTIntBinOp>;
>>> 
>>> +def sabsdiff   : SDNode<"ISD::SABSDIFF"   , SDTIntBinOp>;
>>> +def uabsdiff   : SDNode<"ISD::UABSDIFF"   , SDTIntBinOp>;
>>> def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
>>> def bswap      : SDNode<"ISD::BSWAP"      , SDTIntUnaryOp>;
>>> def ctlz       : SDNode<"ISD::CTLZ"       , SDTIntUnaryOp>;
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Thu Jul 16 10:22:46 2015
>>> @@ -146,6 +146,10 @@ void DAGTypeLegalizer::PromoteIntegerRes
>>>  case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:
>>>    Res = PromoteIntRes_AtomicCmpSwap(cast<AtomicSDNode>(N), ResNo);
>>>    break;
>>> +  case ISD::UABSDIFF:
>>> +  case ISD::SABSDIFF:
>>> +    Res = PromoteIntRes_SimpleIntBinOp(N);
>>> +    break;
>>>  }
>>> 
>>>  // If the result is null then the sub-method took care of registering it.
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp Thu Jul 16 10:22:46 2015
>>> @@ -105,6 +105,7 @@ class VectorLegalizer {
>>>  SDValue ExpandLoad(SDValue Op);
>>>  SDValue ExpandStore(SDValue Op);
>>>  SDValue ExpandFNEG(SDValue Op);
>>> +  SDValue ExpandABSDIFF(SDValue Op);
>>> 
>>>  /// \brief Implements vector promotion.
>>>  ///
>>> @@ -326,6 +327,8 @@ SDValue VectorLegalizer::LegalizeOp(SDVa
>>>  case ISD::SMAX:
>>>  case ISD::UMIN:
>>>  case ISD::UMAX:
>>> +  case ISD::UABSDIFF:
>>> +  case ISD::SABSDIFF:
>>>    QueryType = Node->getValueType(0);
>>>    break;
>>>  case ISD::FP_ROUND_INREG:
>>> @@ -708,11 +711,36 @@ SDValue VectorLegalizer::Expand(SDValue
>>>    return ExpandFNEG(Op);
>>>  case ISD::SETCC:
>>>    return UnrollVSETCC(Op);
>>> +  case ISD::UABSDIFF:
>>> +  case ISD::SABSDIFF:
>>> +    return ExpandABSDIFF(Op);
>>>  default:
>>>    return DAG.UnrollVectorOp(Op.getNode());
>>>  }
>>> }
>>> 
>>> +SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {
>>> +  SDLoc dl(Op);
>>> +  SDValue Tmp1, Tmp2, Tmp3, Tmp4;
>>> +  EVT VT = Op.getValueType();
>>> +  SDNodeFlags Flags;
>>> +  Flags.setNoSignedWrap(Op->getOpcode() == ISD::SABSDIFF);
>>> +
>>> +  Tmp2 = Op.getOperand(0);
>>> +  Tmp3 = Op.getOperand(1);
>>> +  Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp3, &Flags);
>>> +  Tmp2 =
>>> +      DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Tmp1, &Flags);
>>> +  Tmp4 = DAG.getNode(
>>> +      ISD::SETCC, dl,
>>> +      TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), Tmp2,
>>> +      DAG.getConstant(0, dl, VT),
>>> +      DAG.getCondCode(Op->getOpcode() == ISD::SABSDIFF ? ISD::SETLT
>>> +                                                       : ISD::SETULT));
>>> +  Tmp1 = DAG.getNode(ISD::VSELECT, dl, VT, Tmp4, Tmp1, Tmp2);
>>> +  return Tmp1;
>>> +}
>>> +
>>> SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {
>>>  // Lower a select instruction where the condition is a scalar and the
>>>  // operands are vectors. Lower this select to VSELECT and implement it
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Thu Jul 16 10:22:46 2015
>>> @@ -678,6 +678,8 @@ void DAGTypeLegalizer::SplitVectorResult
>>>  case ISD::SMAX:
>>>  case ISD::UMIN:
>>>  case ISD::UMAX:
>>> +  case ISD::UABSDIFF:
>>> +  case ISD::SABSDIFF:
>>>    SplitVecRes_BinOp(N, Lo, Hi);
>>>    break;
>>>  case ISD::FMA:
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Thu Jul 16 10:22:46 2015
>>> @@ -4646,6 +4646,18 @@ SelectionDAGBuilder::visitIntrinsicCall(
>>>                             getValue(I.getArgOperand(0)).getValueType(),
>>>                             getValue(I.getArgOperand(0))));
>>>    return nullptr;
>>> +  case Intrinsic::uabsdiff:
>>> +    setValue(&I, DAG.getNode(ISD::UABSDIFF, sdl,
>>> +                             getValue(I.getArgOperand(0)).getValueType(),
>>> +                             getValue(I.getArgOperand(0)),
>>> +                             getValue(I.getArgOperand(1))));
>>> +    return nullptr;
>>> +  case Intrinsic::sabsdiff:
>>> +    setValue(&I, DAG.getNode(ISD::SABSDIFF, sdl,
>>> +                             getValue(I.getArgOperand(0)).getValueType(),
>>> +                             getValue(I.getArgOperand(0)),
>>> +                             getValue(I.getArgOperand(1))));
>>> +    return nullptr;
>>>  case Intrinsic::cttz: {
>>>    SDValue Arg = getValue(I.getArgOperand(0));
>>>    ConstantInt *CI = cast<ConstantInt>(I.getArgOperand(1));
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp Thu Jul 16 10:22:46 2015
>>> @@ -225,6 +225,8 @@ std::string SDNode::getOperationName(con
>>>  case ISD::SHL_PARTS:                  return "shl_parts";
>>>  case ISD::SRA_PARTS:                  return "sra_parts";
>>>  case ISD::SRL_PARTS:                  return "srl_parts";
>>> +  case ISD::UABSDIFF:                   return "uabsdiff";
>>> +  case ISD::SABSDIFF:                   return "sabsdiff";
>>> 
>>>  // Conversion operators.
>>>  case ISD::SIGN_EXTEND:                return "sign_extend";
>>> 
>>> Modified: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp Thu Jul 16 10:22:46 2015
>>> @@ -827,6 +827,8 @@ void TargetLoweringBase::initActions() {
>>>    setOperationAction(ISD::USUBO, VT, Expand);
>>>    setOperationAction(ISD::SMULO, VT, Expand);
>>>    setOperationAction(ISD::UMULO, VT, Expand);
>>> +    setOperationAction(ISD::UABSDIFF, VT, Expand);
>>> +    setOperationAction(ISD::SABSDIFF, VT, Expand);
>>> 
>>>    // These library functions default to expand.
>>>    setOperationAction(ISD::FROUND, VT, Expand);
>>> 
>>> Added: llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/absdiff_expand.ll?rev=242409&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/absdiff_expand.ll (added)
>>> +++ llvm/trunk/test/CodeGen/X86/absdiff_expand.ll Thu Jul 16 10:22:46 2015
>>> @@ -0,0 +1,242 @@
>>> +; RUN: llc -mtriple=x86_64-unknown-linux-gnu  < %s | FileCheck %s -check-prefix=CHECK
>>> +
>>> +declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)
>>> +
>>> +define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
>>> +; CHECK-LABEL: test_uabsdiff_v4i8_expand
>>> +; CHECK:             psubd  %xmm1, %xmm0
>>> +; CHECK-NEXT:        pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT:        psubd  %xmm0, %xmm1
>>> +; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:        movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:        pxor   %xmm2, %xmm3
>>> +; CHECK-NEXT:        pcmpgtd        %xmm3, %xmm2
>>> +; CHECK-NEXT:        pand    %xmm2, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm1, %xmm2
>>> +; CHECK-NEXT:        por     %xmm2, %xmm0
>>> +; CHECK-NEXT:        retq
>>> +
>>> +  %1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
>>> +  ret <4 x i8> %1
>>> +}
>>> +
>>> +declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)
>>> +
>>> +define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v4i8_expand
>>> +; CHECK:      psubd  %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor    %xmm2, %xmm2
>>> +; CHECK-NEXT: psubd  %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtd  %xmm2, %xmm1
>>> +; CHECK-NEXT: pand    %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn   %xmm2, %xmm1
>>> +; CHECK-NEXT: por     %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> +
>>> +  %1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
>>> +  ret <4 x i8> %1
>>> +}
>>> +
>>> +
>>> +declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)
>>> +
>>> +define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v8i8_expand
>>> +; CHECK:      psubw  %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor   %xmm2, %xmm2
>>> +; CHECK-NEXT: psubw  %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtw        %xmm2, %xmm1
>>> +; CHECK-NEXT: pand  %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por  %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> +  %1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
>>> +  ret <8 x i8> %1
>>> +}
>>> +
>>> +declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)
>>> +
>>> +define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8> %a2) {
>>> +; CHECK-LABEL: test_uabsdiff_v16i8_expand
>>> +; CHECK:             psubb  %xmm1, %xmm0
>>> +; CHECK-NEXT:        pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT:        psubb  %xmm0, %xmm1
>>> +; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:        movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:        pxor   %xmm2, %xmm3
>>> +; CHECK-NEXT:        pcmpgtb        %xmm3, %xmm2
>>> +; CHECK-NEXT:        pand    %xmm2, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm1, %xmm2
>>> +; CHECK-NEXT:        por     %xmm2, %xmm0
>>> +; CHECK-NEXT:        retq
>>> +  %1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8> %a2)
>>> +  ret <16 x i8> %1
>>> +}
>>> +
>>> +declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)
>>> +
>>> +define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
>>> +; CHECK-LABEL: test_uabsdiff_v8i16_expand
>>> +; CHECK:             psubw  %xmm1, %xmm0
>>> +; CHECK-NEXT:        pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT:        psubw  %xmm0, %xmm1
>>> +; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:        movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:        pxor   %xmm2, %xmm3
>>> +; CHECK-NEXT:        pcmpgtw        %xmm3, %xmm2
>>> +; CHECK-NEXT:        pand    %xmm2, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm1, %xmm2
>>> +; CHECK-NEXT:        por     %xmm2, %xmm0
>>> +; CHECK-NEXT:        retq
>>> +  %1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
>>> +  ret <8 x i16> %1
>>> +}
>>> +
>>> +declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)
>>> +
>>> +define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v8i16_expand
>>> +; CHECK:      psubw  %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor   %xmm2, %xmm2
>>> +; CHECK-NEXT: psubw  %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtw        %xmm2, %xmm1
>>> +; CHECK-NEXT: pand  %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por  %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> +  %1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
>>> +  ret <8 x i16> %1
>>> +}
>>> +
>>> +declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)
>>> +
>>> +define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v4i32_expand
>>> +; CHECK:             psubd  %xmm1, %xmm0
>>> +; CHECK-NEXT:        pxor  %xmm1, %xmm1
>>> +; CHECK-NEXT:        pxor  %xmm2, %xmm2
>>> +; CHECK-NEXT:        psubd  %xmm0, %xmm2
>>> +; CHECK-NEXT:        pcmpgtd        %xmm2, %xmm1
>>> +; CHECK-NEXT:        pand    %xmm1, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm2, %xmm1
>>> +; CHECK-NEXT:        por    %xmm1, %xmm0
>>> +; CHECK-NEXT:        retq
>>> +  %1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
>>> +  ret <4 x i32> %1
>>> +}
>>> +
>>> +declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)
>>> +
>>> +define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
>>> +; CHECK-LABEL: test_uabsdiff_v4i32_expand
>>> +; CHECK:             psubd  %xmm1, %xmm0
>>> +; CHECK-NEXT:        pxor   %xmm1, %xmm1
>>> +; CHECK-NEXT:        psubd  %xmm0, %xmm1
>>> +; CHECK-NEXT:        movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:        movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:        pxor   %xmm2, %xmm3
>>> +; CHECK-NEXT:        pcmpgtd        %xmm3, %xmm2
>>> +; CHECK-NEXT:        pand    %xmm2, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm1, %xmm2
>>> +; CHECK-NEXT:        por     %xmm2, %xmm0
>>> +; CHECK-NEXT:        retq
>>> +  %1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
>>> +  ret <4 x i32> %1
>>> +}
>>> +
>>> +declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)
>>> +
>>> +define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v2i32_expand
>>> +; CHECK:        psubq   %xmm1, %xmm0
>>> +; CHECK-NEXT:   pxor    %xmm1, %xmm1
>>> +; CHECK-NEXT:   psubq   %xmm0, %xmm1
>>> +; CHECK-NEXT:   movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:   movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:   pxor    %xmm2, %xmm3
>>> +; CHECK-NEXT:   movdqa  %xmm2, %xmm4
>>> +; CHECK-NEXT:   pcmpgtd %xmm3, %xmm4
>>> +; CHECK-NEXT:   pshufd  $160, %xmm4, %xmm5      # xmm5 = xmm4[0,0,2,2]
>>> +; CHECK-NEXT:   pcmpeqd %xmm2, %xmm3
>>> +; CHECK-NEXT:   pshufd  $245, %xmm3, %xmm2      # xmm2 = xmm3[1,1,3,3]
>>> +; CHECK-NEXT:   pand    %xmm5, %xmm2
>>> +; CHECK-NEXT:   pshufd  $245, %xmm4, %xmm3      # xmm3 = xmm4[1,1,3,3]
>>> +; CHECK-NEXT:   por     %xmm2, %xmm3
>>> +; CHECK-NEXT:   pand    %xmm3, %xmm0
>>> +; CHECK-NEXT:   pandn   %xmm1, %xmm3
>>> +; CHECK-NEXT:   por     %xmm3, %xmm0
>>> +; CHECK-NEXT:   retq
>>> +  %1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32> %a2)
>>> +  ret <2 x i32> %1
>>> +}
>>> +
>>> +declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)
>>> +
>>> +define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v2i64_expand
>>> +; CHECK:        psubq   %xmm1, %xmm0
>>> +; CHECK-NEXT:   pxor    %xmm1, %xmm1
>>> +; CHECK-NEXT:   psubq   %xmm0, %xmm1
>>> +; CHECK-NEXT:   movdqa  .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT:   movdqa  %xmm1, %xmm3
>>> +; CHECK-NEXT:   pxor    %xmm2, %xmm3
>>> +; CHECK-NEXT:   movdqa  %xmm2, %xmm4
>>> +; CHECK-NEXT:   pcmpgtd %xmm3, %xmm4
>>> +; CHECK-NEXT:   pshufd  $160, %xmm4, %xmm5      # xmm5 = xmm4[0,0,2,2]
>>> +; CHECK-NEXT:   pcmpeqd %xmm2, %xmm3
>>> +; CHECK-NEXT:   pshufd  $245, %xmm3, %xmm2      # xmm2 = xmm3[1,1,3,3]
>>> +; CHECK-NEXT:   pand    %xmm5, %xmm2
>>> +; CHECK-NEXT:   pshufd  $245, %xmm4, %xmm3      # xmm3 = xmm4[1,1,3,3]
>>> +; CHECK-NEXT:   por     %xmm2, %xmm3
>>> +; CHECK-NEXT:   pand    %xmm3, %xmm0
>>> +; CHECK-NEXT:   pandn   %xmm1, %xmm3
>>> +; CHECK-NEXT:   por     %xmm3, %xmm0
>>> +; CHECK-NEXT:   retq
>>> +  %1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64> %a2)
>>> +  ret <2 x i64> %1
>>> +}
>>> +
>>> +declare <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32>, <16 x i32>)
>>> +
>>> +define <16 x i32> @test_sabsdiff_v16i32_expand(<16 x i32> %a1, <16 x i32> %a2) {
>>> +; CHECK-LABEL: test_sabsdiff_v16i32_expand
>>> +; CHECK:             psubd  %xmm4, %xmm0
>>> +; CHECK-NEXT:        pxor    %xmm8, %xmm8
>>> +; CHECK-NEXT:        pxor    %xmm9, %xmm9
>>> +; CHECK-NEXT:        psubd   %xmm0, %xmm9
>>> +; CHECK-NEXT:        pxor    %xmm4, %xmm4
>>> +; CHECK-NEXT:        pcmpgtd %xmm9, %xmm4
>>> +; CHECK-NEXT:        pand    %xmm4, %xmm0
>>> +; CHECK-NEXT:        pandn   %xmm9, %xmm4
>>> +; CHECK-NEXT:        por     %xmm4, %xmm0
>>> +; CHECK-NEXT:        psubd   %xmm5, %xmm1
>>> +; CHECK-NEXT:        pxor    %xmm4, %xmm4
>>> +; CHECK-NEXT:        psubd   %xmm1, %xmm4
>>> +; CHECK-NEXT:        pxor    %xmm5, %xmm5
>>> +; CHECK-NEXT:        pcmpgtd %xmm4, %xmm5
>>> +; CHECK-NEXT:        pand    %xmm5, %xmm1
>>> +; CHECK-NEXT:        pandn   %xmm4, %xmm5
>>> +; CHECK-NEXT:        por     %xmm5, %xmm1
>>> +; CHECK-NEXT:        psubd   %xmm6, %xmm2
>>> +; CHECK-NEXT:        pxor    %xmm4, %xmm4
>>> +; CHECK-NEXT:        psubd   %xmm2, %xmm4
>>> +; CHECK-NEXT:        pxor    %xmm5, %xmm5
>>> +; CHECK-NEXT:        pcmpgtd %xmm4, %xmm5
>>> +; CHECK-NEXT:        pand    %xmm5, %xmm2
>>> +; CHECK-NEXT:        pandn   %xmm4, %xmm5
>>> +; CHECK-NEXT:        por     %xmm5, %xmm2
>>> +; CHECK-NEXT:        psubd   %xmm7, %xmm3
>>> +; CHECK-NEXT:        pxor    %xmm4, %xmm4
>>> +; CHECK-NEXT:        psubd   %xmm3, %xmm4
>>> +; CHECK-NEXT:        pcmpgtd %xmm4, %xmm8
>>> +; CHECK-NEXT:        pand    %xmm8, %xmm3
>>> +; CHECK-NEXT:        pandn   %xmm4, %xmm8
>>> +; CHECK-NEXT:        por     %xmm8, %xmm3
>>> +; CHECK-NEXT:        retq
>> The tests look very fragile, should we make them more relaxed in terms of
>> register names?
> Do you mean to use regular expressions for register names?
That’s one aspect. However, the tests would also fail if the scheduler ordered the instructions differently. I think we could use CHECK-DAG directives here to overcome this issue (see http://llvm.org/docs/CommandGuide/FileCheck.html). You can find similar examples in test/CodeGen/X86/avx2-shift.ll and other tests.

Thanks,
Michael
> 
>>> +  %1 = call <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32> %a1, <16 x i32> %a2)
>>> +  ret <16 x i32> %1
>>> +}
>>> +
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits