[llvm] r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
Michael Zolotukhin
mzolotukhin at apple.com
Wed Jul 29 16:48:33 PDT 2015
Hi Shahid,
Sorry for the delayed response; please see my answers inline:
> On Jul 22, 2015, at 1:54 AM, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:
>
> Hi Mikhail,
>
> Thanks for the comments. Please see my responses inline.
>
> Regards,
> Shahid
>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Mikhail Zolotukhin
>> Sent: Tuesday, July 21, 2015 12:02 AM
>> To: James Molloy
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm] r242409 - [Codegen] Add intrinsics 'absdiff' and
>> corresponding SDNodes for absolute difference operation
>>
>>
>>> On Jul 16, 2015, at 8:22 AM, James Molloy <James.Molloy at arm.com>
>> wrote:
>>>
>>> Author: jamesm
>>> Date: Thu Jul 16 10:22:46 2015
>>> New Revision: 242409
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=242409&view=rev
>>> Log:
>>> [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for
>>> absolute difference operation
>>>
>>> This adds new intrinsics "*absdiff" for absolute difference ops to facilitate
>> efficient code generation for "sum of absolute differences" operation.
>>> The patch also contains the introduction of corresponding SDNodes and
>> basic legalization support. Sanity of the generated code is tested on X86.
>>>
>>> This is 1st of the three patches.
>>>
>>> Patch by Shahid Asghar-ahmad!
>>>
>>> Added:
>>> llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
>>> Modified:
>>> llvm/trunk/docs/LangRef.rst
>>> llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
>>> llvm/trunk/include/llvm/IR/Intrinsics.td
>>> llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
>>> llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>> llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
>>> llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
>>> llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>> llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>>> llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>>>
>>> Modified: llvm/trunk/docs/LangRef.rst
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LangRef.rst?rev=24
>>> 2409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/docs/LangRef.rst (original)
>>> +++ llvm/trunk/docs/LangRef.rst Thu Jul 16 10:22:46 2015
>>> @@ -10328,6 +10328,65 @@ Examples:
>>>
>>> %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
>>> ; yields float:r2 = (a * b) + c
>>>
>>> +
>>> +'``llvm.uabsdiff.*``' and '``llvm.sabsdiff.*``' Intrinsics
>>>
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> ^
>>> +
>>> +Syntax:
>>> +"""""""
>>> +This is an overloaded intrinsic. The loaded data is a vector of any integer
>> bit width.
>>> +
>>> +.. code-block:: llvm
>>> +
>>> + declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4
>>> + x integer> %b)
>>> +
>>> +
>>> +Overview:
>>> +"""""""""
>>> +
>>> +The ``llvm.uabsdiff`` intrinsic returns a vector result of the
>>> +absolute difference of the two operands, treating them both as unsigned
>> integers.
>>> +
>>> +The ``llvm.sabsdiff`` intrinsic returns a vector result of the
>>> +absolute difference of the two operands, treating them both as signed
>> integers.
>>> +
>>> +.. note::
>>> +
>>> + These intrinsics are primarily used during the code generation stage of
>> compilation.
>>> + They are generated by compiler passes such as the Loop and SLP
>> vectorizers. It is not
>>> + recommended for users to create them manually.
>>> +
>>> +Arguments:
>>> +""""""""""
>>> +
>>> +Both intrinsics take two integer of the same bitwidth.
>>> +
>>> +Semantics:
>>> +""""""""""
>>> +
>>> +The expression::
>>> +
>>> + call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
>>> +
>>> +is equivalent to::
>>> +
>>> + %sub = sub <4 x i32> %a, %b
>>> + %ispos = icmp ugt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32
>>> + -1>
>> Isn't it always 'false'?
> Oh yes, this will always be false because the comparison is unsigned.
> Since the difference of two unsigned numbers can be negative when read as a signed value, how about changing this comparison to
> "icmp sge <4 x i32> %sub, zeroinitializer"?
Sounds good to me.
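To make the "always false" point concrete, here is a minimal scalar sketch in C++. It is not part of the patch: the function names are hypothetical, and it models only a single 32-bit lane of the `icmp ugt ... -1` form documented above, on the assumption that the IR `sub` behaves like wrapping unsigned subtraction.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of one 32-bit lane of the documented expansion:
//   %sub   = sub i32 %a, %b
//   %ispos = icmp ugt i32 %sub, -1
// Read as unsigned, -1 is the largest 32-bit value (0xFFFFFFFF).
constexpr bool ispos_ugt_minus_one(std::uint32_t a, std::uint32_t b) {
  // Unsigned subtraction wraps modulo 2^32, like the IR 'sub'.
  return static_cast<std::uint32_t>(a - b) >
         std::numeric_limits<std::uint32_t>::max();
}

// No value can exceed the maximum, so the predicate is false for any inputs.
static_assert(!ispos_ugt_minus_one(5u, 3u), "a > b");
static_assert(!ispos_ugt_minus_one(3u, 5u), "a < b, difference wraps");
static_assert(!ispos_ugt_minus_one(0xFFFFFFFFu, 0u), "largest difference");

// Runtime spot check over boundary values (illustrative, not exhaustive).
inline void check_boundary_pairs() {
  const std::uint32_t samples[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u,
                                   0xFFFFFFFFu};
  for (std::uint32_t a : samples)
    for (std::uint32_t b : samples)
      assert(!ispos_ugt_minus_one(a, b));
}
```

With the predicate always false, the select in the documented expansion always picks %neg, that is b - a, regardless of which operand is larger.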
>
>>> + %neg = sub <4 x i32> zeroinitializer, %sub
>>> + %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
>>> +
>>> +Similarly the expression::
>>> +
>>> + call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)
>>> +
>>> +is equivalent to::
>>> +
>>> + %sub = sub nsw <4 x i32> %a, %b
>>> + %ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32
>>> + -1>
>> Wouldn't it be more readable if we use "icmp sge <4 x i32> %sub,
>> zeroinitializer"?
> Yes, that will do.
>
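For the signed case the two spellings are interchangeable. The following is a small C++ sketch under the same assumptions as before (hypothetical helper names, one 32-bit lane, not taken from the patch) that checks `icmp sgt %sub, -1` and `icmp sge %sub, 0` against each other at the boundary values.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar models of the two signed predicates on one lane.
constexpr bool sgt_minus_one(std::int32_t sub) { return sub > -1; }  // icmp sgt %sub, -1
constexpr bool sge_zero(std::int32_t sub) { return sub >= 0; }       // icmp sge %sub, 0

constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();

// The predicates agree at every boundary of the signed range.
static_assert(sgt_minus_one(kMin) == sge_zero(kMin), "most negative value");
static_assert(sgt_minus_one(-1) == sge_zero(-1), "just below zero");
static_assert(sgt_minus_one(0) == sge_zero(0), "zero");
static_assert(sgt_minus_one(kMax) == sge_zero(kMax), "most positive value");

// Runtime spot check around zero (illustrative, not exhaustive).
inline void check_small_range() {
  for (std::int32_t x = -1000; x <= 1000; ++x)
    assert(sgt_minus_one(x) == sge_zero(x));
}
```

Since both forms select the same lanes, the change is purely about readability of the documented expansion.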
>>> + %neg = sub nsw <4 x i32> zeroinitializer, %sub
>>> + %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg
>>> +
>>> +
>>> Half Precision Floating Point Intrinsics
>>> ----------------------------------------
>>>
>>>
>>> Modified: llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/IS
>>> DOpcodes.h?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h (original)
>>> +++ llvm/trunk/include/llvm/CodeGen/ISDOpcodes.h Thu Jul 16 10:22:46
>>> +++ 2015
>>> @@ -334,6 +334,10 @@ namespace ISD {
>>> /// Byte Swap and Counting operators.
>>> BSWAP, CTTZ, CTLZ, CTPOP,
>>>
>>> + /// [SU]ABSDIFF - Signed/Unsigned absolute difference of two input
>> integer
>>> + /// vector. These nodes are generated from llvm.*absdiff* intrinsics.
>>> + SABSDIFF, UABSDIFF,
>>> +
>>> /// Bit counting operators with an undefined result for zero inputs.
>>> CTTZ_ZERO_UNDEF, CTLZ_ZERO_UNDEF,
>>>
>>>
>>> Modified: llvm/trunk/include/llvm/IR/Intrinsics.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Intrins
>>> ics.td?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/include/llvm/IR/Intrinsics.td (original)
>>> +++ llvm/trunk/include/llvm/IR/Intrinsics.td Thu Jul 16 10:22:46 2015
>>> @@ -605,6 +605,12 @@ def int_convertuu : Intrinsic<[llvm_any def
>>> int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
>>> [], "llvm.clear_cache">;
>>>
>>> +// Calculate the Absolute Differences of the two input vectors.
>>> +def int_sabsdiff : Intrinsic<[llvm_anyvector_ty],
>>> + [ LLVMMatchType<0>, LLVMMatchType<0> ],
>>> +[IntrNoMem]>; def int_uabsdiff : Intrinsic<[llvm_anyvector_ty],
>>> + [ LLVMMatchType<0>, LLVMMatchType<0> ],
>>> +[IntrNoMem]>;
>>> +
>>> //===-------------------------- Masked Intrinsics
>>> -------------------------===// // def int_masked_store : Intrinsic<[],
>>> [llvm_anyvector_ty, LLVMPointerTo<0>,
>>>
>>> Modified: llvm/trunk/include/llvm/Target/TargetSelectionDAG.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/Tar
>>> getSelectionDAG.td?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/include/llvm/Target/TargetSelectionDAG.td (original)
>>> +++ llvm/trunk/include/llvm/Target/TargetSelectionDAG.td Thu Jul 16
>>> +++ 10:22:46 2015
>>> @@ -386,6 +386,8 @@ def smax : SDNode<"ISD::SMAX"
>>> def umin : SDNode<"ISD::UMIN" , SDTIntBinOp>;
>>> def umax : SDNode<"ISD::UMAX" , SDTIntBinOp>;
>>>
>>> +def sabsdiff : SDNode<"ISD::SABSDIFF" , SDTIntBinOp>;
>>> +def uabsdiff : SDNode<"ISD::UABSDIFF" , SDTIntBinOp>;
>>> def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
>>> def bswap : SDNode<"ISD::BSWAP" , SDTIntUnaryOp>;
>>> def ctlz : SDNode<"ISD::CTLZ" , SDTIntUnaryOp>;
>>>
>>> Modified:
>> llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDA
>>>
>> G/LegalizeIntegerTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>> (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Thu
>>> +++ Jul 16 10:22:46 2015
>>> @@ -146,6 +146,10 @@ void DAGTypeLegalizer::PromoteIntegerRes
>>> case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:
>>> Res = PromoteIntRes_AtomicCmpSwap(cast<AtomicSDNode>(N),
>> ResNo);
>>> break;
>>> + case ISD::UABSDIFF:
>>> + case ISD::SABSDIFF:
>>> + Res = PromoteIntRes_SimpleIntBinOp(N);
>>> + break;
>>> }
>>>
>>> // If the result is null then the sub-method took care of registering it.
>>>
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDA
>>> G/LegalizeVectorOps.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
>>> (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp Thu Jul
>>> +++ 16 10:22:46 2015
>>> @@ -105,6 +105,7 @@ class VectorLegalizer {
>>> SDValue ExpandLoad(SDValue Op);
>>> SDValue ExpandStore(SDValue Op);
>>> SDValue ExpandFNEG(SDValue Op);
>>> + SDValue ExpandABSDIFF(SDValue Op);
>>>
>>> /// \brief Implements vector promotion.
>>> ///
>>> @@ -326,6 +327,8 @@ SDValue VectorLegalizer::LegalizeOp(SDVa
>>> case ISD::SMAX:
>>> case ISD::UMIN:
>>> case ISD::UMAX:
>>> + case ISD::UABSDIFF:
>>> + case ISD::SABSDIFF:
>>> QueryType = Node->getValueType(0);
>>> break;
>>> case ISD::FP_ROUND_INREG:
>>> @@ -708,11 +711,36 @@ SDValue VectorLegalizer::Expand(SDValue
>>> return ExpandFNEG(Op);
>>> case ISD::SETCC:
>>> return UnrollVSETCC(Op);
>>> + case ISD::UABSDIFF:
>>> + case ISD::SABSDIFF:
>>> + return ExpandABSDIFF(Op);
>>> default:
>>> return DAG.UnrollVectorOp(Op.getNode());
>>> }
>>> }
>>>
>>> +SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {
>>> + SDLoc dl(Op);
>>> + SDValue Tmp1, Tmp2, Tmp3, Tmp4;
>>> + EVT VT = Op.getValueType();
>>> + SDNodeFlags Flags;
>>> + Flags.setNoSignedWrap(Op->getOpcode() == ISD::SABSDIFF);
>>> +
>>> + Tmp2 = Op.getOperand(0);
>>> + Tmp3 = Op.getOperand(1);
>>> + Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp3, &Flags);
>>> + Tmp2 =
>>> + DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Tmp1,
>>> +&Flags);
>>> + Tmp4 = DAG.getNode(
>>> + ISD::SETCC, dl,
>>> + TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
>> VT), Tmp2,
>>> + DAG.getConstant(0, dl, VT),
>>> + DAG.getCondCode(Op->getOpcode() == ISD::SABSDIFF ? ISD::SETLT
>>> + :
>>> +ISD::SETULT));
>>> + Tmp1 = DAG.getNode(ISD::VSELECT, dl, VT, Tmp4, Tmp1, Tmp2);
>>> + return Tmp1;
>>> +}
>>> +
>>> SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {
>>> // Lower a select instruction where the condition is a scalar and the
>>> // operands are vectors. Lower this select to VSELECT and implement
>>> it
>>>
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDA
>>> G/LegalizeVectorTypes.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
>>> (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Thu
>>> +++ Jul 16 10:22:46 2015
>>> @@ -678,6 +678,8 @@ void DAGTypeLegalizer::SplitVectorResult
>>> case ISD::SMAX:
>>> case ISD::UMIN:
>>> case ISD::UMAX:
>>> + case ISD::UABSDIFF:
>>> + case ISD::SABSDIFF:
>>> SplitVecRes_BinOp(N, Lo, Hi);
>>> break;
>>> case ISD::FMA:
>>>
>>> Modified:
>> llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDA
>>> G/SelectionDAGBuilder.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>> (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Thu
>>> +++ Jul 16 10:22:46 2015
>>> @@ -4646,6 +4646,18 @@ SelectionDAGBuilder::visitIntrinsicCall(
>>> getValue(I.getArgOperand(0)).getValueType(),
>>> getValue(I.getArgOperand(0))));
>>> return nullptr;
>>> + case Intrinsic::uabsdiff:
>>> + setValue(&I, DAG.getNode(ISD::UABSDIFF, sdl,
>>> + getValue(I.getArgOperand(0)).getValueType(),
>>> + getValue(I.getArgOperand(0)),
>>> + getValue(I.getArgOperand(1))));
>>> + return nullptr;
>>> + case Intrinsic::sabsdiff:
>>> + setValue(&I, DAG.getNode(ISD::SABSDIFF, sdl,
>>> + getValue(I.getArgOperand(0)).getValueType(),
>>> + getValue(I.getArgOperand(0)),
>>> + getValue(I.getArgOperand(1))));
>>> + return nullptr;
>>> case Intrinsic::cttz: {
>>> SDValue Arg = getValue(I.getArgOperand(0));
>>> ConstantInt *CI = cast<ConstantInt>(I.getArgOperand(1));
>>>
>>> Modified:
>> llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDA
>>>
>> G/SelectionDAGDumper.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>>> (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp Thu
>> Jul
>>> +++ 16 10:22:46 2015
>>> @@ -225,6 +225,8 @@ std::string SDNode::getOperationName(con
>>> case ISD::SHL_PARTS: return "shl_parts";
>>> case ISD::SRA_PARTS: return "sra_parts";
>>> case ISD::SRL_PARTS: return "srl_parts";
>>> + case ISD::UABSDIFF: return "uabsdiff";
>>> + case ISD::SABSDIFF: return "sabsdiff";
>>>
>>> // Conversion operators.
>>> case ISD::SIGN_EXTEND: return "sign_extend";
>>>
>>> Modified: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLower
>>> ingBase.cpp?rev=242409&r1=242408&r2=242409&view=diff
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp Thu Jul 16 10:22:46
>>> +++ 2015
>>> @@ -827,6 +827,8 @@ void TargetLoweringBase::initActions() {
>>> setOperationAction(ISD::USUBO, VT, Expand);
>>> setOperationAction(ISD::SMULO, VT, Expand);
>>> setOperationAction(ISD::UMULO, VT, Expand);
>>> + setOperationAction(ISD::UABSDIFF, VT, Expand);
>>> + setOperationAction(ISD::SABSDIFF, VT, Expand);
>>>
>>> // These library functions default to expand.
>>> setOperationAction(ISD::FROUND, VT, Expand);
>>>
>>> Added: llvm/trunk/test/CodeGen/X86/absdiff_expand.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/absdif
>>> f_expand.ll?rev=242409&view=auto
>>>
>> ==========================================================
>> ============
>>> ========
>>> --- llvm/trunk/test/CodeGen/X86/absdiff_expand.ll (added)
>>> +++ llvm/trunk/test/CodeGen/X86/absdiff_expand.ll Thu Jul 16 10:22:46
>>> +++ 2015
>>> @@ -0,0 +1,242 @@
>>> +; RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
>>> +-check-prefix=CHECK
>>> +
>>> +declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)
>>> +
>>> +define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8>
>>> +%a2) { ; CHECK-LABEL: test_uabsdiff_v4i8_expand
>>> +; CHECK: psubd %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubd %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: pcmpgtd %xmm3, %xmm2
>>> +; CHECK-NEXT: pand %xmm2, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm2
>>> +; CHECK-NEXT: por %xmm2, %xmm0
>>> +; CHECK-NEXT: retq
>>> +
>>> + %1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
>>> + ret <4 x i8> %1
>>> +}
>>> +
>>> +declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)
>>> +
>>> +define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v4i8_expand
>>> +; CHECK: psubd %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor %xmm2, %xmm2
>>> +; CHECK-NEXT: psubd %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtd %xmm2, %xmm1
>>> +; CHECK-NEXT: pand %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> +
>>> + %1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
>>> + ret <4 x i8> %1
>>> +}
>>> +
>>> +
>>> +declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)
>>> +
>>> +define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v8i8_expand
>>> +; CHECK: psubw %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor %xmm2, %xmm2
>>> +; CHECK-NEXT: psubw %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtw %xmm2, %xmm1
>>> +; CHECK-NEXT: pand %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
>>> + ret <8 x i8> %1
>>> +}
>>> +
>>> +declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)
>>> +
>>> +define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8>
>>> +%a2) { ; CHECK-LABEL: test_uabsdiff_v16i8_expand
>>> +; CHECK: psubb %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubb %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: pcmpgtb %xmm3, %xmm2
>>> +; CHECK-NEXT: pand %xmm2, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm2
>>> +; CHECK-NEXT: por %xmm2, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8>
>>> +%a2)
>>> + ret <16 x i8> %1
>>> +}
>>> +
>>> +declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)
>>> +
>>> +define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16>
>>> +%a2) { ; CHECK-LABEL: test_uabsdiff_v8i16_expand
>>> +; CHECK: psubw %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubw %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: pcmpgtw %xmm3, %xmm2
>>> +; CHECK-NEXT: pand %xmm2, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm2
>>> +; CHECK-NEXT: por %xmm2, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16>
>>> +%a2)
>>> + ret <8 x i16> %1
>>> +}
>>> +
>>> +declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)
>>> +
>>> +define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v8i16_expand
>>> +; CHECK: psubw %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor %xmm2, %xmm2
>>> +; CHECK-NEXT: psubw %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtw %xmm2, %xmm1
>>> +; CHECK-NEXT: pand %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16>
>>> +%a2)
>>> + ret <8 x i16> %1
>>> +}
>>> +
>>> +declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)
>>> +
>>> +define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v4i32_expand
>>> +; CHECK: psubd %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: pxor %xmm2, %xmm2
>>> +; CHECK-NEXT: psubd %xmm0, %xmm2
>>> +; CHECK-NEXT: pcmpgtd %xmm2, %xmm1
>>> +; CHECK-NEXT: pand %xmm1, %xmm0
>>> +; CHECK-NEXT: pandn %xmm2, %xmm1
>>> +; CHECK-NEXT: por %xmm1, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32>
>>> +%a2)
>>> + ret <4 x i32> %1
>>> +}
>>> +
>>> +declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)
>>> +
>>> +define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32>
>>> +%a2) { ; CHECK-LABEL: test_uabsdiff_v4i32_expand
>>> +; CHECK: psubd %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubd %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: pcmpgtd %xmm3, %xmm2
>>> +; CHECK-NEXT: pand %xmm2, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm2
>>> +; CHECK-NEXT: por %xmm2, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32>
>>> +%a2)
>>> + ret <4 x i32> %1
>>> +}
>>> +
>>> +declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)
>>> +
>>> +define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v2i32_expand
>>> +; CHECK: psubq %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubq %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: movdqa %xmm2, %xmm4
>>> +; CHECK-NEXT: pcmpgtd %xmm3, %xmm4
>>> +; CHECK-NEXT: pshufd $160, %xmm4, %xmm5 # xmm5 =
>> xmm4[0,0,2,2]
>>> +; CHECK-NEXT: pcmpeqd %xmm2, %xmm3
>>> +; CHECK-NEXT: pshufd $245, %xmm3, %xmm2 # xmm2 =
>> xmm3[1,1,3,3]
>>> +; CHECK-NEXT: pand %xmm5, %xmm2
>>> +; CHECK-NEXT: pshufd $245, %xmm4, %xmm3 # xmm3 =
>> xmm4[1,1,3,3]
>>> +; CHECK-NEXT: por %xmm2, %xmm3
>>> +; CHECK-NEXT: pand %xmm3, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm3
>>> +; CHECK-NEXT: por %xmm3, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32>
>>> +%a2)
>>> + ret <2 x i32> %1
>>> +}
>>> +
>>> +declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)
>>> +
>>> +define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64>
>>> +%a2) { ; CHECK-LABEL: test_sabsdiff_v2i64_expand
>>> +; CHECK: psubq %xmm1, %xmm0
>>> +; CHECK-NEXT: pxor %xmm1, %xmm1
>>> +; CHECK-NEXT: psubq %xmm0, %xmm1
>>> +; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
>>> +; CHECK-NEXT: movdqa %xmm1, %xmm3
>>> +; CHECK-NEXT: pxor %xmm2, %xmm3
>>> +; CHECK-NEXT: movdqa %xmm2, %xmm4
>>> +; CHECK-NEXT: pcmpgtd %xmm3, %xmm4
>>> +; CHECK-NEXT: pshufd $160, %xmm4, %xmm5 # xmm5 =
>> xmm4[0,0,2,2]
>>> +; CHECK-NEXT: pcmpeqd %xmm2, %xmm3
>>> +; CHECK-NEXT: pshufd $245, %xmm3, %xmm2 # xmm2 =
>> xmm3[1,1,3,3]
>>> +; CHECK-NEXT: pand %xmm5, %xmm2
>>> +; CHECK-NEXT: pshufd $245, %xmm4, %xmm3 # xmm3 =
>> xmm4[1,1,3,3]
>>> +; CHECK-NEXT: por %xmm2, %xmm3
>>> +; CHECK-NEXT: pand %xmm3, %xmm0
>>> +; CHECK-NEXT: pandn %xmm1, %xmm3
>>> +; CHECK-NEXT: por %xmm3, %xmm0
>>> +; CHECK-NEXT: retq
>>> + %1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64>
>>> +%a2)
>>> + ret <2 x i64> %1
>>> +}
>>> +
>>> +declare <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32>, <16 x i32>)
>>> +
>>> +define <16 x i32> @test_sabsdiff_v16i32_expand(<16 x i32> %a1, <16 x
>>> +i32> %a2) { ; CHECK-LABEL: test_sabsdiff_v16i32_expand
>>> +; CHECK: psubd %xmm4, %xmm0
>>> +; CHECK-NEXT: pxor %xmm8, %xmm8
>>> +; CHECK-NEXT: pxor %xmm9, %xmm9
>>> +; CHECK-NEXT: psubd %xmm0, %xmm9
>>> +; CHECK-NEXT: pxor %xmm4, %xmm4
>>> +; CHECK-NEXT: pcmpgtd %xmm9, %xmm4
>>> +; CHECK-NEXT: pand %xmm4, %xmm0
>>> +; CHECK-NEXT: pandn %xmm9, %xmm4
>>> +; CHECK-NEXT: por %xmm4, %xmm0
>>> +; CHECK-NEXT: psubd %xmm5, %xmm1
>>> +; CHECK-NEXT: pxor %xmm4, %xmm4
>>> +; CHECK-NEXT: psubd %xmm1, %xmm4
>>> +; CHECK-NEXT: pxor %xmm5, %xmm5
>>> +; CHECK-NEXT: pcmpgtd %xmm4, %xmm5
>>> +; CHECK-NEXT: pand %xmm5, %xmm1
>>> +; CHECK-NEXT: pandn %xmm4, %xmm5
>>> +; CHECK-NEXT: por %xmm5, %xmm1
>>> +; CHECK-NEXT: psubd %xmm6, %xmm2
>>> +; CHECK-NEXT: pxor %xmm4, %xmm4
>>> +; CHECK-NEXT: psubd %xmm2, %xmm4
>>> +; CHECK-NEXT: pxor %xmm5, %xmm5
>>> +; CHECK-NEXT: pcmpgtd %xmm4, %xmm5
>>> +; CHECK-NEXT: pand %xmm5, %xmm2
>>> +; CHECK-NEXT: pandn %xmm4, %xmm5
>>> +; CHECK-NEXT: por %xmm5, %xmm2
>>> +; CHECK-NEXT: psubd %xmm7, %xmm3
>>> +; CHECK-NEXT: pxor %xmm4, %xmm4
>>> +; CHECK-NEXT: psubd %xmm3, %xmm4
>>> +; CHECK-NEXT: pcmpgtd %xmm4, %xmm8
>>> +; CHECK-NEXT: pand %xmm8, %xmm3
>>> +; CHECK-NEXT: pandn %xmm4, %xmm8
>>> +; CHECK-NEXT: por %xmm8, %xmm3
>>> +; CHECK-NEXT: retq
>> The tests look very fragile; should we make them more relaxed in terms of
>> register names?
> Do you mean using regular expressions for register names?
That's one aspect. However, the tests would still fail if the scheduler ordered the instructions differently. I think we could use CHECK-DAG directives here to overcome this issue (see http://llvm.org/docs/CommandGuide/FileCheck.html). You can find similar examples in test/CodeGen/X86/avx2-shift.ll and other tests.
Thanks,
Michael
>
>>> + %1 = call <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32> %a1, <16 x
>>> +i32> %a2)
>>> + ret <16 x i32> %1
>>> +}
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits