[llvm] r188265 - AVX-512: Added CMP and BLEND instructions.

Demikhovsky, Elena elena.demikhovsky at intel.com
Tue Aug 27 00:42:15 PDT 2013


Hi,

In AVX-512 we have CMP and BLEND instructions for v16f32, v16i32, v8i64 and v8f64. All other types would require sign/zero extensions and truncations, and those conversions would not generate good code for a type like v16i16. I agree that this function should be changed, but only for the types mentioned above.
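To illustrate the point for readers following along, here is a hypothetical sketch of the type-selection policy under discussion. This is not the actual X86TargetLowering::getSetCCResultType; SimpleVT and the helper name are made-up stand-ins for LLVM's EVT/MVT machinery.

```cpp
#include <cassert>

// Made-up, simplified model of a vector type: NumElts lanes of
// EltBits-bit elements.  Stands in for LLVM's EVT/MVT for this sketch.
struct SimpleVT {
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return NumElts * EltBits; }
};

// Sketch of the policy discussed above (an assumption, not the real
// getSetCCResultType): for 512-bit vectors with 32- or 64-bit elements,
// an AVX-512 compare produces an i1-element mask (v16i1/v8i1); for
// everything else keep the legacy behaviour of an integer vector with
// the same lane count and element width (e.g. v16i16 stays v16i16).
inline SimpleVT sketchSetCCResultType(SimpleVT Op, bool HasAVX512) {
  if (HasAVX512 && Op.sizeInBits() == 512 && Op.EltBits >= 32)
    return SimpleVT{Op.NumElts, 1};        // mask register result
  return SimpleVT{Op.NumElts, Op.EltBits}; // legacy vector mask
}
```

The point of the sketch is that only the four 512-bit types listed above switch to the i1-mask convention; narrower types keep the old convention and so avoid the extension/truncation traffic.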

-  Elena


-----Original Message-----
From: Nadav Rotem [mailto:nrotem at apple.com] 
Sent: Monday, August 26, 2013 21:42
To: Juergen Ributzka
Cc: Demikhovsky, Elena; LLVM Commits
Subject: Re: [llvm] r188265 - AVX-512: Added CMP and BLEND instructions.

Juergen, 

This is a very good point. I think that if AVX-512 is available, then we would always want to use the mask register. Elena, do you agree, or are there situations where we would still prefer to use the XMM masks?

Thanks,
Nadav


On Aug 26, 2013, at 11:26 AM, Juergen Ributzka <juergen at apple.com> wrote:

> Hi Elena,
> 
> X86TargetLowering::getSetCCResultType still uses the old way (changeVectorElementTypeToInteger) of converting the vector operand type to a vector integer of the same size. I guess that will have to change for AVX-512 too?
> 
> Cheers,
> Juergen
> 
> 
> On Aug 22, 2013, at 3:26 PM, Juergen Ributzka <juergen at apple.com> wrote:
> 
>> Oh, my bad. I had a wrong understanding of vector masks. Thanks for 
>> clarifying :-)
>> 
>> -Juergen
>> 
>> On Aug 22, 2013, at 11:59 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>> 
>>> In AVX-512, the types v16i1 and v8i1 are legal, since the architecture has mask registers.
>>> So after type legalization I still have these masks.
>>> 
>>> -  Elena
>>> 
>>> 
>>> -----Original Message-----
>>> From: Juergen Ributzka [mailto:juergen at apple.com]
>>> Sent: Thursday, August 22, 2013 21:37
>>> To: Demikhovsky, Elena
>>> Cc: LLVM Commits
>>> Subject: Re: [llvm] r188265 - AVX-512: Added CMP and BLEND instructions.
>>> 
>>> Hi Elena,
>>> 
>>> I see that you use an element type of i1 for vselect masks. Normally, masks for vselect match the result type in the number of vector lanes and in element bitwidth.
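[Editor's note: the two masking conventions being contrasted here can be sketched as follows. VecTy and both helper names are made up for illustration and are not LLVM API.]

```cpp
#include <cassert>

// Made-up vector-type model for this sketch.
struct VecTy { unsigned Lanes, EltBits; };

// Classic SSE/AVX convention: the vselect mask has the same lane count
// AND the same element bitwidth as the selected value, so the per-lane
// mask bits travel in an XMM/YMM register alongside the data
// (e.g. selecting v8f32 uses a v8i32 mask).
inline VecTy classicSelectMask(VecTy Result) {
  return VecTy{Result.Lanes, Result.EltBits};
}

// AVX-512 convention: one i1 per lane, held in a k (mask) register
// (e.g. selecting v8f32 uses a v8i1 mask).
inline VecTy avx512SelectMask(VecTy Result) {
  return VecTy{Result.Lanes, 1};
}
```

The lane counts agree in both conventions; only the element bitwidth of the mask differs, which is exactly the change being questioned here.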
>>> 
>>> Was that change intentional?
>>> 
>>> Thanks
>>> 
>>> Cheers,
>>> Juergen
>>> 
>>> 
>>> On Aug 13, 2013, at 6:24 AM, Elena Demikhovsky <elena.demikhovsky at intel.com> wrote:
>>> 
>>>> Author: delena
>>>> Date: Tue Aug 13 08:24:07 2013
>>>> New Revision: 188265
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=188265&view=rev
>>>> Log:
>>>> AVX-512: Added CMP and BLEND instructions.
>>>> Lowering for SETCC.
>>>> 
>>>> Added:
>>>> llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll
>>>> Modified:
>>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> llvm/trunk/lib/Target/X86/X86ISelLowering.h
>>>> llvm/trunk/lib/Target/X86/X86InstrAVX512.td
>>>> llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td
>>>> 
>>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=188265&r1=188264&r2=188265&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Aug 13 08:24:07 2013
>>>> @@ -9705,6 +9705,42 @@ static SDValue Lower256IntVSETCC(SDValue
>>>>                    DAG.getNode(Op.getOpcode(), dl, NewVT, LHS2, RHS2, CC));
>>>> }
>>>> 
>>>> +static SDValue LowerIntVSETCC_AVX512(SDValue Op, SelectionDAG &DAG) {
>>>> +  SDValue Cond;
>>>> +  SDValue Op0 = Op.getOperand(0);
>>>> +  SDValue Op1 = Op.getOperand(1);
>>>> +  SDValue CC = Op.getOperand(2);
>>>> +  MVT VT = Op.getValueType().getSimpleVT();
>>>> +
>>>> +  EVT OpVT = Op0.getValueType();
>>>> +  assert(OpVT.getVectorElementType().getSizeInBits() >= 32 &&
>>>> +         Op.getValueType().getScalarType() == MVT::i1 &&
>>>> +         "Cannot set masked compare for this operation");
>>>> +
>>>> +  ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();
>>>> +  SDLoc dl(Op);
>>>> +
>>>> +  bool Unsigned = false;
>>>> +  unsigned SSECC;
>>>> +  switch (SetCCOpcode) {
>>>> +  default: llvm_unreachable("Unexpected SETCC condition");
>>>> +  case ISD::SETNE:  SSECC = 4; break;
>>>> +  case ISD::SETEQ:  SSECC = 0; break;
>>>> +  case ISD::SETUGT: Unsigned = true;
>>>> +  case ISD::SETGT:  SSECC = 6; break; // NLE
>>>> +  case ISD::SETULT: Unsigned = true;
>>>> +  case ISD::SETLT:  SSECC = 1; break;
>>>> +  case ISD::SETUGE: Unsigned = true;
>>>> +  case ISD::SETGE:  SSECC = 5; break; // NLT
>>>> +  case ISD::SETULE: Unsigned = true;
>>>> +  case ISD::SETLE:  SSECC = 2; break;
>>>> +  }
>>>> +  unsigned Opc = Unsigned ? X86ISD::CMPMU : X86ISD::CMPM;
>>>> +  return DAG.getNode(Opc, dl, VT, Op0, Op1,
>>>> +                     DAG.getConstant(SSECC, MVT::i8));
>>>> +}
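[Editor's note: the condition-code mapping implemented by the switch above can be written as a standalone sketch. Cond, CmpLowering and lowerCmpCond are illustrative names, not LLVM API; the immediates 0/1/2/4/5/6 are the ones the patch passes as the $cc operand. The real code uses intentional case fallthrough; the sketch spells each case out.]

```cpp
#include <cassert>

// Stand-in for ISD::CondCode, restricted to the cases handled above.
enum class Cond { EQ, NE, GT, GE, LT, LE, UGT, UGE, ULT, ULE };

// Immediate for the compare, plus whether the unsigned form is needed.
struct CmpLowering { unsigned Imm; bool Unsigned; };

inline CmpLowering lowerCmpCond(Cond C) {
  switch (C) {
  case Cond::EQ:  return {0, false};
  case Cond::LT:  return {1, false};
  case Cond::ULT: return {1, true};
  case Cond::LE:  return {2, false};
  case Cond::ULE: return {2, true};
  case Cond::NE:  return {4, false};
  case Cond::GE:  return {5, false}; // NLT
  case Cond::UGE: return {5, true};
  case Cond::GT:  return {6, false}; // NLE
  case Cond::UGT: return {6, true};
  }
  return {0, false}; // unreachable; keeps compilers quiet
}
```

The Unsigned flag corresponds to choosing X86ISD::CMPMU over X86ISD::CMPM, i.e. the unsigned vpcmpu* encodings instead of the signed ones.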
>>>> +
>>>> static SDValue LowerVSETCC(SDValue Op, const X86Subtarget *Subtarget,
>>>>                         SelectionDAG &DAG) {
>>>>   SDValue Cond;
>>>> @@ -9723,7 +9759,12 @@ static SDValue LowerVSETCC(SDValue Op, c
>>>> #endif
>>>> 
>>>>     unsigned SSECC = translateX86FSETCC(SetCCOpcode, Op0, Op1);
>>>> -
>>>> +    unsigned Opc = X86ISD::CMPP;
>>>> +    unsigned NumElems = VT.getVectorNumElements();
>>>> +    if (Subtarget->hasAVX512() && VT.getVectorElementType() == MVT::i1) {
>>>> +      assert(NumElems <= 16);
>>>> +      Opc = X86ISD::CMPM;
>>>> +    }
>>>>  // In the two special cases we can't handle, emit two comparisons.
>>>>  if (SSECC == 8) {
>>>>    unsigned CC0, CC1;
>>>> @@ -9735,14 +9776,14 @@ static SDValue LowerVSETCC(SDValue Op, c
>>>>      CC0 = 7; CC1 = 4; CombineOpc = ISD::AND;
>>>>    }
>>>> 
>>>> -      SDValue Cmp0 = DAG.getNode(X86ISD::CMPP, dl, VT, Op0, Op1,
>>>> +      SDValue Cmp0 = DAG.getNode(Opc, dl, VT, Op0, Op1,
>>>>                               DAG.getConstant(CC0, MVT::i8));
>>>> -      SDValue Cmp1 = DAG.getNode(X86ISD::CMPP, dl, VT, Op0, Op1,
>>>> +      SDValue Cmp1 = DAG.getNode(Opc, dl, VT, Op0, Op1,
>>>>                               DAG.getConstant(CC1, MVT::i8));
>>>>    return DAG.getNode(CombineOpc, dl, VT, Cmp0, Cmp1);
>>>>  }
>>>>  // Handle all other FP comparisons here.
>>>> -    return DAG.getNode(X86ISD::CMPP, dl, VT, Op0, Op1,
>>>> +    return DAG.getNode(Opc, dl, VT, Op0, Op1,
>>>>                     DAG.getConstant(SSECC, MVT::i8));
>>>> }
>>>> 
>>>> @@ -9750,6 +9791,24 @@ static SDValue LowerVSETCC(SDValue Op, c
>>>>  if (VT.is256BitVector() && !Subtarget->hasInt256())
>>>>    return Lower256IntVSETCC(Op, DAG);
>>>> 
>>>> +  bool MaskResult = (VT.getVectorElementType() == MVT::i1);
>>>> +  EVT OpVT = Op1.getValueType();
>>>> +  if (Subtarget->hasAVX512()) {
>>>> +    if (Op1.getValueType().is512BitVector() ||
>>>> +        (MaskResult && OpVT.getVectorElementType().getSizeInBits() >= 32))
>>>> +      return LowerIntVSETCC_AVX512(Op, DAG);
>>>> +
>>>> +    // In AVX-512 architecture setcc returns mask with i1 elements,
>>>> +    // but there is no compare instruction for i8 and i16 elements.
>>>> +    // We are not talking about 512-bit operands in this case, these
>>>> +    // types are illegal.
>>>> +    if (MaskResult &&
>>>> +        (OpVT.getVectorElementType().getSizeInBits() < 32 &&
>>>> +         OpVT.getVectorElementType().getSizeInBits() >= 8))
>>>> +      return DAG.getNode(ISD::TRUNCATE, dl, VT,
>>>> +                         DAG.getNode(ISD::SETCC, dl, OpVT, Op0, Op1, CC));
>>>> +  }
>>>> +
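[Editor's note: the i8/i16 fallback added above performs the setcc in the operand's own element type and then truncates to the i1 mask. A lane-by-lane behavioural sketch, with a made-up name; the real code emits SETCC/TRUNCATE DAG nodes rather than loops.]

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// With no 512-bit compare for 16-bit elements in this AVX-512 subset,
// an equality setcc is first done in the wide type -- producing the
// classic all-ones/all-zeros lane per element, as SSE compares do --
// and each lane is then truncated down to a single i1 mask bit.
template <std::size_t N>
std::array<bool, N> setccEqThenTrunc(const std::array<uint16_t, N> &A,
                                     const std::array<uint16_t, N> &B) {
  std::array<bool, N> Mask{};
  for (std::size_t I = 0; I != N; ++I) {
    uint16_t Lane = (A[I] == B[I]) ? 0xFFFF : 0; // SETCC in v?i16
    Mask[I] = (Lane & 1) != 0;                   // TRUNCATE lane to i1
  }
  return Mask;
}
```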
>>>> // We are handling one of the integer comparisons here.  Since SSE only has
>>>> // GT and EQ comparisons for integer, swapping operands and multiple
>>>> // operations may be required for some comparisons.
>>>> @@ -9759,15 +9818,18 @@ static SDValue LowerVSETCC(SDValue Op, c
>>>> switch (SetCCOpcode) {
>>>> default: llvm_unreachable("Unexpected SETCC condition");
>>>> case ISD::SETNE:  Invert = true;
>>>> -  case ISD::SETEQ:  Opc = X86ISD::PCMPEQ; break;
>>>> +  case ISD::SETEQ:  Opc = MaskResult? X86ISD::PCMPEQM: X86ISD::PCMPEQ; break;
>>>> case ISD::SETLT:  Swap = true;
>>>> -  case ISD::SETGT:  Opc = X86ISD::PCMPGT; break;
>>>> +  case ISD::SETGT:  Opc = MaskResult? X86ISD::PCMPGTM: X86ISD::PCMPGT; break;
>>>> case ISD::SETGE:  Swap = true;
>>>> -  case ISD::SETLE:  Opc = X86ISD::PCMPGT; Invert = true; break;
>>>> +  case ISD::SETLE:  Opc = MaskResult? X86ISD::PCMPGTM: X86ISD::PCMPGT;
>>>> +                    Invert = true; break;
>>>> case ISD::SETULT: Swap = true;
>>>> -  case ISD::SETUGT: Opc = X86ISD::PCMPGT; FlipSigns = true; break;
>>>> +  case ISD::SETUGT: Opc = MaskResult? X86ISD::PCMPGTM: X86ISD::PCMPGT;
>>>> +                    FlipSigns = true; break;
>>>> case ISD::SETUGE: Swap = true;
>>>> -  case ISD::SETULE: Opc = X86ISD::PCMPGT; FlipSigns = true; Invert = true; break;
>>>> +  case ISD::SETULE: Opc = MaskResult? X86ISD::PCMPGTM: X86ISD::PCMPGT;
>>>> +                    FlipSigns = true; Invert = true; break;
>>>> }
>>>> 
>>>> // Special case: Use min/max operations for SETULE/SETUGE
>>>> @@ -13201,6 +13263,8 @@ const char *X86TargetLowering::getTarget
>>>> case X86ISD::CMP:                return "X86ISD::CMP";
>>>> case X86ISD::COMI:               return "X86ISD::COMI";
>>>> case X86ISD::UCOMI:              return "X86ISD::UCOMI";
>>>> +  case X86ISD::CMPM:               return "X86ISD::CMPM";
>>>> +  case X86ISD::CMPMU:              return "X86ISD::CMPMU";
>>>> case X86ISD::SETCC:              return "X86ISD::SETCC";
>>>> case X86ISD::SETCC_CARRY:        return "X86ISD::SETCC_CARRY";
>>>> case X86ISD::FSETCCsd:           return "X86ISD::FSETCCsd";
>>>> @@ -13273,6 +13337,8 @@ const char *X86TargetLowering::getTarget
>>>> case X86ISD::CMPP:               return "X86ISD::CMPP";
>>>> case X86ISD::PCMPEQ:             return "X86ISD::PCMPEQ";
>>>> case X86ISD::PCMPGT:             return "X86ISD::PCMPGT";
>>>> +  case X86ISD::PCMPEQM:            return "X86ISD::PCMPEQM";
>>>> +  case X86ISD::PCMPGTM:            return "X86ISD::PCMPGTM";
>>>> case X86ISD::ADD:                return "X86ISD::ADD";
>>>> case X86ISD::SUB:                return "X86ISD::SUB";
>>>> case X86ISD::ADC:                return "X86ISD::ADC";
>>>> 
>>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=188265&r1=188264&r2=188265&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Tue Aug 13 08:24:07 2013
>>>> @@ -274,6 +274,13 @@ namespace llvm {
>>>> 
>>>>    // PCMP* - Vector integer comparisons.
>>>>    PCMPEQ, PCMPGT,
>>>> +      // PCMP*M - Vector integer comparisons, the result is in a mask vector
>>>> +      PCMPEQM, PCMPGTM,
>>>> +
>>>> +      /// CMPM, CMPMU - Vector comparison generating mask bits for fp and
>>>> +      /// integer signed and unsigned data types.
>>>> +      CMPM,
>>>> +      CMPMU,
>>>> 
>>>>    // ADD, SUB, SMUL, etc. - Arithmetic operations with FLAGS results.
>>>>    ADD, SUB, ADC, SBB, SMUL,
>>>> 
>>>> Modified: llvm/trunk/lib/Target/X86/X86InstrAVX512.td
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrAVX512.td?rev=188265&r1=188264&r2=188265&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86InstrAVX512.td (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86InstrAVX512.td Tue Aug 13 08:24:07 2013
>>>> @@ -564,7 +564,195 @@ defm VPERMI2PS : avx512_perm_3src<0x77,
>>>> defm VPERMI2PD : avx512_perm_3src<0x77, "vpermi2pd",  VR512, memopv8f64, i512mem,
>>>>                             v8f64>, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VF>;
>>>> 
>>>> +//===----------------------------------------------------------------------===//
>>>> +// AVX-512 - BLEND using mask
>>>> +//
>>>> +multiclass avx512_blendmask<bits<8> opc, string OpcodeStr,
>>>> +                          RegisterClass KRC, RegisterClass RC,
>>>> +                          X86MemOperand x86memop, PatFrag mem_frag,
>>>> +                          SDNode OpNode, ValueType vt> {
>>>> +  def rr : AVX5128I<opc, MRMSrcReg, (outs RC:$dst),
>>>> +               (ins KRC:$mask, RC:$src1, RC:$src2),
>>>> +               !strconcat(OpcodeStr,
>>>> +                "\t{$src2, $src1, ${dst}{${mask}}|${dst}{${mask}}, $src1, $src2}"),
>>>> +               [(set RC:$dst, (OpNode KRC:$mask, (vt RC:$src2), 
>>>> +                 (vt RC:$src1)))]>, EVEX_4V, EVEX_K;
>>>> +
>>>> +  def rm : AVX5128I<opc, MRMSrcMem, (outs RC:$dst),
>>>> +               (ins KRC:$mask, RC:$src1, x86memop:$src2),
>>>> +               !strconcat(OpcodeStr,
>>>> +                "\t{$src2, $src1, $mask, $dst|$dst, $mask, $src1, $src2}"),
>>>> +               []>, 
>>>> +               EVEX_4V, EVEX_K;
>>>> +}
>>>> +
>>>> +let ExeDomain = SSEPackedSingle in
>>>> +defm VBLENDMPSZ : avx512_blendmask<0x65, "vblendmps", VK16WM, VR512, f512mem,
>>>> +                              memopv16f32, vselect, v16f32>,
>>>> +                              EVEX_CD8<32, CD8VF>, EVEX_V512;
>>>> +let ExeDomain = SSEPackedDouble in
>>>> +defm VBLENDMPDZ : avx512_blendmask<0x65, "vblendmpd", VK8WM, VR512, f512mem,
>>>> +                              memopv8f64, vselect, v8f64>,
>>>> +                              VEX_W, EVEX_CD8<64, CD8VF>, EVEX_V512;
>>>> +
>>>> +defm VPBLENDMDZ : avx512_blendmask<0x64, "vpblendmd", VK16WM, VR512, f512mem,
>>>> +                                memopv8i64, vselect, v16i32>,
>>>> +                                EVEX_CD8<32, CD8VF>, EVEX_V512;
>>>> +
>>>> +defm VPBLENDMQZ : avx512_blendmask<0x64, "vpblendmq", VK8WM, VR512, f512mem,
>>>> +                                memopv8i64, vselect, v8i64>, VEX_W,
>>>> +                                EVEX_CD8<64, CD8VF>, EVEX_V512;
>>>> 
>>>> +
>>>> +let Predicates = [HasAVX512] in {
>>>> +def : Pat<(v8f32 (vselect (v8i1 VK8WM:$mask), (v8f32 VR256X:$src1),
>>>> +                            (v8f32 VR256X:$src2))),
>>>> +            (EXTRACT_SUBREG
>>>> +              (v16f32 (VBLENDMPSZrr (COPY_TO_REGCLASS VK8WM:$mask, VK16WM),
>>>> +            (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)),
>>>> +            (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>;
>>>> +
>>>> +def : Pat<(v8i32 (vselect (v8i1 VK8WM:$mask), (v8i32 VR256X:$src1),
>>>> +                            (v8i32 VR256X:$src2))),
>>>> +            (EXTRACT_SUBREG
>>>> +                (v16i32 (VPBLENDMDZrr (COPY_TO_REGCLASS VK8WM:$mask, VK16WM),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>;
>>>> +}
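[Editor's note: the two patterns above handle a 256-bit vselect by widening: the YMM operands are placed in the low half of a ZMM register (SUBREG_TO_REG), a full 16-lane masked blend runs, and the low half is extracted again (EXTRACT_SUBREG). The upper eight mask bits are don't-care because those lanes are discarded. A behavioural sketch with a made-up name, not LLVM API:]

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Mask bit set -> take the "true" lane T[I], else the "false" lane F[I].
// Only the low 8 mask bits matter for the extracted result.
inline std::array<float, 8> blend256ViaWiden(uint16_t Mask,
                                             const std::array<float, 8> &T,
                                             const std::array<float, 8> &F) {
  std::array<float, 16> WideT{}, WideF{}, WideR{};
  for (int I = 0; I != 8; ++I) {       // SUBREG_TO_REG: low half holds data
    WideT[I] = T[I];
    WideF[I] = F[I];
  }
  for (int I = 0; I != 16; ++I)        // 16-lane masked blend (VBLENDMPSZ)
    WideR[I] = ((Mask >> I) & 1) ? WideT[I] : WideF[I];
  std::array<float, 8> R{};
  for (int I = 0; I != 8; ++I)         // EXTRACT_SUBREG: keep low ymm half
    R[I] = WideR[I];
  return R;
}
```

Note the operand order in the actual patterns: the widened $src2 is passed as the blend's first source and $src1 as the second, because vblendmps writes the second source where the mask bit is set.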
>>>> +
>>>> +multiclass avx512_icmp_packed<bits<8> opc, string OpcodeStr, RegisterClass KRC, 
>>>> +              RegisterClass RC, X86MemOperand x86memop, PatFrag memop_frag, 
>>>> +              SDNode OpNode, ValueType vt> {
>>>> +  def rr : AVX512BI<opc, MRMSrcReg,
>>>> +             (outs KRC:$dst), (ins RC:$src1, RC:$src2), 
>>>> +             !strconcat(OpcodeStr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
>>>> +             [(set KRC:$dst, (OpNode (vt RC:$src1), (vt RC:$src2)))], 
>>>> +             IIC_SSE_CMPP_RR>, EVEX_4V;
>>>> +  def rm : AVX512BI<opc, MRMSrcMem,
>>>> +             (outs KRC:$dst), (ins RC:$src1, x86memop:$src2), 
>>>> +             !strconcat(OpcodeStr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
>>>> +             [(set KRC:$dst, (OpNode (vt RC:$src1), 
>>>> +              (bitconvert (memop_frag addr:$src2))))],
>>>> +             IIC_SSE_CMPP_RM>, EVEX_4V;
>>>> +}
>>>> +
>>>> +defm VPCMPEQDZ : avx512_icmp_packed<0x76, "vpcmpeqd", VK16, VR512, i512mem,
>>>> +                           memopv8i64, X86pcmpeqm, v16i32>, EVEX_V512;
>>>> +defm VPCMPEQQZ : avx512_icmp_packed<0x29, "vpcmpeqq", VK8, VR512, i512mem,
>>>> +                           memopv8i64, X86pcmpeqm, v8i64>, T8, EVEX_V512, VEX_W;
>>>> +
>>>> +defm VPCMPGTDZ : avx512_icmp_packed<0x66, "vpcmpgtd", VK16, VR512, i512mem,
>>>> +                           memopv8i64, X86pcmpgtm, v16i32>, EVEX_V512;
>>>> +defm VPCMPGTQZ : avx512_icmp_packed<0x37, "vpcmpgtq", VK8, VR512, i512mem,
>>>> +                           memopv8i64, X86pcmpgtm, v8i64>, T8, EVEX_V512, VEX_W;
>>>> +
>>>> +def : Pat<(v8i1 (X86pcmpgtm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))),
>>>> +            (COPY_TO_REGCLASS (VPCMPGTDZrr
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm))), VK8)>;
>>>> +
>>>> +def : Pat<(v8i1 (X86pcmpeqm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))),
>>>> +            (COPY_TO_REGCLASS (VPCMPEQDZrr
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm))), VK8)>;
>>>> +
>>>> +
>>>> +multiclass avx512_icmp_cc<bits<8> opc, RegisterClass KRC,
>>>> +              RegisterClass RC, X86MemOperand x86memop, PatFrag memop_frag, 
>>>> +              SDNode OpNode, ValueType vt, Operand CC, string asm,
>>>> +              string asm_alt> {
>>>> +  def rri : AVX512AIi8<opc, MRMSrcReg,
>>>> +             (outs KRC:$dst), (ins RC:$src1, RC:$src2, CC:$cc), asm,
>>>> +             [(set KRC:$dst, (OpNode (vt RC:$src1), (vt RC:$src2), imm:$cc))], 
>>>> +             IIC_SSE_CMPP_RR>, EVEX_4V;
>>>> +  def rmi : AVX512AIi8<opc, MRMSrcMem,
>>>> +             (outs KRC:$dst), (ins RC:$src1, x86memop:$src2, CC:$cc), asm,
>>>> +             [(set KRC:$dst, (OpNode (vt RC:$src1), 
>>>> +              (bitconvert (memop_frag addr:$src2)), imm:$cc))],
>>>> +             IIC_SSE_CMPP_RM>, EVEX_4V;
>>>> +  // Accept explicit immediate argument form instead of comparison code.
>>>> +  let neverHasSideEffects = 1 in {
>>>> +    def rri_alt : AVX512AIi8<opc, MRMSrcReg,
>>>> +               (outs RC:$dst), (ins RC:$src1, RC:$src2, i8imm:$cc),
>>>> +               asm_alt, [], IIC_SSE_CMPP_RR>, EVEX_4V;
>>>> +    def rmi_alt : AVX512AIi8<opc, MRMSrcMem,
>>>> +               (outs RC:$dst), (ins RC:$src1, x86memop:$src2, i8imm:$cc),
>>>> +               asm_alt, [], IIC_SSE_CMPP_RM>, EVEX_4V;
>>>> +  }
>>>> +}
>>>> +
>>>> +defm VPCMPDZ : avx512_icmp_cc<0x1F, VK16, VR512, i512mem, memopv8i64,
>>>> +                              X86cmpm, v16i32, AVXCC,
>>>> +              "vpcmp${cc}d\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +              "vpcmpd\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}">,
>>>> +              EVEX_V512, EVEX_CD8<32, CD8VF>;
>>>> +defm VPCMPUDZ : avx512_icmp_cc<0x1E, VK16, VR512, i512mem, memopv8i64,
>>>> +                               X86cmpmu, v16i32, AVXCC,
>>>> +              "vpcmp${cc}ud\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +              "vpcmpud\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}">,
>>>> +              EVEX_V512, EVEX_CD8<32, CD8VF>;
>>>> +
>>>> +defm VPCMPQZ : avx512_icmp_cc<0x1F, VK8, VR512, i512mem, memopv8i64,
>>>> +                              X86cmpm, v8i64, AVXCC,
>>>> +              "vpcmp${cc}q\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +              "vpcmpq\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}">,
>>>> +              VEX_W, EVEX_V512, EVEX_CD8<64, CD8VF>;
>>>> +defm VPCMPUQZ : avx512_icmp_cc<0x1E, VK8, VR512, i512mem, memopv8i64,
>>>> +                               X86cmpmu, v8i64, AVXCC,
>>>> +              "vpcmp${cc}uq\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +              "vpcmpuq\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}">,
>>>> +              VEX_W, EVEX_V512, EVEX_CD8<64, CD8VF>;
>>>> +
>>>> +// avx512_cmp_packed - sse 1 & 2 compare packed instructions 
>>>> +multiclass avx512_cmp_packed<RegisterClass KRC, RegisterClass RC,
>>>> +                           X86MemOperand x86memop, Operand CC,
>>>> +                           SDNode OpNode, ValueType vt, string asm,
>>>> +                           string asm_alt, Domain d> {
>>>> +  def rri : AVX512PIi8<0xC2, MRMSrcReg,
>>>> +             (outs KRC:$dst), (ins RC:$src1, RC:$src2, CC:$cc), asm,
>>>> +             [(set KRC:$dst, (OpNode (vt RC:$src1), (vt RC:$src2), imm:$cc))], d>;
>>>> +  def rmi : AVX512PIi8<0xC2, MRMSrcMem,
>>>> +             (outs KRC:$dst), (ins RC:$src1, x86memop:$src2, CC:$cc), asm,
>>>> +             [(set KRC:$dst,
>>>> +              (OpNode (vt RC:$src1), (memop addr:$src2), imm:$cc))], d>;
>>>> +
>>>> +  // Accept explicit immediate argument form instead of comparison code.
>>>> +  let neverHasSideEffects = 1 in {
>>>> +    def rri_alt : PIi8<0xC2, MRMSrcReg,
>>>> +               (outs RC:$dst), (ins RC:$src1, RC:$src2, i8imm:$cc),
>>>> +               asm_alt, [], IIC_SSE_CMPP_RR, d>;
>>>> +    def rmi_alt : PIi8<0xC2, MRMSrcMem,
>>>> +               (outs RC:$dst), (ins RC:$src1, x86memop:$src2, i8imm:$cc),
>>>> +               asm_alt, [], IIC_SSE_CMPP_RM, d>;
>>>> +  }
>>>> +}
>>>> +
>>>> +defm VCMPPSZ : avx512_cmp_packed<VK16, VR512, f512mem, AVXCC, X86cmpm, v16f32,
>>>> +               "vcmp${cc}ps\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +               "vcmpps\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}",
>>>> +               SSEPackedSingle>, TB, EVEX_4V, EVEX_V512, EVEX_CD8<32, CD8VF>;
>>>> +defm VCMPPDZ : avx512_cmp_packed<VK8, VR512, f512mem, AVXCC, X86cmpm, v8f64,
>>>> +               "vcmp${cc}pd\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>>>> +               "vcmppd\t{$cc, $src2, $src1, $dst|$dst, $src1, $src2, $cc}",
>>>> +               SSEPackedDouble>, TB, OpSize, EVEX_4V, VEX_W, EVEX_V512,
>>>> +               EVEX_CD8<64, CD8VF>;
>>>> +
>>>> +def : Pat<(v8i1 (X86cmpm (v8f32 VR256X:$src1), (v8f32 VR256X:$src2), imm:$cc)),
>>>> +          (COPY_TO_REGCLASS (VCMPPSZrri
>>>> +            (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)),
>>>> +            (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)),
>>>> +            imm:$cc), VK8)>;
>>>> +def : Pat<(v8i1 (X86cmpm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2), imm:$cc)),
>>>> +          (COPY_TO_REGCLASS (VPCMPDZrri
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)),
>>>> +            imm:$cc), VK8)>;
>>>> +def : Pat<(v8i1 (X86cmpmu (v8i32 VR256X:$src1), (v8i32 VR256X:$src2), imm:$cc)),
>>>> +          (COPY_TO_REGCLASS (VPCMPUDZrri
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)),
>>>> +            (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)),
>>>> +            imm:$cc), VK8)>;
>>>> +               
>>>> // Mask register copy, including
>>>> // - copy between mask registers
>>>> // - load/store mask registers
>>>> @@ -949,4 +1137,18 @@ defm VMOVDQU32 : avx512_mov_int<0x6F, "v
>>>> defm VMOVDQU64 : avx512_mov_int<0x6F, "vmovdqu64", VR512, VK8WM, memopv8i64, i512mem>,
>>>>                              EVEX_V512, VEX_W, EVEX_CD8<64, CD8VF>;
>>>> 
>>>> +let AddedComplexity = 20 in {
>>>> +def : Pat<(v16f32 (vselect VK16WM:$mask, (v16f32 VR512:$src1),
>>>> +                           (v16f32 VR512:$src2))),
>>>> +                  (VMOVUPSZrrk VR512:$src2, VK16WM:$mask, VR512:$src1)>;
>>>> +def : Pat<(v8f64 (vselect VK8WM:$mask, (v8f64 VR512:$src1),
>>>> +                           (v8f64 VR512:$src2))),
>>>> +                  (VMOVUPDZrrk VR512:$src2, VK8WM:$mask, VR512:$src1)>;
>>>> +def : Pat<(v16i32 (vselect VK16WM:$mask, (v16i32 VR512:$src1),
>>>> +                           (v16i32 VR512:$src2))),
>>>> +                  (VMOVDQU32rrk VR512:$src2, VK16WM:$mask, VR512:$src1)>;
>>>> +def : Pat<(v8i64 (vselect VK8WM:$mask, (v8i64 VR512:$src1),
>>>> +                           (v8i64 VR512:$src2))),
>>>> +                  (VMOVDQU64rrk VR512:$src2, VK8WM:$mask, VR512:$src1)>;
>>>> +}
>>>> 
>>>> 
>>>> Modified: llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td?rev=188265&r1=188264&r2=188265&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td Tue Aug 13 08:24:07 2013
>>>> @@ -118,6 +118,15 @@ def X86cmpp    : SDNode<"X86ISD::CMPP",
>>>> def X86pcmpeq  : SDNode<"X86ISD::PCMPEQ", SDTIntBinOp, [SDNPCommutative]>;
>>>> def X86pcmpgt  : SDNode<"X86ISD::PCMPGT", SDTIntBinOp>;
>>>> 
>>>> +def X86IntCmpMask : SDTypeProfile<1, 2,
>>>> +    [SDTCisVec<0>, SDTCisSameAs<1, 2>, SDTCisInt<1>]>;
>>>> +def X86pcmpeqm : SDNode<"X86ISD::PCMPEQM", X86IntCmpMask, [SDNPCommutative]>;
>>>> +def X86pcmpgtm : SDNode<"X86ISD::PCMPGTM", X86IntCmpMask>;
>>>> +
>>>> +def X86CmpMaskCC : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<1, 2>, SDTCisVT<3, i8>]>;
>>>> +def X86cmpm   : SDNode<"X86ISD::CMPM",    X86CmpMaskCC>;
>>>> +def X86cmpmu  : SDNode<"X86ISD::CMPMU",   X86CmpMaskCC>;
>>>> +
>>>> def X86vshl    : SDNode<"X86ISD::VSHL",
>>>>                      SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
>>>>                                    SDTCisVec<2>]>>;
>>>> 
>>>> Added: llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll?rev=188265&view=auto
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll (added)
>>>> +++ llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll Tue Aug 13 08:24:07 2013
>>>> @@ -0,0 +1,113 @@
>>>> +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s
>>>> +
>>>> +; CHECK-LABEL: test1
>>>> +; CHECK: vcmpleps
>>>> +; CHECK: vmovups
>>>> +; CHECK: ret
>>>> +define <16 x float> @test1(<16 x float> %x, <16 x float> %y) nounwind {
>>>> +	%mask = fcmp ole <16 x float> %x, %y
>>>> +	%max = select <16 x i1> %mask, <16 x float> %x, <16 x float> %y
>>>> +	ret <16 x float> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test2
>>>> +; CHECK: vcmplepd
>>>> +; CHECK: vmovupd
>>>> +; CHECK: ret
>>>> +define <8 x double> @test2(<8 x double> %x, <8 x double> %y) nounwind {
>>>> +	%mask = fcmp ole <8 x double> %x, %y
>>>> +	%max = select <8 x i1> %mask, <8 x double> %x, <8 x double> %y
>>>> +	ret <8 x double> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test3
>>>> +; CHECK: vpcmpeqd
>>>> +; CHECK: vmovdqu32
>>>> +; CHECK: ret
>>>> +define <16 x i32> @test3(<16 x i32> %x, <16 x i32> %y) nounwind {
>>>> +	%mask = icmp eq <16 x i32> %x, %y
>>>> +	%max = select <16 x i1> %mask, <16 x i32> %x, <16 x i32> %y
>>>> +	ret <16 x i32> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: @test4_unsigned
>>>> +; CHECK: vpcmpnltud
>>>> +; CHECK: vmovdqu32
>>>> +; CHECK: ret
>>>> +define <16 x i32> @test4_unsigned(<16 x i32> %x, <16 x i32> %y) nounwind {
>>>> +	%mask = icmp uge <16 x i32> %x, %y
>>>> +	%max = select <16 x i1> %mask, <16 x i32> %x, <16 x i32> %y
>>>> +	ret <16 x i32> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test5
>>>> +; CHECK: vpcmpeqq {{.*}}%k1
>>>> +; CHECK: vmovdqu64 {{.*}}%k1
>>>> +; CHECK: ret
>>>> +define <8 x i64> @test5(<8 x i64> %x, <8 x i64> %y) nounwind {
>>>> +	%mask = icmp eq <8 x i64> %x, %y
>>>> +	%max = select <8 x i1> %mask, <8 x i64> %x, <8 x i64> %y
>>>> +	ret <8 x i64> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test6_unsigned
>>>> +; CHECK: vpcmpnleuq {{.*}}%k1
>>>> +; CHECK: vmovdqu64 {{.*}}%k1
>>>> +; CHECK: ret
>>>> +define <8 x i64> @test6_unsigned(<8 x i64> %x, <8 x i64> %y) nounwind {
>>>> +	%mask = icmp ugt <8 x i64> %x, %y
>>>> +	%max = select <8 x i1> %mask, <8 x i64> %x, <8 x i64> %y
>>>> +	ret <8 x i64> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test7
>>>> +; CHECK: xor
>>>> +; CHECK: vcmpltps
>>>> +; CHECK: vblendvps
>>>> +; CHECK: ret
>>>> +define <4 x float> @test7(<4 x float> %a, <4 x float> %b) {
>>>> +  %mask = fcmp olt <4 x float> %a, zeroinitializer
>>>> +  %c = select <4 x i1>%mask, <4 x float>%a, <4 x float>%b
>>>> +  ret <4 x float>%c
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test8
>>>> +; CHECK: xor
>>>> +; CHECK: vcmpltpd
>>>> +; CHECK: vblendvpd
>>>> +; CHECK: ret
>>>> +define <2 x double> @test8(<2 x double> %a, <2 x double> %b) {
>>>> +  %mask = fcmp olt <2 x double> %a, zeroinitializer
>>>> +  %c = select <2 x i1>%mask, <2 x double>%a, <2 x double>%b
>>>> +  ret <2 x double>%c
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test9
>>>> +; CHECK: vpcmpeqd
>>>> +; CHECK: vpblendmd
>>>> +; CHECK: ret
>>>> +define <8 x i32> @test9(<8 x i32> %x, <8 x i32> %y) nounwind {
>>>> +  %mask = icmp eq <8 x i32> %x, %y
>>>> +  %max = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> %y
>>>> +  ret <8 x i32> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test10
>>>> +; CHECK: vcmpeqps
>>>> +; CHECK: vblendmps
>>>> +; CHECK: ret
>>>> +define <8 x float> @test10(<8 x float> %x, <8 x float> %y) nounwind {
>>>> +  %mask = fcmp oeq <8 x float> %x, %y
>>>> +  %max = select <8 x i1> %mask, <8 x float> %x, <8 x float> %y
>>>> +  ret <8 x float> %max
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: test11_unsigned
>>>> +; CHECK: vpcmpnleud %zmm
>>>> +; CHECK: vpblendmd  %zmm
>>>> +; CHECK: ret
>>>> +define <8 x i32> @test11_unsigned(<8 x i32> %x, <8 x i32> %y) nounwind {
>>>> +  %mask = icmp ugt <8 x i32> %x, %y
>>>> +  %max = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> %y
>>>> +  ret <8 x i32> %max
>>>> +}
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>> 
>>> This e-mail and any attachments may contain confidential material 
>>> for the sole use of the intended recipient(s). Any review or 
>>> distribution by others is strictly prohibited. If you are not the 
>>> intended recipient, please contact the sender and delete all copies.
>>> 
>> 
> 
