[PATCH][AVX512] Enable intrinsics for vpcmpeq

Tue Sep 30 02:17:43 PDT 2014

Thank you for review, Adam! I will add a comment.

Yes, actually we can convert VSELECT to AND on CPP, not in TD.
There is a case inside function PerformSELECTCombine, where VSELECT
w/selector produced by SETCC considered for convertion to AND or OR nodes.
I can teach this case to allow transform after our intrinsic lowering
(Actually, I tried change primitively and it worked). I will prepare
separate patch for this soon.

2014-09-30 11:46 GMT+04:00 Demikhovsky, Elena <elena.demikhovsky at intel.com>:

>  I agree with Adam. Can we convert VSELECT to AND in td?
>
>
>
> -          * Elena*
>
>
>
> *From:* Adam Nemet [mailto:anemet at apple.com]
> *Sent:* Monday, September 29, 2014 20:27
> *To:* Robert Khasanov
> *Cc:* Demikhovsky, Elena; llvm-commits at cs.uiuc.edu
> *Subject:* Re: [PATCH][AVX512] Enable intrinsics for vpcmpeq
>
>
>
>
>
> On Sep 29, 2014, at 7:22 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
>
>
>
>  Hi Adam,
>
>
>
> 2014-09-28 11:14 GMT+04:00 Adam Nemet <anemet at apple.com>:
>
>
>
> On Sep 26, 2014, at 8:04 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
>
>
>
>  Hi Adam,
>
>
>
> Thanks for review.
>
> I agree with you comment about reusing getVectorMaskingNode. See new
> patches attached.
>
>
>
> Is 003 the correct version now?
>
>
>
>
>
> yes.
>
>   However, I don't understand your comment about ISD::INSERT_SUBVECTOR.
> Could you please clarify your suggestion?
>
>
>
> You’re making this op legal for v8i1, so we will now generate things like:
>
>
>
>     (v8i1 (insert_subvector (v8i1 foo), (v4i1 bar), (iPTR 0)))
>
>
>
> Do we have instructions to match these in the TD file?
>
>
>
>
>
> We generate things like (v8i1 (insert_subvector undef, (v4i1 src), (iPTR
> 0))).
>
> For this we have following rules in TD (see 003 patch):
>
>
>
> +  def : Pat<(v8i1 (insert_subvector undef, (v4i1 VK4:$src), (iPTR 0))),
>
> +            (v8i1 (COPY_TO_REGCLASS VK4:$src, VK8))>;
>
> +  def : Pat<(v8i1 (insert_subvector undef, (v2i1 VK2:$src), (iPTR 0))),
>
> +            (v8i1 (COPY_TO_REGCLASS VK2:$src, VK8))>;
>
>
>
> OK, I missed this one.
>
>
>
> You still owe me :) the sample DAG in the comment to:
>
>
>
>  +      SDValue Mask = Op.getOperand(3);
>
> +      EVT BitcastVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
>
> +
> Mask.getValueType().getSizeInBits());
>
>        SDValue Cmp = DAG.getNode(IntrData->Opc0, dl, MaskVT,
>
>                                  Op.getOperand(1), Op.getOperand(2));
>
> -      SDValue Res = getVectorMaskingNode(Cmp, Op.getOperand(3),
>
> -                                         DAG.getTargetConstant(0,
> MaskVT), DAG);
>
> +      SDValue CmpMask = getVectorMaskingNode(Cmp, Op.getOperand(3),
>
> +                                        DAG.getTargetConstant(0, MaskVT),
> DAG);
>
> +      SDValue Res = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, BitcastVT,
>
> +                                DAG.getUNDEF(BitcastVT), CmpMask,
>
> +                                DAG.getIntPtrConstant(0));
>
>
>
> Other than that LGTM, thanks for splitting it up like this.
>
>
>
> One more question more of as a follow-on discussion.  Do we actually have
> to generate AND for masking compares?  I understand that that is what they
> get canonicalized into but perhaps we can still generate vselect uniformly.
>
>
>
> Adam
>
>
>
>
>
>
>
> Thanks,
>
> Robert
>
>
>
>  Adam
>
>
>
>   Thanks,
>
> Robert
>
>
>
> 2014-09-26 4:19 GMT+04:00 Adam Nemet <anemet at apple.com>:
>
>
>
> On Sep 25, 2014, at 12:43 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
>
>
>
>   The last patch is incorrect. See this version instead of previous one.
>
> Sorry for that.
>
>
>
> 2014-09-25 11:40 GMT+04:00 Robert Khasanov <rob.khasanov at gmail.com>:
>
> Hi Elena, Adam,
>
>
>
> In these patches I enable intrinsics for vpcmpeq{bwdq} instructions.
>
> Since result of the instructions is mask, I enable new intrinsics type
> CMP_MASK and assume to enable CMP_MASK_CC in future for vpcmp{bwdq}
> instructions (due to additional CC argument).
>
> Also I extended argument types when intrinsics generated through TableGen
> (IIT_V64) to support 64 packed data.
>
> In last patch I enabled INSERT_SUBVECTOR for v8i1 to legalizer to support
> converting v2i1 and v4i1 to v8i1 and then bitcasting to i8.
>
>
>
> Please let me know if it looks good.
>
>
>
> Hi Robert,
>
>
>
> +    case CMP_MASK: {
>
> +      EVT VT = Op.getOperand(1).getValueType();
>
> +      EVT MaskVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
>
> +                                    VT.getVectorNumElements());
>
> +      SDValue Cmp = DAG.getNode(IntrData->Opc0, dl, MaskVT,
>
> +                                Op.getOperand(1), Op.getOperand(2));
>
> +      SDValue Res;
>
> +      if (isAllOnes(Op.getOperand(3))) {
>
> +        Res = Cmp;
>
> +      } else {
>
> +        Res = DAG.getNode(ISD::AND, dl, MaskVT, Cmp,
>
> +                          DAG.getNode(ISD::BITCAST, dl, MaskVT,
>
> +                                      Op.getOperand(3)));
>
> +      }
>
> +      return DAG.getNode(ISD::BITCAST, dl, Op.getValueType(), Res);
>
> +    }
>
>
>
> It would be good to reuse (and extend if needed) the helper
> getVectorMaskingNode for this.  That should capture what it takes from a
> expression to become a masked expression in one place (just like
> AVX512_masking in td).  Obviously the BITCAST at the end does not belong
> there there but the information that masked compare is canonicalized to an
> AND rather than a VSELECT is useful.
>
>
>
> As you see I also don’t check for the mask value to be AllOnes in
> getVectorMaskingNode.  We can add that for now if you need but I think that
> should be an orthogonal DAGCombiner transformation to cover more cases.
>
>
>
> +    setOperationAction(ISD::INSERT_SUBVECTOR,   MVT::v8i1, Legal);
>
>
>
> Can you just enable it like that?  I understand that you need it to lower
> the intrinsics but you may need to add some more patterns to TD to
> recognize this in the general case.
>
>
>
> -        Res = DAG.getNode(ISD::AND, dl, MaskVT, Cmp,
>
> -                          DAG.getNode(ISD::BITCAST, dl, MaskVT,
>
> -                                      Op.getOperand(3)));
>
> +        RMask = DAG.getNode(ISD::AND, dl, MaskVT, Cmp,
>
> +                            DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl,
> MaskVT,
>
> +                              DAG.getNode(ISD::BITCAST, dl, BitcastVT,
> Mask),
>
> +                              DAG.getIntPtrConstant(0)));
>
>        }
>
> -      return DAG.getNode(ISD::BITCAST, dl, Op.getValueType(), Res);
>
> +      return DAG.getNode(ISD::BITCAST, dl, Op.getValueType(),
>
> +                         DAG.getNode(ISD::INSERT_SUBVECTOR, dl, BitcastVT,
>
> +                                     DAG.getUNDEF(BitcastVT), RMask,
>
> +                                     DAG.getIntPtrConstant(0)));
>
>
>
> This could use a comment with an example DAG that we’re trying to create.
>
>
>
> Thanks,
>
> Adam
>
>
>
>
>
>
>
> Robert
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140930/184fa2f7/attachment.html>