[PATCH] D11218: AVX512 : Integer Truncate with/without saturation support
Ahmed Bougacha
ahmed.bougacha at gmail.com
Wed Jul 15 07:49:57 PDT 2015
ab added a subscriber: ab.
================
Comment at: include/llvm/IR/IntrinsicsX86.td:5294
@@ +5293,3 @@
+let TargetPrefix = "x86" in {
+ def int_x86_avx512_mask_pmov_qb_128 :
+ GCCBuiltin<"__builtin_ia32_pmovqb128_mask">,
----------------
So, this file is starting to get pretty unwieldy. Why not use multiclasses?
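For instance, something along these lines (a hypothetical sketch only — the builtin-name scheme and the per-width types would need to be parameterized to match the real definitions):
```
// Hypothetical sketch: stamp out the 128/256/512-bit variants of each
// truncating-move intrinsic from one multiclass instead of repeating
// every def by hand.
multiclass avx512_mask_pmov_intr<string Name> {
  def _128 : GCCBuiltin<!strconcat("__builtin_ia32_", Name, "128_mask")>,
             Intrinsic<[llvm_v16i8_ty],
                       [llvm_v2i64_ty, llvm_v16i8_ty, llvm_i8_ty],
                       [IntrNoMem]>;
  // ... likewise for the _256 and _512 variants, with their own types ...
}

let TargetPrefix = "x86" in
  defm int_x86_avx512_mask_pmov_qb : avx512_mask_pmov_intr<"pmovqb">;
```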
================
Comment at: include/llvm/Target/TargetSelectionDAG.td:494
@@ -493,3 +493,3 @@
-def masked_store : SDNode<"ISD::MSTORE", SDTMaskedStore,
+def mst : SDNode<"ISD::MSTORE", SDTMaskedStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
----------------
I'm not a fan of this intermediate state. Can you add the same comment that's above ld/st ("Don't use this directly..."), and/or move this somewhere else?
Basically I'd prefer masked_load and masked_store to be adjacent.
================
Comment at: include/llvm/Target/TargetSelectionDAG.td:495
@@ -495,2 +494,3 @@
+def mst : SDNode<"ISD::MSTORE", SDTMaskedStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def masked_load : SDNode<"ISD::MLOAD", SDTMaskedLoad,
----------------
Indentation?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:1371-1372
@@ +1370,4 @@
+ setTruncStoreAction(MVT::v32i16, MVT::v32i8, Legal);
+ setTruncStoreAction(MVT::v16i16, MVT::v16i8, Legal);
+ setTruncStoreAction(MVT::v8i16, MVT::v8i8, Legal);
+
----------------
From my understanding, BWI doesn't imply VLX (even though SKX has both). So, shouldn't these be predicated on hasVLX() as well? I mention the same issue in another comment, so I might be missing something.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:12490
@@ +12489,3 @@
+
+ // vpmovqb/w/d, vpmovdb/w, vpmovqwb
+ if ((Subtarget->hasBWI() && InVT.getVectorElementType() == MVT::i16) ||
----------------
"vpmovqwb" ?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:12491-12499
@@ -12463,4 +12490,11 @@
+ // vpmovqb/w/d, vpmovdb/w, vpmovqwb
+ if ((Subtarget->hasBWI() && InVT.getVectorElementType() == MVT::i16) ||
+ (Subtarget->hasVLX() &&
+ (InVT.getVectorElementType() == MVT::i32 ||
+ (InVT.getVectorElementType() == MVT::i64))))
+ return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);
+
if (InVT.is512BitVector() || VT.getVectorElementType() == MVT::i1) {
if (VT.getVectorElementType().getSizeInBits() >=8)
return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);
----------------
I see that 512-bit vectors are handled in l12497, so the added block is only useful for smaller VTRUNCs. Two questions:
- should l12498 also check hasBWI() for WB? Otherwise we'll get 512-bit BW VTRUNCs even without BWI, no? (not selectable)
- should l12491-12494 also check hasVLX() when hasBWI()? Otherwise we'll get 128/256-bit BW VTRUNCs even without VLX, no? (Also not selectable, I think?)
So, would it make sense to replace l12491-12494 with:
```
if (Subtarget->hasVLX() && !InVT.is512BitVector() &&
    (InVT.getVectorElementType() != MVT::i16 || Subtarget->hasBWI()))
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15232-15234
@@ -15197,3 +15231,5 @@
/// (vselect \p Mask, \p Op, \p PreservedSrc) for others along with the
-/// necessary casting for \p Mask when lowering masking intrinsics.
+/// necessary
+// casting for \p Mask when lowering masking intrinsics
+// extend for \p Mask when \p extendMask it set
static SDValue getVectorMaskingNode(SDValue Op, SDValue Mask,
----------------
How about:
```
necessary:
- casting ...
- extend ...
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15239
@@ +15238,3 @@
+ SelectionDAG &DAG,
+ bool extendMask = false) {
+ EVT VT = Op.getValueType();
----------------
capitalization? extend -> Extend (here and elsewhere)
But I'm not convinced it's a good idea to reuse the same function. Perhaps it's different enough to warrant another function for masking truncstores?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15252-15257
@@ -15214,6 +15251,8 @@
- // In case when MaskVT equals v2i1 or v4i1, low 2 or 4 elements
- // are extracted by EXTRACT_SUBVECTOR.
- SDValue VMask = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MaskVT,
- DAG.getBitcast(BitcastVT, Mask),
- DAG.getIntPtrConstant(0, dl));
+ if (extendMask && MaskVT.bitsGT(Mask.getValueType())){
+ EVT newMaskVT = EVT::getIntegerVT(*DAG.getContext(),
+ MaskVT.getSizeInBits());
+ VMask = DAG.getBitcast(MaskVT,
+ DAG.getNode(ISD::ANY_EXTEND, dl, newMaskVT, Mask));
+ }
+ else{
----------------
Would it be wrong to do this all the time, even for non-truncstores?
So that you can remove extendMask completely, and just decide based on MaskVT/Mask.getVT() ?
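In pseudocode, the flag-free version might look like this (an untested sketch; it assumes the type-size comparison alone distinguishes the two cases correctly):
```
// Sketch: choose extend vs. extract purely from the type sizes, so no
// extendMask flag is needed.
SDValue VMask;
if (MaskVT.bitsGT(Mask.getValueType())) {
  // Mask is narrower than MaskVT: widen it first, then bitcast.
  EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), MaskVT.getSizeInBits());
  VMask = DAG.getBitcast(MaskVT,
                         DAG.getNode(ISD::ANY_EXTEND, dl, WideVT, Mask));
} else {
  // Mask is at least as wide: bitcast and extract the low elements.
  EVT BitcastVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
                                   Mask.getValueType().getSizeInBits());
  VMask = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MaskVT,
                      DAG.getBitcast(BitcastVT, Mask),
                      DAG.getIntPtrConstant(0, dl));
}
```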
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15258
@@ +15257,3 @@
+ }
+ else{
+ EVT BitcastVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
----------------
```
}
else{
```
->
```
} else {
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15278
@@ +15277,3 @@
+ case X86ISD::VTRUNCUS:
+ OpcodeSelect = X86ISD::SELECT;
+ break;
----------------
Why? I see the patterns match X86select, but what's wrong with ISD::VSELECT?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15416
@@ +15415,3 @@
+ return getVectorMaskingNode(DAG.getNode(IntrData->Opc0, dl, VT, Src),
+ Mask, Passthru, Subtarget, DAG, true);
+ }
----------------
true -> /*ExtendMask=*/true
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24146
@@ +24145,3 @@
+ // vpmovqb, vpmovqw, vpmovqd, vpmovdb, vpmovdw
+ if ( TLI.isTruncStoreLegal(VT, StVT)) {
+ return SDValue();
----------------
Unnecessary whitespace: "if ( TLI" -> "if (TLI"
================
Comment at: lib/Target/X86/X86InstrAVX512.td:5786-5788
@@ +5785,5 @@
+
+defm VPMOVQB : avx512_trunc_qb<0x32, "vpmovqb", X86vtrunc>;
+defm VPMOVSQB : avx512_trunc_sat_qb<0x22, "s", X86vtruncs>;
+defm VPMOVUSQB : avx512_trunc_sat_qb<0x12, "us", X86vtruncus>;
+
----------------
Multiclass? !add(0x10) or something for the opcode.
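e.g., roughly (a hypothetical sketch, glossing over the name-prefixing rules for nested defm):
```
// Hypothetical sketch: derive all three opcodes from the "us" one, since
// the encodings step by a fixed 0x10 (0x12 / 0x22 / 0x32).
multiclass avx512_trunc_qb_all<bits<8> USOpc> {
  defm US : avx512_trunc_sat_qb<USOpc, "us", X86vtruncus>;
  defm S  : avx512_trunc_sat_qb<!add(USOpc, 0x10), "s", X86vtruncs>;
  defm "" : avx512_trunc_qb<!add(USOpc, 0x20), "vpmovqb", X86vtrunc>;
}
```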
================
Comment at: lib/Target/X86/X86InstrFragmentsSIMD.td:117
@@ -116,5 +116,3 @@
-def X86vtrunc : SDNode<"X86ISD::VTRUNC",
- SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>,
- SDTCisInt<0>, SDTCisInt<1>,
- SDTCisOpSmallerThanOp<0, 1>]>>;
+def SDTVtuncOp : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>,
+ SDTCisInt<0>, SDTCisInt<1>,
----------------
Vtunc -> Vtrunc
Why "Op"?
Also, is there a better place for this? Perhaps with the other SDTPs (or the lone one in this file)?
================
Comment at: lib/Target/X86/X86IntrinsicsInfo.h:178
@@ -141,1 +177,3 @@
+ X86_INTRINSIC_DATA(avx512_mask_pmov_wb_mem_512, TRUNCATE_TO_MEM_VI8,
+ X86ISD::VTRUNC, 0),
X86_INTRINSIC_DATA(avx512_scatter_dpd_512, SCATTER, X86::VSCATTERDPDZmr, 0),
----------------
What about the saturating _mem variants? If you add them, would that let you remove the avx512_trunc_mr_lowering patterns?
================
Comment at: test/CodeGen/X86/avx512-trunc-ext.ll:1
@@ -1,2 +1,2 @@
; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL
; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx | FileCheck %s --check-prefix=SKX
----------------
Do you mind splitting the file into trunc and ext tests?
================
Comment at: test/CodeGen/X86/avx512-trunc-ext.ll:29
@@ +28,3 @@
+
+define <8 x i8> @trunc_qb_512(<8 x i64> %i) nounwind readnone {
+; SKX-LABEL: trunc_qb_512:
----------------
readnone isn't useful; also, can you use an attribute group?
Repository:
rL LLVM
http://reviews.llvm.org/D11218