[PATCH] D11218: AVX512 : Integer Truncate with/without saturation support
Ahmed Bougacha
ahmed.bougacha at gmail.com
Wed Jul 15 07:49:57 PDT 2015
ab added a subscriber: ab.
================
Comment at: include/llvm/IR/IntrinsicsX86.td:5294
@@ +5293,3 @@
+let TargetPrefix = "x86" in {
+ def int_x86_avx512_mask_pmov_qb_128 :
+ GCCBuiltin<"__builtin_ia32_pmovqb128_mask">,
----------------
So, this file is starting to get pretty unwieldy. Why not use multiclasses?
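For instance, something along these lines (a hypothetical sketch only — the builtin-name scheme and the per-width types would need to be parameterized to match the real definitions):
```
// Hypothetical sketch: stamp out the 128/256/512-bit variants of each
// truncating-move intrinsic from one multiclass instead of repeating
// every def by hand.
multiclass avx512_mask_pmov_intr<string Name> {
  def _128 : GCCBuiltin<!strconcat("__builtin_ia32_", Name, "128_mask")>,
             Intrinsic<[llvm_v16i8_ty],
                       [llvm_v2i64_ty, llvm_v16i8_ty, llvm_i8_ty],
                       [IntrNoMem]>;
  // ... likewise for the _256 and _512 variants, with their own types ...
}

let TargetPrefix = "x86" in
  defm int_x86_avx512_mask_pmov_qb : avx512_mask_pmov_intr<"pmovqb">;
```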
================
Comment at: include/llvm/Target/TargetSelectionDAG.td:494
@@ -493,3 +493,3 @@
-def masked_store : SDNode<"ISD::MSTORE", SDTMaskedStore,
+def mst : SDNode<"ISD::MSTORE", SDTMaskedStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
----------------
I'm not a fan of this intermediate state. Can you add the same comment that's above ld/st ("Don't use this directly..."), and/or move this somewhere else?
Basically I'd prefer masked_load and masked_store to be adjacent.
================
Comment at: include/llvm/Target/TargetSelectionDAG.td:495
@@ -495,2 +494,3 @@
+def mst : SDNode<"ISD::MSTORE", SDTMaskedStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def masked_load : SDNode<"ISD::MLOAD", SDTMaskedLoad,
----------------
Indentation?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:1371-1372
@@ +1370,4 @@
+ setTruncStoreAction(MVT::v32i16, MVT::v32i8, Legal);
+ setTruncStoreAction(MVT::v16i16, MVT::v16i8, Legal);
+ setTruncStoreAction(MVT::v8i16, MVT::v8i8, Legal);
+
----------------
From my understanding, BWI doesn't imply VLX (even though SKX has both). So, shouldn't these be predicated on hasVLX() as well? I mention the same issue in another comment, so I might be missing something.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:12490
@@ +12489,3 @@
+
+ // vpmovqb/w/d, vpmovdb/w, vpmovqwb
+ if ((Subtarget->hasBWI() && InVT.getVectorElementType() == MVT::i16) ||
----------------
"vpmovqwb" ?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:12491-12499
@@ -12463,4 +12490,11 @@
+ // vpmovqb/w/d, vpmovdb/w, vpmovqwb
+ if ((Subtarget->hasBWI() && InVT.getVectorElementType() == MVT::i16) ||
+ (Subtarget->hasVLX() &&
+ (InVT.getVectorElementType() == MVT::i32 ||
+ (InVT.getVectorElementType() == MVT::i64))))
+ return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);
+
if (InVT.is512BitVector() || VT.getVectorElementType() == MVT::i1) {
if (VT.getVectorElementType().getSizeInBits() >=8)
return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);
----------------
I see that 512-bit vectors are handled in l12497, so the added block is only useful for smaller VTRUNCs. Two questions:
- should l12498 also check hasBWI() for WB? Otherwise we'll get 512-bit BW VTRUNCs even without BWI, no? (not selectable)
- should l12491-12494 also check hasVLX() when hasBWI()? Otherwise we'll get 128/256-bit BW VTRUNCs even without VLX, no? (Also not selectable, I think?)
So, would it make sense to replace l12491-12494 with:
```
if (Subtarget->hasVLX() && !InVT.is512BitVector() &&
    (InVT.getVectorElementType() != MVT::i16 || Subtarget->hasBWI()))
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15232-15234
@@ -15197,3 +15231,5 @@
/// (vselect \p Mask, \p Op, \p PreservedSrc) for others along with the
-/// necessary casting for \p Mask when lowering masking intrinsics.
+/// necessary
+// casting for \p Mask when lowering masking intrinsics
+// extend for \p Mask when \p extendMask it set
static SDValue getVectorMaskingNode(SDValue Op, SDValue Mask,
----------------
How about:
```
necessary:
- casting ...
- extend ...
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15239
@@ +15238,3 @@
+ SelectionDAG &DAG,
+ bool extendMask = false) {
+ EVT VT = Op.getValueType();
----------------
capitalization? extend -> Extend (here and elsewhere)
But I'm not convinced it's a good idea to reuse the same function. Perhaps it's different enough to warrant another function for masking truncstores?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15252-15257
@@ -15214,6 +15251,8 @@
- // In case when MaskVT equals v2i1 or v4i1, low 2 or 4 elements
- // are extracted by EXTRACT_SUBVECTOR.
- SDValue VMask = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MaskVT,
- DAG.getBitcast(BitcastVT, Mask),
- DAG.getIntPtrConstant(0, dl));
+ if (extendMask && MaskVT.bitsGT(Mask.getValueType())){
+ EVT newMaskVT = EVT::getIntegerVT(*DAG.getContext(),
+ MaskVT.getSizeInBits());
+ VMask = DAG.getBitcast(MaskVT,
+ DAG.getNode(ISD::ANY_EXTEND, dl, newMaskVT, Mask));
+ }
+ else{
----------------
Would it be wrong to do this all the time, even for non-truncstores?
So that you can remove extendMask completely, and just decide based on MaskVT/Mask.getVT() ?
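In pseudocode, the flag-free version might look like this (an untested sketch; it assumes the type-size comparison alone distinguishes the two cases correctly):
```
// Sketch: choose extend vs. extract purely from the type sizes, so no
// extendMask flag is needed.
SDValue VMask;
if (MaskVT.bitsGT(Mask.getValueType())) {
  // Mask is narrower than MaskVT: widen it first, then bitcast.
  EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), MaskVT.getSizeInBits());
  VMask = DAG.getBitcast(MaskVT,
                         DAG.getNode(ISD::ANY_EXTEND, dl, WideVT, Mask));
} else {
  // Mask is at least as wide: bitcast and extract the low elements.
  EVT BitcastVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
                                   Mask.getValueType().getSizeInBits());
  VMask = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MaskVT,
                      DAG.getBitcast(BitcastVT, Mask),
                      DAG.getIntPtrConstant(0, dl));
}
```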
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15258
@@ +15257,3 @@
+ }
+ else{
+ EVT BitcastVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
----------------
```
}
else{
```
->
```
} else {
```
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15278
@@ +15277,3 @@
+ case X86ISD::VTRUNCUS:
+ OpcodeSelect = X86ISD::SELECT;
+ break;
----------------
Why? I see the patterns match X86select, but what's wrong with ISD::VSELECT?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15416
@@ +15415,3 @@
+ return getVectorMaskingNode(DAG.getNode(IntrData->Opc0, dl, VT, Src),
+ Mask, Passthru, Subtarget, DAG, true);
+ }
----------------
true -> /*ExtendMask=*/true
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24146
@@ +24145,3 @@
+ // vpmovqb, vpmovqw, vpmovqd, vpmovdb, vpmovdw
+ if ( TLI.isTruncStoreLegal(VT, StVT)) {
+ return SDValue();
----------------
Unnecessary whitespace: "if ( TLI" -> "if (TLI"
================
Comment at: lib/Target/X86/X86InstrAVX512.td:5786-5788
@@ +5785,5 @@
+
+defm VPMOVQB : avx512_trunc_qb<0x32, "vpmovqb", X86vtrunc>;
+defm VPMOVSQB : avx512_trunc_sat_qb<0x22, "s", X86vtruncs>;
+defm VPMOVUSQB : avx512_trunc_sat_qb<0x12, "us", X86vtruncus>;
+
----------------
Multiclass? !add(0x10) or something for the opcode.
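e.g., roughly (a hypothetical sketch, glossing over the name-prefixing rules for nested defm):
```
// Hypothetical sketch: derive all three opcodes from the "us" one, since
// the encodings step by a fixed 0x10 (0x12 / 0x22 / 0x32).
multiclass avx512_trunc_qb_all<bits<8> USOpc> {
  defm US : avx512_trunc_sat_qb<USOpc, "us", X86vtruncus>;
  defm S  : avx512_trunc_sat_qb<!add(USOpc, 0x10), "s", X86vtruncs>;
  defm "" : avx512_trunc_qb<!add(USOpc, 0x20), "vpmovqb", X86vtrunc>;
}
```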
================
Comment at: lib/Target/X86/X86InstrFragmentsSIMD.td:117
@@ -116,5 +116,3 @@
-def X86vtrunc : SDNode<"X86ISD::VTRUNC",
- SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>,
- SDTCisInt<0>, SDTCisInt<1>,
- SDTCisOpSmallerThanOp<0, 1>]>>;
+def SDTVtuncOp : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>,
+ SDTCisInt<0>, SDTCisInt<1>,
----------------
Vtunc -> Vtrunc
Why "Op"?
Also, is there a better place for this? Perhaps with the other SDTPs (or the lone one in this file)?
================
Comment at: lib/Target/X86/X86IntrinsicsInfo.h:178
@@ -141,1 +177,3 @@
+ X86_INTRINSIC_DATA(avx512_mask_pmov_wb_mem_512, TRUNCATE_TO_MEM_VI8,
+ X86ISD::VTRUNC, 0),
X86_INTRINSIC_DATA(avx512_scatter_dpd_512, SCATTER, X86::VSCATTERDPDZmr, 0),
----------------
What about the saturating _mem variants? If you add them, would that let you remove the avx512_trunc_mr_lowering patterns?
================
Comment at: test/CodeGen/X86/avx512-trunc-ext.ll:1
@@ -1,2 +1,2 @@
; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL
; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx | FileCheck %s --check-prefix=SKX
----------------
Do you mind splitting the file into trunc and ext tests?
================
Comment at: test/CodeGen/X86/avx512-trunc-ext.ll:29
@@ +28,3 @@
+
+define <8 x i8> @trunc_qb_512(<8 x i64> %i) nounwind readnone {
+; SKX-LABEL: trunc_qb_512:
----------------
readnone isn't useful; also, can you use an attribute group?
Repository:
rL LLVM
http://reviews.llvm.org/D11218