[PATCH] D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]

Mon Jun 11 06:06:01 PDT 2018

GBuella added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:9483
+  if (OriginalVector.getSimpleValueType().getVectorElementType() == MVT::i16 &&
+      VT == MVT::v16i8 && !Subtarget.hasBWI())
+    return SDValue();
----------------
RKSimon wrote:
> GBuella wrote:
> > RKSimon wrote:
> > > Why is it just 16i8 and not 32i8 as well for _mm512_cvtepi16_epi8 ?
> > This part is only about truncations where the result must be filled with extra zeros, because the result (narrower than 128 bits) lives in an xmm register.
> > The _mm512_cvtepi16_epi8 one truncates from a 512-bit vector into a 256-bit vector, and that is already recognized without this patch.
> > 
> > The check here is about _mm_cvtepi16_epi8 (which requires avx512vl & avx512bw). It truncates from v8i16 -> v8i8, but the vpmovwb instruction actually sets a whole xmm register, so the actual result is going to be v16i8, with the other 8 bytes set to zero.
> > 
> > Ok, perhaps these details should be explained in comments around here.
> Shouldn't it handle this case? https://godbolt.org/g/Yxw7nE
That could probably also be implemented here; we just hadn't thought about it so far.
There is/was a patch for those using __builtin_convertvector:
https://reviews.llvm.org/D46742
This patch was originally intended to handle the cases that can't be done with __builtin_convertvector.

But if it is not a lot of extra work, the shufflevector equivalents of those convertvector ones could be detected here.
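
To make the zero-fill case discussed above concrete, here is a minimal scalar sketch of what the 128-bit vpmovwb form is described as doing in this thread (v8i16 -> v8i8, widened to v16i8 with the upper 8 bytes zero). The names model_vpmovwb_xmm and demo are hypothetical, purely for illustration; this is not code from the patch and does not use the actual intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (an assumption for illustration, not LLVM code):
// truncate eight 16-bit lanes to 8 bits each and place them in the low half
// of a 16-byte result; the high 8 bytes stay zero, as described for the
// xmm-register form of vpmovwb.
std::array<std::uint8_t, 16>
model_vpmovwb_xmm(const std::array<std::uint16_t, 8> &src) {
  std::array<std::uint8_t, 16> dst{}; // value-initialised: all 16 bytes are 0
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[i] = static_cast<std::uint8_t>(src[i] & 0xff); // keep the low byte
  return dst;
}

// Small usage check: low bytes are kept, the upper half is zero.
void demo() {
  const std::array<std::uint16_t, 8> in = {0x1234, 0x00ff, 0xff00, 0xabcd,
                                           1,      2,      3,      0x8080};
  const auto out = model_vpmovwb_xmm(in);
  assert(out[0] == 0x34 && out[3] == 0xcd);
  for (std::size_t i = 8; i < 16; ++i)
    assert(out[i] == 0);
}
```

The point of the sketch is only that the "real" result type is v16i8 rather than v8i8, which is why the check above tests VT == MVT::v16i8.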

https://reviews.llvm.org/D46957