[PATCH] D56756: [x86] lower shuffle of extracts to AVX2 vperm instructions

Wed Jan 16 01:51:37 PST 2019

RKSimon accepted this revision.
RKSimon added a comment.
This revision is now accepted and ready to land.

LGTM with a few minors

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:11683
+  unsigned ExtIndex0 = N0.getConstantOperandVal(1);
+  unsigned ExtIndex1 = N1.getConstantOperandVal(1);
+  if (ExtIndex1 == 0 && ExtIndex0 == NumElts) {
----------------
Really these should be safe now, but it might be better to use:
```
SDValue ExtIdx0 = N0.getOperand(1);
const APInt &ExtIndex0 = cast<ConstantSDNode>(ExtIdx0)->getAPIntValue();
```
I keep wondering whether we should have made getConstantOperandVal return APInt& 

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:11698
+  for (unsigned i = 0; i != NumElts; ++i)
+    NewMask.push_back(-1);
+
----------------
NewMask.append(NumElts, -1);

================
Comment at: test/CodeGen/X86/avx512-shuffles/partial_permute.ll:1052
 ; CHECK-NEXT:    vptestnmd %xmm1, %xmm1, %k1
-; CHECK-NEXT:    vpshufd {{.*#+}} xmm0 {%k1} {z} = xmm0[1,3,2,1]
+; CHECK-NEXT:    vmovdqa32 %xmm0, %xmm0 {%k1} {z}
 ; CHECK-NEXT:    vzeroupper
----------------
This looks like a missed opportunity to merge a avx512 mask select across an extract_subvector(vec,0) ? There a few more below as well, please can you raise a bug about them.
```
vmovdqa {{.*#+}} ymm2 = <5,3,2,5,u,u,u,u>
vptestnmd %xmm1, %xmm1, %k1
vpermd %ymm0, %ymm2, %ymm0 {%k1} {z}
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56756/new/

https://reviews.llvm.org/D56756