[PATCH] D77928: [X86] Enable shuffle combining for AVX512 unless the root is used by a vselect

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 11 04:15:48 PDT 2020


RKSimon added a comment.

Thanks for doing this - I've been putting it off for a long time!



================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:33845
 
+  if (AllowIntDomain && EltSizeInBits == 64 &&
+      ((MaskVT.is128BitVector() && Subtarget.hasVLX()) ||
----------------
// Attempt to match against VALIGN element rotate.

Also - why not merge the 32/64 bit checks?


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:33849
+       (MaskVT.is512BitVector() && Subtarget.hasAVX512()))) {
+    if (!any_of(Mask, [](int M) { return M == SM_SentinelZero; })) {
+      int Rotation = matchShuffleAsElementRotate(V1, V2, Mask);
----------------
Use the isAnyZero helper? (It's recent so isn't used everywhere it could be yet).


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34071
+  if (RootSizeInBits == 512 || (Subtarget.hasVLX() && RootSizeInBits >= 128)) {
+    if (Root.hasOneUse() && Root->use_begin()->getOpcode() == ISD::VSELECT &&
+        Root->use_begin()->getOperand(0).getScalarValueSizeInBits() == 1) {
----------------
Do we need to peek through bitcasts at all?


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34163
+                       (Mask[0] < 0 || Mask[2] < 0 || Mask[0] == (Mask[2] % 2)) &&
+                       (Mask[1] < 0 || Mask[3] < 0 || Mask[1] == (Mask[3] % 2));
+
----------------
It might be better to use BaseMask directly - its already dealt with the widening so the mask matching should be easier?


================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bw.ll:230
+; CHECK-NEXT:    vprold $16, %zmm0, %zmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
   %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> <i16 1, i16 0, i16 3, i16 2, i16 4, i16 5, i16 6, i16 7, i16 9, i16 8, i16 11, i16 10, i16 12, i16 13, i16 14, i16 15, i16 17, i16 16, i16 19, i16 18, i16 20, i16 21, i16 22, i16 23, i16 25, i16 24, i16 27, i16 26, i16 28, i16 29, i16 30, i16 31>)
----------------
Maybe rename the test? combine_vpermi2var_32i16_as_rotate ? If you want to you could adjust the masks so this (or a new test) still combines to pshufb.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77928/new/

https://reviews.llvm.org/D77928





More information about the llvm-commits mailing list