[PATCH] D115180: [X86] Enable v16i8/v32i8/v64i8 rotation on AVX512 targets

Mon Dec 13 06:29:14 PST 2021

pengfei added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:29873-29874
     MVT ExtVT = MVT::getVectorVT(MVT::i16, NumElts / 2);
-
-    // If the amount is a splat, attempt to fold as unpack(x,x) << zext(y):
+    MVT WideVT =
+        MVT::getVectorVT(Subtarget.hasBWI() ? MVT::i16 : MVT::i32, NumElts);
+    unsigned ShiftOpc = HiddenROTRAmt ? ISD::SRL : ISD::SHL;
----------------
The `NumElts` may be 64 or 32 to the maximum, so the `WideVT` can be illegal one?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:29880-29882
+    // Attempt to fold as unpack(x,x) << zext(y):
     // rotl(x,y) -> (((aext(x) << bw) | zext(x)) << (y & (bw-1))) >> bw.
     // rotr(x,y) -> (((aext(x) << bw) | zext(x)) >> (y & (bw-1))).
----------------
If the comments are usable for all if and else conditions, the second half comment in 29884 seems redundant. Or the comments should be moved to the if block?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:29893
       return getPack(DAG, Subtarget, DL, VT, Lo, Hi, !HiddenROTRAmt);
+    } else if (supportedVectorVarShift(WideVT, Subtarget, ShiftOpc) && 
+               supportedVectorShiftWithImm(WideVT, Subtarget, ShiftOpc)) {
----------------
There's extra space here.

================
Comment at: llvm/test/CodeGen/X86/vector-fshr-rot-128.ll:626
 ; AVX512F-NEXT:    vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
-; AVX512F-NEXT:    vpsrlvd %zmm3, %zmm0, %zmm3
-; AVX512F-NEXT:    vpxor %xmm4, %xmm4, %xmm4
-; AVX512F-NEXT:    vpsubb %xmm1, %xmm4, %xmm1
-; AVX512F-NEXT:    vpand %xmm2, %xmm1, %xmm1
+; AVX512F-NEXT:    vpslld $8, %zmm0, %zmm2
+; AVX512F-NEXT:    vpord %zmm2, %zmm0, %zmm0
----------------
Why do we still widen here rather that use avx instruction like other cases?

================
Comment at: llvm/test/CodeGen/X86/vector-fshr-rot-512.ll:859
+; AVX512BW-NEXT:    vpsrlw $4, %zmm0, %zmm0
 ; AVX512BW-NEXT:    vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
 ; AVX512BW-NEXT:    retq
----------------
Didn't dig much, why the immediate value keep unchanged while zmm1 and zmm0 swapped?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115180/new/

https://reviews.llvm.org/D115180