[PATCH] D86093: [X86][AVX] Lower v16i8/v8i16 shuffles using VTRUNC/TRUNCATE

Mon Aug 17 12:36:04 PDT 2020

RKSimon added a comment.

In D86093#2222047 <https://reviews.llvm.org/D86093#2222047>, @craig.topper wrote:

> It looks like we may already be doing it in some cases, but is a VTRUNC for xmm->xmm really better than VPSHUFB? VTRUNC is 2 port 5 uops. VPSHUFB is 1 port 5 uop.

That sounds reasonable to me - although there's the inevitable question of the cost of loading the shuffle mask - I'll limit it to just binary shuffles, which are the cause of the regressions in D66004 <https://reviews.llvm.org/D66004>.
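
To make the unary/binary distinction concrete, here is a rough standalone sketch of the two checks involved. The helper names (isBinaryShuffle, matchesTruncation) are hypothetical and are not the functions in X86ISelLowering.cpp; the only thing assumed from LLVM is the usual shuffle mask convention (0..N-1 selects from the first operand, N..2N-1 from the second, negative means undef).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers for illustration only; these are not LLVM functions.
// Mask convention (as in LLVM shuffle masks): for N-element operands, indices
// 0..N-1 select from the first operand, N..2N-1 from the second, and a
// negative index means "undef".

// Returns true if the mask reads at least one element from each operand.
bool isBinaryShuffle(const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  bool UsesFirst = false, UsesSecond = false;
  for (int M : Mask) {
    if (M < 0)
      continue; // undef lane, reads nothing
    if (M < NumElts)
      UsesFirst = true;
    else
      UsesSecond = true;
  }
  return UsesFirst && UsesSecond;
}

// Returns true if every defined lane I selects element I * Scale of the
// concatenated operands. On little-endian x86 that is the low part of each
// Scale-times-wider element, i.e. the result of a truncation.
// Note: an all-undef mask trivially matches.
bool matchesTruncation(const std::vector<int> &Mask, int Scale) {
  assert(Scale > 1 && "truncation needs a scale of at least 2");
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I) {
    if (Mask[I] < 0)
      continue; // undef lane matches anything
    if (Mask[I] != I * Scale)
      return false;
  }
  return true;
}
```

For a v16i8 shuffle, the mask 0,2,4,...,30 is binary and matches with Scale = 2. The mask 0,2,...,14 followed by undefs also matches but is unary, which is the xmm->xmm case where a single VPSHUFB looks like the better choice.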

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:11399
+  unsigned EltSizeInBits = VT.getScalarSizeInBits();
+  if (Mask.size() != NumElts)
+    return SDValue();
----------------
craig.topper wrote:
> When does this condition happen? Doesn't Mask always follow VT in shuffle lowering?
This is just copy+paste from lowerShuffleWithVPMOV - neither actually needs it

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D86093