[PATCH] D105390: [X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971)

Sun Jul 25 06:23:05 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:669
+; SKX-NEXT:    vmovdqa {{.*#+}} ymm0 = [0,1,2,3,4,5,6,7,8,25,10,11,12,13,14,15]
+; SKX-NEXT:    vpermi2w %ymm2, %ymm1, %ymm0
+; SKX-NEXT:    retq
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > craig.topper wrote:
> > > vpermi2w is 3 uops, 2 of which are 3 cycles that are serialized. I think the two blends we got on avx2 would be better. That's probably a separate issue in shuffle lowering/combining.
> > Right. This is a separate problem, in `combineX86ShufflesRecursively()` I would guess.
> The 'AllowBWIVPERMV3' logic in combineX86ShuffleChain is probably slightly off.
rG15b883f45771 should address this
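
For readers following the thread, here is a minimal sketch of why the mask in the SKX output amounts to a same-lane select. It assumes the documented two-source index semantics of `vpermi2w` for 16 word lanes, where bit 4 of each index chooses the source table and the low 4 bits choose the element. The helper names `vpermi2w_ymm` and `blend` are hypothetical models written for this illustration, not LLVM or intrinsic APIs, and the lane labels are made-up placeholders.

```python
def vpermi2w_ymm(indices, src1, src2):
    """Model a 16-lane, two-source word permute (assumed semantics)."""
    assert len(indices) == len(src1) == len(src2) == 16
    out = []
    for idx in indices:
        idx &= 0x1F  # only the low 5 bits matter for 16 x i16 lanes
        table = src2 if idx & 0x10 else src1  # bit 4 picks the source
        out.append(table[idx & 0x0F])  # low 4 bits pick the element
    return out


def blend(mask_bits, src1, src2):
    """Model a per-lane blend: take src2 where the mask is set, else src1."""
    return [b if m else a for m, a, b in zip(mask_bits, src1, src2)]


# Mask taken from the SKX check line in the test above.
mask = [0, 1, 2, 3, 4, 5, 6, 7, 8, 25, 10, 11, 12, 13, 14, 15]

# Placeholder lane contents so each result lane shows where it came from.
a = [f"a{i}" for i in range(16)]
b = [f"b{i}" for i in range(16)]

permuted = vpermi2w_ymm(mask, a, b)
blended = blend([i == 9 for i in range(16)], a, b)
```

In this model every index keeps its own lane position (25 is lane 9 of the second source), so `permuted` and `blended` agree. That is the sense in which a full two-source permute is more than this pattern needs.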

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105390/new/

https://reviews.llvm.org/D105390