[PATCH] D105390: [X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971)

Tue Jul 13 05:29:09 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:37554
+            Src.getOperand(0).getValueType()))
+      return DAG.getNode(X86ISD::VBROADCAST, DL, VT, Src.getOperand(0));
+
----------------
Do we get any changes in the current tests if we pull this out as a preliminary patch?

================
Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:669
+; SKX-NEXT:    vmovdqa {{.*#+}} ymm0 = [0,1,2,3,4,5,6,7,8,25,10,11,12,13,14,15]
+; SKX-NEXT:    vpermi2w %ymm2, %ymm1, %ymm0
+; SKX-NEXT:    retq
----------------
lebedev.ri wrote:
> craig.topper wrote:
> > vpermi2w is 3 uops, 2 of which are 3 cycles that are serialized. I think the two blends we got on avx2 would be better. That's probably a separate issue in shuffle lowering/combining.
> Right. This is a separate problem, in `combineX86ShufflesRecursively()` I would guess.
The 'AllowBWIVPERMV3' logic in combineX86ShuffleChain is probably slightly off.
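
For reference, a rough standalone sketch (not LLVM code) of why the mask in the SKX check lines above amounts to a blend: every index is the identity except lane 9, whose index 25 = 16 + 9 selects element 9 of the second source. The helper names `permute2`, `blend` and `maskIsSingleLaneBlend` are hypothetical, and the model assumes the usual 256-bit word semantics (index bits [3:0] pick the element, bit 4 picks the source).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<uint16_t, 16>;

// Hypothetical scalar model of a two-source word permute on 256-bit vectors:
// bits [3:0] of each index pick the element, bit 4 picks source a or b.
V16 permute2(const V16 &idx, const V16 &a, const V16 &b) {
  V16 r{};
  for (int i = 0; i < 16; ++i) {
    const V16 &src = (idx[i] & 16) ? b : a;
    r[i] = src[idx[i] & 15];
  }
  return r;
}

// Hypothetical scalar model of a per-lane blend: lane i comes from b when
// bit i of mask is set, otherwise from a.
V16 blend(const V16 &a, const V16 &b, uint32_t mask) {
  V16 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((mask >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Checks that the mask from the SKX check lines behaves like a blend that
// takes only lane 9 from the second source.
bool maskIsSingleLaneBlend() {
  const V16 idx = {0, 1, 2, 3, 4, 5, 6, 7, 8, 25, 10, 11, 12, 13, 14, 15};
  V16 a{}, b{};
  for (int i = 0; i < 16; ++i) {
    a[i] = static_cast<uint16_t>(100 + i); // distinct values per source
    b[i] = static_cast<uint16_t>(200 + i);
  }
  return permute2(idx, a, b) == blend(a, b, 1u << 9);
}
```

This only models the lane selection; it says nothing about which instructions the backend should actually pick for it.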

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105390/new/

https://reviews.llvm.org/D105390