[PATCH] D105390: [X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971)

Mon Jul 19 10:06:10 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/test/CodeGen/X86/masked_gather.ll:1306
+; AVX1-NEXT:    vbroadcastss c+28(%rip), %ymm2
+; AVX1-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm2[4],ymm0[5,6,7]
 ; AVX1-NEXT:  .LBB4_42: # %else87
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > RKSimon wrote:
> > > Just noticed this on D106280 - I don't suppose you know why we fail to merge these identical broadcasts?
> > I'm not sure I follow. This inserts `c+28(%rip)` into the 4th 32-bit element of ymm0.
> > How/what would you expect it to look like?
> Aren't all the "broadcastss c+28(%rip), XXXX" cases broadcasting the same memory location? The IR looks like the gep is splatting element 3 of the pointer array to every gather address.
Right. Well, I'm not sure where we'd do that. And what do you mean by merge?

They are scalarized by the `Scalarize Masked Memory Intrinsics (scalarize-masked-mem-intrin)` pass,
which is a codegen pass. I'm not sure how we could do that in DAGCombine,
since it only sees a single basic block at a time, and we don't have any heavy-lifting passes this late.
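
For reference, the lane behaviour of the broadcast+blend pair in the check lines above can be modelled roughly as follows. This is only a sketch of the instruction semantics in Python, not the LLVM lowering itself. The helper names are hypothetical, and the blend mask `1 << 4` is an assumption inferred from the `ymm2[4]` in the blend comment.

```python
# Hypothetical model of the lane semantics; not LLVM code.

def vbroadcastss(scalar, lanes=8):
    """Model vbroadcastss: copy one 32-bit value into every lane."""
    return [scalar] * lanes


def vblendps(a, b, imm8):
    """Model vblendps: lane i comes from b if bit i of imm8 is set, else from a."""
    return [b[i] if (imm8 >> i) & 1 else a[i] for i in range(len(a))]


def insert_via_broadcast_blend(vec, scalar, index):
    """Insert scalar at lane `index` by broadcasting it and blending one lane."""
    return vblendps(vec, vbroadcastss(scalar, len(vec)), 1 << index)


# Stand-in values: ymm0 holds lanes 0..7, and 42.0 stands in for the value at c+28.
ymm0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
result = insert_via_broadcast_blend(ymm0, 42.0, 4)
print(result)
```

Each scalarized block repeats this pattern with its own blend lane, and every one of them reloads the same `c+28(%rip)` address for the broadcast.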

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105390/new/

https://reviews.llvm.org/D105390