[PATCH] D141778: [DAGCombiner][X86] `mergeConsecutiveStores()`: support merging splat-stores of the same value

Phoebe Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jan 15 19:28:24 PST 2023


pengfei added inline comments.


================
Comment at: llvm/test/CodeGen/X86/legalize-shl-vec.ll:247-252
+; X64-NEXT:    movq %r8, %xmm0
+; X64-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
+; X64-NEXT:    psrad $31, %xmm0
+; X64-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
+; X64-NEXT:    movdqa %xmm0, 16(%rdi)
+; X64-NEXT:    movdqa %xmm0, (%rdi)
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > pengfei wrote:
> > > > Looks like regression here.
> > > We've traded 4 scalar stores to two vector stores + GPR->XMM xfer + two shuffles + shift.
> > > I wouldn't say it's an obvious regression, since we get less contention in CPU store unit,
> > > but it's not really an improvement, yes.
> > > 
> > > Can you help spot issues in the tests in `elementwise-store-of-scalar-splat.ll` / `subvectorwise-store-of-vector-splat.ll`?
> > > If those do not have regressions, then we need to restrict some other fold.
> > This is an unrelated shuffle combining issue
> > ```
> >         movq    %r8, %xmm0
> >         pshufd  $68, %xmm0, %xmm0               # xmm0 = xmm0[0,1,0,1]
> >         psrad   $31, %xmm0
> >         pshufd  $245, %xmm0, %xmm0              # xmm0 = xmm0[1,1,3,3]
> > ```
> > What we want here, is to splat the sign bit of a 64-bit scalar to the entire XMM.
> > This should just be: (note the decoded shuffle mask)
> > ```
> >         movq    %r8, %xmm0
> >         pshufd  $<???>, %xmm0, %xmm0            # xmm0 = xmm0[1,1,1,1]
> >         psrad   $31, %xmm0
> > 
> > ```
> https://reviews.llvm.org/D141806 will address this.
> Does anyone see any other regressions?
> Can you help spot issues in the tests in `elementwise-store-of-scalar-splat.ll` / `subvectorwise-store-of-vector-splat.ll`?

I like the AVX512 broadcast version, i.e., GPR->XMM directly. AVX2 and some AVX512F is suboptimal. I cannot tell whether it's good or not to replace splat store with shufle on SSE2. Should a `rep stos` be better?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141778/new/

https://reviews.llvm.org/D141778



More information about the llvm-commits mailing list