[PATCH] D141778: [DAGCombiner][X86] `mergeConsecutiveStores()`: support merging splat-stores of the same value
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 15 08:19:49 PST 2023
lebedev.ri marked an inline comment as done.
lebedev.ri added inline comments.
================
Comment at: llvm/test/CodeGen/X86/legalize-shl-vec.ll:247-252
+; X64-NEXT: movq %r8, %xmm0
+; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
+; X64-NEXT: psrad $31, %xmm0
+; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
+; X64-NEXT: movdqa %xmm0, 16(%rdi)
+; X64-NEXT: movdqa %xmm0, (%rdi)
----------------
lebedev.ri wrote:
> pengfei wrote:
> > Looks like regression here.
> We've traded 4 scalar stores for two vector stores + a GPR->XMM transfer + two shuffles + a shift.
> I wouldn't say it's an obvious regression, since we get less contention in the CPU store unit,
> but it's not really an improvement either, I agree.
>
> Can you help spot issues in the tests in `elementwise-store-of-scalar-splat.ll` / `subvectorwise-store-of-vector-splat.ll`?
> If those do not have regressions, then we need to restrict some other fold.
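(For context on what is being merged here, a minimal C++ sketch of the kind of pattern this splat-store fold targets; the function and names are made up for illustration, not taken from the patch.)
```
#include <cstdint>

// Hypothetical example: four adjacent stores of the same scalar value.
// Without the fold these stay as four scalar stores; with splat-store
// merging they may become a single wider (e.g. 16-byte vector) store.
void splat_store4(uint32_t *dst, uint32_t v) {
  dst[0] = v;
  dst[1] = v;
  dst[2] = v;
  dst[3] = v;
}
```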
This is an unrelated shuffle combining issue:
```
movq %r8, %xmm0
pshufd $68, %xmm0, %xmm0 # xmm0 = xmm0[0,1,0,1]
psrad $31, %xmm0
pshufd $245, %xmm0, %xmm0 # xmm0 = xmm0[1,1,3,3]
```
What we want here is to splat the sign bit of the 64-bit scalar across the entire XMM register.
This should just be (note the decoded shuffle mask):
```
movq %r8, %xmm0
pshufd $<???>, %xmm0, %xmm0 # xmm0 = xmm0[1,1,1,1]
psrad $31, %xmm0
```
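As a sanity check, here is a small standalone sketch (not part of the patch) of why the two masks should fold to `[1,1,1,1]`: `psrad` operates independently on each 32-bit lane, so it commutes with the lane shuffles, and composing the two `pshufd` masks selects element 1 everywhere.
```
#include <array>
#include <cstdio>

int main() {
  // Decoded pshufd masks from the codegen above.
  std::array<int, 4> first = {0, 1, 0, 1};  // pshufd $68
  std::array<int, 4> second = {1, 1, 3, 3}; // pshufd $245

  // Composition of the two shuffles: result[i] = first[second[i]].
  std::array<int, 4> combined;
  unsigned imm = 0;
  for (int i = 0; i < 4; ++i) {
    combined[i] = first[second[i]];
    imm |= combined[i] << (2 * i); // pshufd packs 2 bits per element
  }

  // Prints: combined = [1,1,1,1], imm = 0x55
  std::printf("combined = [%d,%d,%d,%d], imm = 0x%x\n",
              combined[0], combined[1], combined[2], combined[3], imm);
  return 0;
}
```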
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141778/new/
https://reviews.llvm.org/D141778