[PATCH] D106280: [X86][AVX] scalar_to_vector(load_scalar()) -> load_vector() for fast dereferencable loads
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 20 06:10:01 PDT 2021
pengfei added inline comments.
================
Comment at: llvm/test/CodeGen/X86/load-partial-dot-product.ll:183
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
+; AVX-NEXT: vmovups (%rsi), %xmm1
; AVX-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],mem[0],xmm1[3]
----------------
lebedev.ri wrote:
> pengfei wrote:
> > RKSimon wrote:
> > > efriedma wrote:
> > > > Even if we're allowed to do this, it doesn't seem wise; having zero in the high bits of the register is better than random junk. Can we mark up the loads somehow?
> > > Isn't that what the dereferenceable(16) tag is doing?
> > I have the same doubt. `dereferenceable(16)` tells us the memory backing the high bits is safe to read. But shouldn't we always prefer loading fewer bytes for performance?
> You are comparing apples to oranges here.
> The problem here is that the `vinsertps` is (obviously) redundant and should go away.
> Once it does, the wide load is obviously better: one fewer memory access.
I see it now. It makes sense if the combine ultimately turns
```
vmovsd {{.*#+}} xmm0 = mem[0],zero
vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
```
into
`vmovups (%rdi), %xmm0`
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106280/new/
https://reviews.llvm.org/D106280