[PATCH] D106280: [X86][AVX] scalar_to_vector(load_scalar()) -> load_vector() for fast dereferencable loads

Mon Jul 19 17:46:22 PDT 2021

pengfei added inline comments.

================
Comment at: llvm/test/CodeGen/X86/load-partial-dot-product.ll:183
 ; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = mem[0],zero
+; AVX-NEXT:    vmovups (%rsi), %xmm1
 ; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0,1],mem[0],xmm1[3]
----------------
RKSimon wrote:
> efriedma wrote:
> > Even if we're allowed to do this, it doesn't seem wise; having zero in the high bits of the register is better than random junk.  Can we mark up the loads somehow?
> Isn't that what the dereferenceable(16) tag is doing?
I have the same doubt. `dereferenceable(16)` tells us that the memory behind the high bits is accessible. But shouldn't we always prefer loading fewer bytes for performance?
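
To make the difference under discussion concrete, here is a minimal sketch using SSE2 intrinsics rather than the patch's own code. It assumes an x86-64 target; the helper names (`load_low_zero_high`, `load_full`, `high_lane`) are hypothetical and only illustrate the two load shapes: an 8-byte load that zeroes the upper half of the register (as `vmovsd` from memory does) versus a full 16-byte unaligned load (analogous to `vmovups`) that brings in whatever sits in the upper 8 bytes.

```c
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */

/* Hypothetical helper: reads only the low 8 bytes at p.
 * The upper 64 bits of the result are zeroed, like vmovsd xmm, m64. */
static __m128d load_low_zero_high(const double *p) {
    return _mm_load_sd(p);
}

/* Hypothetical helper: reads all 16 bytes at p (unaligned).
 * Only valid if the full 16 bytes are dereferenceable; the upper
 * lane then holds whatever happens to be in memory. */
static __m128d load_full(const double *p) {
    return _mm_loadu_pd(p);
}

/* Hypothetical helper: extracts the upper 64-bit lane for inspection. */
static double high_lane(__m128d v) {
    double out[2];
    _mm_storeu_pd(out, v);
    return out[1];
}
```

With a buffer such as `double buf[2] = {1.0, 42.0};`, the first helper yields a zero upper lane while the second exposes the `42.0` stored after the element of interest, which is the "random junk" concern raised above.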

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106280/new/

https://reviews.llvm.org/D106280