[PATCH] D111960: [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Oct 17 08:08:47 PDT 2021


lebedev.ri added inline comments.


================
Comment at: llvm/test/CodeGen/X86/pr50823.ll:11-13
+; CHECK-NEXT:    vmovups (%rsi), %ymm0
+; CHECK-NEXT:    vinsertf128 $1, 32(%rsi), %ymm0, %ymm0
+; CHECK-NEXT:    vhaddps %ymm0, %ymm0, %ymm0
----------------
RKSimon wrote:
> pengfei wrote:
> > Is this a regression?
> I don't believe so: https://simd.godbolt.org/z/rhrqsss5a - as I said in the summary, vinsertX128 tends to be cheaper than more general cross-lane shuffles.
We were loading 128 bits, and then fold-loading 128 more bits;
now we load 256 bits, and then fold-load the high 128 bits we just loaded, no?
That `vinsertf128` should be dropped: it is a no-op, since the register already holds those bits.
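The concern can be sketched as follows (simplified illustration in AT&T syntax; offsets are assumed for clarity and do not match the `32(%rsi)` offset in the actual pr50823 test):

```
# Before: two 128-bit loads, no redundancy
vmovups      (%rsi), %xmm0               # load bytes 0..15 into low lane
vinsertf128  $1, 16(%rsi), %ymm0, %ymm0  # fold-load bytes 16..31 into high lane

# After: one 256-bit load, then a redundant fold-load
vmovups      (%rsi), %ymm0               # load bytes 0..31 (both lanes)
vinsertf128  $1, 16(%rsi), %ymm0, %ymm0  # re-loads bytes 16..31 already in %ymm0
```

In the second sequence the `vinsertf128` writes back data the `vmovups` already placed in the high lane, so (under this assumed addressing) it could be elided entirely.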


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111960/new/

https://reviews.llvm.org/D111960
