[PATCH] D108253: [WIP][X86] Introduce 'blend with broadcast' shuffle lowering strategy (PR50971)

Pengfei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 28 19:30:34 PDT 2021


pengfei added inline comments.


================
Comment at: llvm/test/CodeGen/X86/horizontal-sum.ll:276-278
+; AVX2-SLOW-NEXT:    vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX2-SLOW-NEXT:    vpermilpd {{.*#+}} xmm1 = xmm1[1,0]
+; AVX2-SLOW-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
----------------
Why are these lines affected?


================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll:704-706
+; AVX2-NEXT:    vmovddup {{.*#+}} xmm0 = xmm0[0,0]
+; AVX2-NEXT:    vbroadcastsd %xmm1, %ymm1
+; AVX2-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
----------------
Is this a regression?


================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll:709-713
+; AVX512VL-LABEL: shuffle_v4f64_0044:
+; AVX512VL:       # %bb.0:
+; AVX512VL-NEXT:    vmovapd {{.*#+}} ymm2 = [0,0,4,4]
+; AVX512VL-NEXT:    vpermt2pd %ymm1, %ymm2, %ymm0
+; AVX512VL-NEXT:    retq
----------------
Not sure if this is the right direction. Does `vmovlhps` + `vpermpd` perform better in the AVX512VL-SLOW and AVX512VL-FAST-PERLANE cases?
Besides, this does show a relationship with broadcast.
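
For reference, a minimal Python sketch of the two lowerings being compared for the `[0,0,4,4]` mask. It models lane selection only and says nothing about latency or throughput. The helper names and the `0x50` immediate for `vpermpd` are illustrative assumptions, not taken from the patch or from the test file.

```python
# Lane-level model of two ways to build the v4f64 shuffle <0,0,4,4>.
# Assumption: vectors are plain Python lists of 4 doubles, lowest lane first.

def vpermt2pd(a, idx, b):
    """Two-source permute: bit 2 of each index picks a (0) or b (1),
    bits 1:0 pick the element within that source."""
    table = list(a) + list(b)
    return [table[i & 7] for i in idx]

def vmovlhps(a, b):
    """128-bit op viewed as 2 x f64: low qword of a, then low qword of b.
    The upper 128 bits of the ymm register are zeroed by the VEX encoding."""
    return [a[0], b[0], 0.0, 0.0]

def vpermpd(src, imm8):
    """Single-source cross-lane permute: 2 bits of imm8 per result lane."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

# Lowering in the new AVX512VL check lines: constant index vector + vpermt2pd.
two_source = vpermt2pd(a, [0, 0, 4, 4], b)

# Alternative raised above: vmovlhps, then vpermpd selecting lanes [0,0,1,1]
# (imm8 = 0b01_01_00_00 = 0x50, an assumed encoding for this sketch).
two_step = vpermpd(vmovlhps(a, b), 0x50)

print(two_source, two_step)
```

Both sequences select the same lanes, so the question is only which one is cheaper on the targets in question; the `vpermt2pd` form also needs a constant load for the index vector.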


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108253/new/

https://reviews.llvm.org/D108253


