[PATCH] D108253: [WIP][X86] Introduce 'blend with broadcast' shuffle lowering strategy (PR50971)
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 28 19:30:34 PDT 2021
pengfei added inline comments.
================
Comment at: llvm/test/CodeGen/X86/horizontal-sum.ll:276-278
+; AVX2-SLOW-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX2-SLOW-NEXT: vpermilpd {{.*#+}} xmm1 = xmm1[1,0]
+; AVX2-SLOW-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
----------------
Why are these affected?
================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll:704-706
+; AVX2-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
+; AVX2-NEXT: vbroadcastsd %xmm1, %ymm1
+; AVX2-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
----------------
Is this a regression?
================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll:709-713
+; AVX512VL-LABEL: shuffle_v4f64_0044:
+; AVX512VL: # %bb.0:
+; AVX512VL-NEXT: vmovapd {{.*#+}} ymm2 = [0,0,4,4]
+; AVX512VL-NEXT: vpermt2pd %ymm1, %ymm2, %ymm0
+; AVX512VL-NEXT: retq
----------------
Not sure if this is the right direction. Does `vmovlhps` + `vpermpd` perform better for the AVX512VL-SLOW and AVX512VL-FAST-PERLANE cases?
Besides, this does show a relationship with broadcast.
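For reference, a rough lane-level model of the sequences being compared for `shuffle_v4f64_0044`. This is only a sketch: the function names are made up for illustration, the `vpermpd` index `[0,0,1,1]` is an assumption (the comment above names the instruction, not its immediate), and it checks that the sequences produce the same lanes, not how they perform.

```python
# Model a v4f64 register as a 4-element list; all names here are hypothetical.

def shuffle_0044(a, b):
    """Reference semantics: shufflevector with mask <0, 0, 4, 4>."""
    src = a + b  # indices 0..3 select from a, 4..7 select from b
    return [src[i] for i in (0, 0, 4, 4)]

def lower_vpermt2pd(a, b):
    """vmovapd ymm2 = [0,0,4,4]; vpermt2pd %ymm1, %ymm2, %ymm0."""
    idx = [0, 0, 4, 4]
    table = a + b  # two-source table lookup driven by the index vector
    return [table[i] for i in idx]

def lower_movlhps_permpd(a, b):
    """vmovlhps xmm0 = xmm0[0],xmm1[0]; then vpermpd (index assumed)."""
    low = [a[0], b[0], 0.0, 0.0]  # VEX xmm op zeroes the upper 128 bits
    return [low[i] for i in (0, 0, 1, 1)]  # assumed ymm0 = ymm0[0,0,1,1]

def lower_blend_broadcast(a, b):
    """vmovddup + vbroadcastsd + vblendps, as in the AVX2 check lines."""
    dup = [a[0], a[0], 0.0, 0.0]  # vmovddup xmm0 = xmm0[0,0]
    bcast = [b[0]] * 4            # vbroadcastsd %xmm1, %ymm1
    return dup[:2] + bcast[2:]    # low 128 bits from dup, high from bcast

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
expected = shuffle_0044(a, b)
results = [f(a, b) for f in (lower_vpermt2pd,
                             lower_movlhps_permpd,
                             lower_blend_broadcast)]
print(all(r == expected for r in results))
```

All three model the same result lanes, so the open question is purely about cost on each target.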
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108253/new/
https://reviews.llvm.org/D108253