[PATCH] D35320: [X86][SSE] Add support for extending bool vectors bitcasted from scalars.
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 18 02:34:45 PDT 2017
RKSimon added inline comments.
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:24
+; AVX1: # BB#0:
+; AVX1-NEXT: movzbl %dil, %eax
+; AVX1-NEXT: vmovq %rax, %xmm0
----------------
delena wrote:
> One "mov" is enough here. You probably used "zext" instead of "anyext". (I did not look at the code yet.)
The code requested an ANY_EXTEND, but for some reason it's choosing to do a movz - probably the partial-register logic kicking in.
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:120
+; AVX1-NEXT: vmovd %edi, %xmm0
+; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
+; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
----------------
delena wrote:
> pshufb can be used for broadcasting i8.
Yes - we don't normally combine to PSHUFB until depth=3 due to the size/load cost of a shuffle mask, but in this case we just need a (free-ish) zero shuffle mask, so it'd be safe to support it. Any additional shuffle combine would then take us over the depth limit.
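For reference, PSHUFB with an all-zero control mask copies byte 0 to every lane, which is why an i8 broadcast only needs the "free" zero shuffle mask. A minimal scalar sketch of that semantics (illustration only, not the backend code; the helper names are hypothetical):

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB: each result byte is src[mask[i] & 0x0F],
   or zero when the mask byte's top bit is set. */
static void pshufb128(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t mask[16]) {
    for (int i = 0; i < 16; ++i)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Broadcasting an i8: an all-zero control mask replicates byte 0. */
static void broadcast_i8(uint8_t dst[16], uint8_t v) {
    uint8_t src[16] = { v };       /* byte 0 holds the scalar */
    uint8_t zmask[16] = { 0 };     /* the zero shuffle mask */
    pshufb128(dst, src, zmask);
}
```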
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:133
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = [1,2,4,8,16,32,64,128]
+; AVX2-NEXT: vpand %xmm1, %xmm0, %xmm0
+; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
----------------
delena wrote:
> why load is not folded in vpand?
xmm1 is used twice - first as the AND mask and then in the comparison below ((X & M) == M) - so the load can't be folded into vpand.
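A scalar sketch of the (X & M) == M idiom the test exercises (illustration only; helper name is hypothetical): the mask constant [1,2,4,...,128] is ANDed with the broadcast scalar and then compared back against itself, so lane i becomes all-ones exactly when bit i of the scalar was set. The double use of the constant is what blocks the memory fold:

```c
#include <stdint.h>

/* Expand the low 8 bits of x into 8 boolean lanes: lane i is 0xFF iff
   bit i of x is set. The mask constant M appears twice - once in the
   AND and once in the compare - matching the vpand/vpcmpeqw pair. */
static void bits_to_bool_lanes(uint8_t lanes[8], uint8_t x) {
    static const uint8_t M[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };
    for (int i = 0; i < 8; ++i)
        lanes[i] = ((x & M[i]) == M[i]) ? 0xFF : 0x00;
}
```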
Repository:
rL LLVM
https://reviews.llvm.org/D35320