[PATCH] D35320: [X86][SSE] Add support for extending bool vectors bitcasted from scalars.
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 18 02:34:45 PDT 2017
RKSimon added inline comments.
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:24
+; AVX1: # BB#0:
+; AVX1-NEXT: movzbl %dil, %eax
+; AVX1-NEXT: vmovq %rax, %xmm0
----------------
delena wrote:
> One "mov" is enough here. You probably used "zext" instead of "anyext". (I did not look at the code yet.)
The code requested an ANY_EXTEND, but for some reason it's choosing to do a movz - probably the partial-register logic kicking in.
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:120
+; AVX1-NEXT: vmovd %edi, %xmm0
+; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
+; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
----------------
delena wrote:
> pshufb can be used for broadcasting i8.
Yes - we don't normally combine to PSHUFB until depth=3 due to the size/load cost of a shuffle mask, but in this case we just need a (free-ish) zero shuffle mask, so it'd be safe to support it. Any additional shuffle combine would then take us over the depth limit.
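For reference, PSHUFB with an all-zero control mask copies byte 0 to every lane, which is why an i8 broadcast only needs the "free" zero shuffle mask. A minimal scalar sketch of that semantics (illustration only, not the backend code; the helper names are hypothetical):

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB: each result byte is src[mask[i] & 0x0F],
   or zero when the mask byte's top bit is set. */
static void pshufb128(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t mask[16]) {
    for (int i = 0; i < 16; ++i)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Broadcasting an i8: an all-zero control mask replicates byte 0. */
static void broadcast_i8(uint8_t dst[16], uint8_t v) {
    uint8_t src[16] = { v };       /* byte 0 holds the scalar */
    uint8_t zmask[16] = { 0 };     /* the zero shuffle mask */
    pshufb128(dst, src, zmask);
}
```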
================
Comment at: test/CodeGen/X86/bitcast-int-to-vector-bool.ll:133
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = [1,2,4,8,16,32,64,128]
+; AVX2-NEXT: vpand %xmm1, %xmm0, %xmm0
+; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
----------------
delena wrote:
> why load is not folded in vpand?
xmm1 is used twice - first as the AND mask and then in the comparison below ((X & M) == M) - so the load can't be folded into vpand.
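A scalar sketch of the (X & M) == M idiom the test exercises (illustration only; helper name is hypothetical): the mask constant [1,2,4,...,128] is ANDed with the broadcast scalar and then compared back against itself, so lane i becomes all-ones exactly when bit i of the scalar was set. The double use of the constant is what blocks the memory fold:

```c
#include <stdint.h>

/* Expand the low 8 bits of x into 8 boolean lanes: lane i is 0xFF iff
   bit i of x is set. The mask constant M appears twice - once in the
   AND and once in the compare - matching the vpand/vpcmpeqw pair. */
static void bits_to_bool_lanes(uint8_t lanes[8], uint8_t x) {
    static const uint8_t M[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };
    for (int i = 0; i < 8; ++i)
        lanes[i] = ((x & M[i]) == M[i]) ? 0xFF : 0x00;
}
```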
Repository:
rL LLVM
https://reviews.llvm.org/D35320