[PATCH] D37849: [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.

Sun Sep 17 05:52:17 PDT 2017

delena added inline comments.

================
Comment at: test/CodeGen/X86/bitcast-and-setcc-256.ll:380
 ; AVX12:       # BB#0:
 ; AVX12-NEXT:    vcmpltps %ymm0, %ymm1, %ymm0
 ; AVX12-NEXT:    vextractf128 $1, %ymm0, %xmm1
----------------
RKSimon wrote:
> delena wrote:
> > This code fits in 4 instructions:
> > cmp
> > cmp
> > and
> > pmovmskpd
> > 
> > What happens without the "and", with just cmp + bitcast?
> The codegen below is from the bitcast-setcc-*.ll sibling test files. There are still plenty of improvements possible with bool vectors; it's just taking a long time. D35320 should help as well, of course.
> ```
> ; AVX1-LABEL: v8f32:
> ; AVX1:       # BB#0:
> ; AVX1-NEXT:    vcmpltps %ymm0, %ymm1, %ymm0
> ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
> ; AVX1-NEXT:    vpacksswb %xmm1, %xmm0, %xmm0
> ; AVX1-NEXT:    vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
> ; AVX1-NEXT:    vpmovmskb %xmm0, %eax
> ; AVX1-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> ; AVX1-NEXT:    vzeroupper
> ; AVX1-NEXT:    retq
> ;
> ; AVX2-LABEL: v8f32:
> ; AVX2:       # BB#0:
> ; AVX2-NEXT:    vcmpltps %ymm0, %ymm1, %ymm0
> ; AVX2-NEXT:    vmovmskps %ymm0, %eax
> ; AVX2-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> ; AVX2-NEXT:    vzeroupper
> ; AVX2-NEXT:    retq
> ;
> ; AVX512-LABEL: v8f32:
> ; AVX512:       # BB#0:
> ; AVX512-NEXT:    vcmpltps %ymm0, %ymm1, %k0
> ; AVX512-NEXT:    kmovd %k0, %eax
> ; AVX512-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> ; AVX512-NEXT:    vzeroupper
> ; AVX512-NEXT:    retq
>   %x = fcmp ogt <8 x float> %a, %b
>   %res = bitcast <8 x i1> %x to i8
>   ret i8 %res
> }
> ```
Agreed.
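
For reference, the property the patch title describes can be sketched with a scalar model. This is only an illustration in Python, not the SelectionDAG code; `num_sign_bits` and `bitcast_split` are hypothetical helper names, and the assumption is that "splatted sign bits" means a source element whose bits are all copies of its sign bit. In that case every narrower element produced by a bitcast is also all sign bits, which is what lets a compare result feed a movmsk-style instruction at a different element width.

```python
def num_sign_bits(value, width):
    """Count the leading bits (including the sign bit itself) that equal
    the sign bit of a width-bit pattern."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count


def bitcast_split(value, src_width, dst_width):
    """Reinterpret one src_width-bit element as src_width // dst_width
    narrower elements, lowest bits first (hypothetical helper)."""
    assert src_width % dst_width == 0
    mask = (1 << dst_width) - 1
    return [(value >> (i * dst_width)) & mask
            for i in range(src_width // dst_width)]


# An all-ones compare result: 32 sign bits in the i32 element, so each
# i8 piece after the bitcast is all sign bits too.
pieces = bitcast_split(0xFFFFFFFF, 32, 8)
all_splatted = all(num_sign_bits(p, 8) == 8 for p in pieces)

# A source that is not all sign bits gives no such guarantee: the low
# byte 0x80 has only a single sign bit.
low_byte = bitcast_split(0xFFFFFF80, 32, 8)[0]
```

The second case is why the property only holds when the source is known to be entirely sign bits; a partial sign-bit count on the wide element says nothing about the low narrow elements.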

================
Comment at: test/CodeGen/X86/psubus.ll:521
 ; AVX2-NEXT:    vpermq {{.*#+}} ymm0 = ymm0[0,2,2,3]
-; AVX2-NEXT:    vpandn %xmm0, %xmm2, %xmm0
+; AVX2-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX2-NEXT:    vpblendvb %xmm2, %xmm1, %xmm0, %xmm0
----------------
Theoretically, your patch does the right thing. Did you check why there is an additional instruction here?
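
To make the equivalence concrete, here is a per-byte scalar model in Python. It is a rough sketch, not the lowering code: `pandn_byte` and `pblendvb_byte` are hypothetical names, and the assumption is that each mask byte is either 0x00 or 0xFF (all sign bits). Under that assumption, blending against zero on the mask's top bit gives the same result as the and-not, so the vpxor + vpblendvb pair computes what the single vpandn did.

```python
def pandn_byte(mask, value):
    """Per-byte model of (v)pandn: (NOT mask) AND value."""
    return (~mask & value) & 0xFF


def pblendvb_byte(mask, if_set, if_clear):
    """Per-byte model of (v)pblendvb: choose on the top bit of the mask byte."""
    return if_set if mask & 0x80 else if_clear


# With an all-zeros or all-ones mask byte, blending with zero matches pandn
# for every possible value byte.
equivalent = all(
    pandn_byte(m, v) == pblendvb_byte(m, 0x00, v)
    for m in (0x00, 0xFF)
    for v in range(256)
)

# With a mask byte that is not all sign bits, the two operations differ.
differs = pandn_byte(0x80, 0x7F) != pblendvb_byte(0x80, 0x00, 0x7F)
```

So the new sequence is correct when the mask is known to be all sign bits; the open question is only why the select-to-blend form is now preferred over the cheaper vpandn.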

Repository:
  rL LLVM

https://reviews.llvm.org/D37849