[PATCH] D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing

Wed Dec 19 08:24:35 PST 2018

craig.topper marked 3 inline comments as done.
craig.topper added inline comments.

================
Comment at: test/CodeGen/X86/bmi.ll:531-539
 ; X64-NEXT:    movl %esi, %eax
-; X64-NEXT:    movl %edi, %ecx
-; X64-NEXT:    negl %ecx
-; X64-NEXT:    testl %edi, %ecx
+; X64-NEXT:    blsil %edi, %ecx
 ; X64-NEXT:    cmovnel %edx, %eax
 ; X64-NEXT:    retq
   %t0 = sub i32 0, %a
   %t1 = and i32 %t0, %a
   %t2 = icmp eq i32 %t1, 0
----------------
andreadb wrote:
> This is a strange/interesting test.
> 
> If %a is zero, then %t1 is also zero.
> If %a is not zero, then %t1 has exactly one bit set.
> 
> -->
> 
> Testing if %t1 is equal to 0 is equivalent to testing if %a is 0.
> 
> The only case where %t2 is TRUE is if %a is 0.
> This whole logic could be folded into an icmp + select, so we don't even need to select a BLSI.
> 
> This sequence should be optimized at IR level. I didn't test if it is what happens.
> 
> That being said, I take it that the purpose of this test was different. Probably this test should be rewritten in a way that doesn't expose that simplification?
> 
The tests were intended to test use of the Z flag from the BMI instructions.
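
For reference, the equivalence described in the quoted comment can be checked with a small sketch. It emulates the i32 pattern `and (sub 0, %a), %a` in Python with explicit masking. The helper names `blsi32` and `is_power_of_two` are hypothetical and exist only for this illustration; they are not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # emulate i32 wraparound, since Python ints are unbounded


def blsi32(a: int) -> int:
    """Emulate `%t0 = sub i32 0, %a; %t1 = and i32 %t0, %a` (hypothetical helper)."""
    a &= MASK32
    neg = (-a) & MASK32  # two's-complement negation in 32 bits
    return neg & a       # isolates the lowest set bit, or yields 0 when a == 0


def is_power_of_two(x: int) -> bool:
    """True when exactly one bit of x is set."""
    return x != 0 and (x & (x - 1)) == 0


# Spot-check a few inputs, including the edge cases 0 and INT_MIN.
for a in [0, 1, 2, 3, 12, 0x7FFFFFF0, 0x80000000, 0xFFFFFFFF]:
    t1 = blsi32(a)
    # %t2 = icmp eq i32 %t1, 0 is true exactly when %a is 0.
    assert (t1 == 0) == (a == 0)
    # For nonzero %a the result has exactly one bit set.
    if a != 0:
        assert is_power_of_two(t1)
```

This only samples a handful of values; it illustrates the argument rather than proving it.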

================
Comment at: test/CodeGen/X86/bmi.ll:624-635
 ; X64-LABEL: blsi64_z2:
 ; X64:       # %bb.0:
 ; X64-NEXT:    movq %rsi, %rax
-; X64-NEXT:    movq %rdi, %rcx
-; X64-NEXT:    negq %rcx
-; X64-NEXT:    testq %rdi, %rcx
+; X64-NEXT:    blsiq %rdi, %rcx
 ; X64-NEXT:    cmovneq %rdx, %rax
 ; X64-NEXT:    retq
   %t0 = sub i64 0, %a
----------------
andreadb wrote:
> Again, here we may prefer POPCNT to BLSI. It tends to have better latency/throughput overall. I think it is worth raising a bug for this.
> 
> Speaking about these tests in general:
> I think that we should make these more robust (maybe in a separate patch). 
> 
> We can probably make this test more robust by changing how we check the result. For example, rather than comparing against zero, we can compare against a specific power-of-2. That would force the selection of BLSI, since we would need to know the position of that bit.
> 
> We can probably do something similar to improve the other test.
I thought we just established that BLSI could be replaced with a compare of the input with 0. Why would we replace it with POPCNT?
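
As a rough illustration of the quoted suggestion to compare against a specific power of 2 instead of zero, the sketch below emulates the i64 pattern and shows that such a check no longer depends only on whether the input is zero. The names `blsi64` and `lowest_set_bit_is` are hypothetical, and the bit position 3 is an arbitrary choice for the example.

```python
MASK64 = (1 << 64) - 1  # emulate i64 wraparound


def blsi64(a: int) -> int:
    """Emulate `and (sub i64 0, %a), %a` (hypothetical helper)."""
    a &= MASK64
    return ((-a) & MASK64) & a


def lowest_set_bit_is(a: int, bit: int) -> bool:
    """Compare the isolated bit against a specific power of 2 instead of 0."""
    return blsi64(a) == (1 << bit)


# Two nonzero inputs give different answers, so the comparison
# cannot be replaced by a plain `icmp eq %a, 0`.
print(lowest_set_bit_is(8, 3))   # lowest set bit of 0b1000 is bit 3
print(lowest_set_bit_is(4, 3))   # lowest set bit of 0b0100 is bit 2
```

Whether the backend would still select BLSI for such a pattern is not verified here; the sketch only shows the arithmetic behind the suggestion.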

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55870/new/

https://reviews.llvm.org/D55870