[PATCH] D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 19 08:24:35 PST 2018
craig.topper marked 3 inline comments as done.
craig.topper added inline comments.
================
Comment at: test/CodeGen/X86/bmi.ll:531-539
; X64-NEXT: movl %esi, %eax
-; X64-NEXT: movl %edi, %ecx
-; X64-NEXT: negl %ecx
-; X64-NEXT: testl %edi, %ecx
+; X64-NEXT: blsil %edi, %ecx
; X64-NEXT: cmovnel %edx, %eax
; X64-NEXT: retq
%t0 = sub i32 0, %a
%t1 = and i32 %t0, %a
%t2 = icmp eq i32 %t1, 0
----------------
andreadb wrote:
> This is a strange/interesting test.
>
> If %a is zero, then %t1 is also zero.
> If %a is not zero, then %t1 has exactly one bit set.
>
> -->
>
> Testing whether %t1 is equal to 0 is equivalent to testing whether %a is 0.
>
> The only case where %t2 is TRUE is when %a is 0.
> This whole logic could be folded into an icmp + select, so we don't even need to select a BLSI.
>
> This sequence should be optimized at the IR level. I didn't test whether that is what happens.
>
> That being said, I take it that the purpose of this test was different. Probably, this test should be rewritten in a way that doesn't expose that simplification?
>
The tests were intended to test use of the Z flag from the BMI instructions.
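
For reference, the equivalence discussed above can be checked with a small C sketch. This is only an illustration under stated assumptions, not part of the patch: the helper names isolate_lowest_bit and count_mismatches are hypothetical, and the sample inputs are a spot check rather than an exhaustive proof.

```c
#include <stdint.h>

/* x & -x: isolates the lowest set bit, which is what BLSI computes.
   Unsigned negation wraps, so 0u - x is well defined in C. */
uint32_t isolate_lowest_bit(uint32_t x) {
    return x & (0u - x);
}

/* Hypothetical spot check: counts sampled inputs for which
   ((x & -x) == 0) disagrees with (x == 0). */
unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    /* x == 0 is the only input expected to give a zero result. */
    if ((isolate_lowest_bit(0u) == 0u) != 1) {
        ++mismatches;
    }
    for (unsigned bit = 0; bit < 32; ++bit) {
        uint32_t single = (uint32_t)1 << bit;       /* exactly one bit set */
        uint32_t with_top = single | 0x80000000u;   /* plus the top bit */
        if ((isolate_lowest_bit(single) == 0u) != (single == 0u)) {
            ++mismatches;
        }
        if ((isolate_lowest_bit(with_top) == 0u) != (with_top == 0u)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If the sketch is right, count_mismatches() reports no disagreement, which matches the observation that the icmp on %t1 could be replaced by an icmp on %a.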
================
Comment at: test/CodeGen/X86/bmi.ll:624-635
; X64-LABEL: blsi64_z2:
; X64: # %bb.0:
; X64-NEXT: movq %rsi, %rax
-; X64-NEXT: movq %rdi, %rcx
-; X64-NEXT: negq %rcx
-; X64-NEXT: testq %rdi, %rcx
+; X64-NEXT: blsiq %rdi, %rcx
; X64-NEXT: cmovneq %rdx, %rax
; X64-NEXT: retq
%t0 = sub i64 0, %a
----------------
andreadb wrote:
> Again, here we may prefer POPCNT to BLSI. It tends to have better latency/throughput overall. I think it is worth raising a bug for this.
>
> Speaking about these tests in general:
> I think that we should make these more robust (maybe in a separate patch).
>
> We can probably make this test more robust by changing how we check the result. For example, rather than comparing against zero, we can compare against a specific power-of-2. That would force the selection of BLSI, since we would need to know the position of that bit.
>
> We can probably do something similar to improve the other test.
I thought we just established that BLSI could be replaced with a compare of the input with 0. Why would we replace it with POPCNT?
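
As a rough illustration of the quoted suggestion to compare against a specific power of 2, here is a C sketch. The function names and the constant 8 are assumptions chosen for the example and do not come from the test file; the point is only that the result now depends on which bit is lowest, so the check no longer reduces to comparing the input with 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the lowest set bit of x equals pow2 (hypothetical helper). */
bool lowest_set_bit_is(uint32_t x, uint32_t pow2) {
    return (x & (0u - x)) == pow2;
}

/* Select b when the lowest set bit of a is bit 3 (value 8), else c.
   Unlike a compare against zero, this depends on the bit position. */
uint32_t select_on_bit3(uint32_t a, uint32_t b, uint32_t c) {
    return lowest_set_bit_is(a, 8u) ? b : c;
}
```

For example, inputs 8 and 24 both have 8 as their lowest set bit, while 16 and 0 do not, so a plain zero test on the input cannot reproduce the result.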
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55870/new/
https://reviews.llvm.org/D55870