[PATCH] D14496: X86: More efficient codegen for 64-bit compare-and-branch
Hans Wennborg via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 9 04:11:54 PST 2015
hans created this revision.
hans added reviewers: mkuper, majnemer.
hans added subscribers: llvm-commits, hansw.
This patch changes the lowering of 64-bit compare-and-branch on 32-bit x86 to compare and branch on the high and low halves separately, instead of first combining the results of both comparisons and then branching once.
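From what I understand, the current lowering is a result of LLVM doing legalization one basic block at a time, so the expanded comparison cannot be split into separate branches at that stage. My patch works around this by pattern-matching the legalized comparison to a pseudo instruction whose custom inserter then emits the separate compares and branches.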
Example:
```
define i32 @test_slt(i64 %a, i64 %b) {
entry:
  %cmp = icmp slt i64 %a, %b
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 2
}

define i32 @test_eq(i64 %a, i64 %b) {
entry:
  %cmp = icmp eq i64 %a, %b
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 2
}
```
Before this patch:
```
test_slt:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        cmpl    12(%esp), %eax
        setae   %al
        cmpl    16(%esp), %ecx
        setge   %cl
        je      .LBB2_2
        movb    %cl, %al
.LBB2_2:
        testb   %al, %al
        jne     .LBB2_4
        movl    $1, %eax
        retl
.LBB2_4:
        movl    $2, %eax
        retl

test_eq:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        xorl    16(%esp), %ecx
        xorl    12(%esp), %eax
        orl     %ecx, %eax
        jne     .LBB0_2
        movl    $1, %eax
        retl
.LBB0_2:
        movl    $2, %eax
        retl
```
After this patch:
```
test_slt:
        movl    16(%esp), %eax
        cmpl    %eax, 8(%esp)
        jl      .LBB2_2
        jg      .LBB2_3
        movl    12(%esp), %eax
        cmpl    %eax, 4(%esp)
        jae     .LBB2_3
.LBB2_2:
        movl    $1, %eax
        retl
.LBB2_3:
        movl    $2, %eax
        retl

test_eq:
        movl    12(%esp), %eax
        cmpl    %eax, 4(%esp)
        jne     .LBB0_3
        movl    16(%esp), %eax
        cmpl    %eax, 8(%esp)
        jne     .LBB0_3
        movl    $1, %eax
        retl
.LBB0_3:
        movl    $2, %eax
        retl
```
I think the new code looks more straightforward. It also reduces register pressure in both cases, and for non-equality comparisons the new code is 6 bytes smaller. In a 32-bit Clang bootstrap, this saves 15 KB of binary size.
In terms of performance, I measured a ~20% speed-up for the non-equality comparison and 7% for equality. Even with the high halves set to zero, so that the low halves always had to be compared, the new version was still faster (presumably because the first branch was well predicted).
http://reviews.llvm.org/D14496
Files:
include/llvm/CodeGen/MachineBasicBlock.h
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86InstrCompiler.td
lib/Target/X86/X86InstrInfo.td
test/CodeGen/X86/avx512-cmp.ll
test/CodeGen/X86/wide-integer-cmp.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D14496.39675.patch
Type: text/x-patch
Size: 17781 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151109/c27e9791/attachment.bin>