[PATCH] D14496: X86: More efficient codegen for 64-bit compare-and-branch

Mon Nov 9 04:11:54 PST 2015

hans created this revision.
hans added reviewers: mkuper, majnemer.
hans added subscribers: llvm-commits, hansw.

This patch changes the lowering of 64-bit compare-and-branch on 32-bit x86 to compare and branch on the high and low 32-bit halves separately, instead of computing the full comparison result first and then branching on it.

From what I understand, the current lowering is a result of LLVM doing legalization per basic block. My patch works around this by pattern-matching the legalization to a pseudo instruction with a custom inserter.

Example:

```
define i32 @test_slt(i64 %a, i64 %b) {
entry:
  %cmp = icmp slt i64 %a, %b
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 2
}

define i32 @test_eq(i64 %a, i64 %b) {
entry:
  %cmp = icmp eq i64 %a, %b
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 2
}
```

Before this patch:

```
test_slt:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        cmpl    12(%esp), %eax
        setae   %al
        cmpl    16(%esp), %ecx
        setge   %cl
        je      .LBB2_2
        movb    %cl, %al
.LBB2_2:
        testb   %al, %al
        jne     .LBB2_4
        movl    $1, %eax
        retl
.LBB2_4:
        movl    $2, %eax
        retl

test_eq:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        xorl    16(%esp), %ecx
        xorl    12(%esp), %eax
        orl     %ecx, %eax
        jne     .LBB0_2
        movl    $1, %eax
        retl
.LBB0_2:
        movl    $2, %eax
        retl
```

After this patch:

```
test_slt:
        movl    16(%esp), %eax
        cmpl    %eax, 8(%esp)
        jl      .LBB2_2
        jg      .LBB2_3
        movl    12(%esp), %eax
        cmpl    %eax, 4(%esp)
        jae     .LBB2_3
.LBB2_2:
        movl    $1, %eax
        retl
.LBB2_3:
        movl    $2, %eax
        retl

test_eq:
        movl    12(%esp), %eax
        cmpl    %eax, 4(%esp)
        jne     .LBB0_3
        movl    16(%esp), %eax
        cmpl    %eax, 8(%esp)
        jne     .LBB0_3
        movl    $1, %eax
        retl
.LBB0_3:
        movl    $2, %eax
        retl
```
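
For readers who prefer source-level code, the branch structure of the new assembly can be sketched roughly as the C++ below. This is only an illustration of the control flow, not code from the patch: the function names `slt64_split` and `eq64_split` are made up for this example. It assumes the usual two's-complement conversion when narrowing to `int32_t`.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the new test_slt: compare the high halves as
// signed values, and fall through to an unsigned compare of the low halves
// only when the high halves are equal.
bool slt64_split(int64_t a, int64_t b) {
    const uint64_t ua = static_cast<uint64_t>(a);
    const uint64_t ub = static_cast<uint64_t>(b);
    const int32_t a_hi = static_cast<int32_t>(static_cast<uint32_t>(ua >> 32));
    const int32_t b_hi = static_cast<int32_t>(static_cast<uint32_t>(ub >> 32));
    const uint32_t a_lo = static_cast<uint32_t>(ua);
    const uint32_t b_lo = static_cast<uint32_t>(ub);

    if (a_hi < b_hi) return true;   // corresponds to the jl
    if (a_hi > b_hi) return false;  // corresponds to the jg
    return a_lo < b_lo;             // jae branches to the "false" block
}

// Hypothetical helper mirroring the new test_eq: bail out as soon as either
// half differs (low half first, as in the assembly above).
bool eq64_split(int64_t a, int64_t b) {
    const uint64_t ua = static_cast<uint64_t>(a);
    const uint64_t ub = static_cast<uint64_t>(b);
    if (static_cast<uint32_t>(ua) != static_cast<uint32_t>(ub)) return false;
    return static_cast<uint32_t>(ua >> 32) == static_cast<uint32_t>(ub >> 32);
}
```

Note that the low halves must be compared as unsigned even for a signed 64-bit comparison, which is why the assembly uses `jae` rather than `jge` for the second branch.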

I think the new code looks more straightforward. It also reduces register pressure in both cases, and for non-equality comparisons, the new code is 6 bytes smaller. On a 32-bit Clang bootstrap, this results in 15 KB binary size savings.

In terms of performance, I measured ~20% speed-up for the non-equality comparison and 7% for equality. Even when setting the high bits to zero, forcing the low bits to always be considered, the new version was still faster (presumably because the first branch was well predicted).

http://reviews.llvm.org/D14496

Files:
  include/llvm/CodeGen/MachineBasicBlock.h
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86ISelLowering.h
  lib/Target/X86/X86InstrCompiler.td
  lib/Target/X86/X86InstrInfo.td
  test/CodeGen/X86/avx512-cmp.ll
  test/CodeGen/X86/wide-integer-cmp.ll
