[llvm-bugs] [Bug 37266] New: Performance regression 4.0 to 6.0 due to unrolling the first trip through an SSE2 ASCII validation loop

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Apr 27 01:54:44 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37266

            Bug ID: 37266
           Summary: Performance regression 4.0 to 6.0 due to unrolling the
                    first trip through an SSE2 ASCII validation loop
           Product: new-bugs
           Version: 6.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: hsivonen at hsivonen.fi
                CC: llvm-bugs at lists.llvm.org

Minimized test case: https://github.com/hsivonen/llvm_ascii_validation (assumes
that rustup from https://rustup.rs/ is installed).

When rustc switched from LLVM 4.0 to LLVM 6.0, Firefox's Rust-based SSE2-using
UTF-8 validation function regressed by up to 12.5% in performance (in CI on EC2).

What changed is that LLVM 6.0 unrolls the first trip through the innermost
SSE2 ASCII validation loop (that is, it peels the first iteration out in front
of the loop), whereas LLVM 4.0 left the loop intact and compiled it to the most
obvious form.
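
For orientation, a loop of this kind can be sketched in Rust roughly as follows.
This is a minimal illustration, not the code from the test case repository: the
function name `ascii_valid_up_to` and the scalar tail loop are assumptions made
for the example. The SSE2 part loads 16 bytes unaligned, collects the high bits
with a movemask, and stops at the first chunk whose mask is non-zero.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

/// Returns the index of the first non-ASCII byte, or `bytes.len()` if the
/// whole slice is ASCII. (Hypothetical name, chosen for this sketch.)
pub fn ascii_valid_up_to(bytes: &[u8]) -> usize {
    let len = bytes.len();
    let mut offset = 0usize;

    #[cfg(target_arch = "x86_64")]
    {
        // Innermost loop: one unaligned 16-byte load and one movemask per trip.
        while len - offset >= 16 {
            // SAFETY: offset + 16 <= len, so the 16-byte read stays in bounds.
            // SSE2 is part of the x86_64 baseline, so no runtime check is needed.
            let mask = unsafe {
                let chunk = _mm_loadu_si128(bytes.as_ptr().add(offset) as *const __m128i);
                _mm_movemask_epi8(chunk)
            };
            if mask != 0 {
                // The lowest set bit marks the first byte with its high bit set.
                return offset + mask.trailing_zeros() as usize;
            }
            offset += 16;
        }
    }

    // Scalar tail (and the whole input on non-x86_64 targets).
    while offset < len {
        if bytes[offset] >= 0x80 {
            return offset;
        }
        offset += 1;
    }
    len
}
```

With a definition like this, `ascii_valid_up_to(b"hello")` returns the slice
length, and an input with a non-ASCII byte returns that byte's index.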

The basic block of the loop as produced by LLVM 4.0:

.LBB0_6:
        movdqu  (%rdi,%rax), %xmm0
        pmovmskb        %xmm0, %edx
        testl   %edx, %edx
        jne     .LBB0_7
        addq    $16, %rax
        cmpq    %rcx, %rax
        jbe     .LBB0_6
        jmp     .LBB0_2

The unrolled part and the actual loop as produced by LLVM 6.0:

        .cfi_startproc
        cmpq    $16, %rsi
        jb      .LBB0_1
        movdqu  (%rdi), %xmm0
        pmovmskb        %xmm0, %ecx
        testl   %ecx, %ecx
        je      .LBB0_10
        xorl    %esi, %esi
        testl   %ecx, %ecx
        je      .LBB0_7
.LBB0_8:
        bsfl    %ecx, %eax
        jmp     .LBB0_9
.LBB0_1:
        xorl    %eax, %eax
        cmpq    %rsi, %rax
        jb      .LBB0_13
.LBB0_15:
        movq    %rsi, %rax
        retq
.LBB0_10:
        leaq    -16(%rsi), %rdx
        movl    $16, %eax
        .p2align        4, 0x90
.LBB0_11:
        cmpq    %rdx, %rax
        ja      .LBB0_12
        movdqu  (%rdi,%rax), %xmm0
        pmovmskb        %xmm0, %ecx
        addq    $16, %rax
        testl   %ecx, %ecx
        je      .LBB0_11
        addq    $-16, %rax
        movq    %rax, %rsi
        testl   %ecx, %ecx
        jne     .LBB0_8

The unrolling is already visible at the LLVM IR level. Both cases were compiled
at opt-level 2, which is what Firefox ships with.

# Benchmark results for the minimized test case on my machines:

x86_64 code running on Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (Broadwell-EP)

Similar results with both the powersave and performance governors.

$ rustup default 1.24.0
$ ./bench.sh
[...]
test bench ... bench:   1,539,341 ns/iter (+/- 216,985)

$ rustup default 1.25.0
$ ./bench.sh
[...]
test bench ... bench:   1,865,801 ns/iter (+/- 22,297)

x86_64 code running on Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell-DT)

$ rustup default 1.24.0
$ ./bench.sh
[...]
test bench ... bench:   1,491,560 ns/iter (+/- 65,163)

$ rustup default 1.25.0
$ ./bench.sh
[...]
test bench ... bench:   1,673,239 ns/iter (+/- 15,355)
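
As a quick sanity check on the ns/iter figures above, the relative slowdown can
be computed as in the sketch below. The helper name `slowdown_percent` is made
up for this example and is not part of the benchmark script. The error margins
reported by the benchmark are ignored here.

```rust
/// Relative slowdown in percent when going from `before_ns` to `after_ns`.
/// (Hypothetical helper, for illustration only.)
pub fn slowdown_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns / before_ns - 1.0) * 100.0
}

/// The two pairs of ns/iter figures reported above (1.24.0 vs. 1.25.0).
pub fn reported_slowdowns() -> (f64, f64) {
    let broadwell = slowdown_percent(1_539_341.0, 1_865_801.0);
    let haswell = slowdown_percent(1_491_560.0, 1_673_239.0);
    (broadwell, haswell)
}
```

By this measure the minimized test case is roughly 21% slower on the
Broadwell-EP machine and roughly 12% slower on the Haswell-DT machine.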

# Links to other bug databases

Rust: https://github.com/rust-lang/rust/issues/49873
Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1451703
