[llvm-bugs] [Bug 37266] New: Performance regression 4.0 to 6.0 due to unrolling the first trip through an SSE2 ASCII validation loop
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Apr 27 01:54:44 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=37266
Bug ID: 37266
Summary: Performance regression 4.0 to 6.0 due to unrolling the
first trip through an SSE2 ASCII validation loop
Product: new-bugs
Version: 6.0
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: hsivonen at hsivonen.fi
CC: llvm-bugs at lists.llvm.org
Minimized test case: https://github.com/hsivonen/llvm_ascii_validation (assumes
that rustup from https://rustup.rs/ is installed.)
When rustc switched from LLVM 4.0 to LLVM 6.0, Firefox's Rust-based, SSE2-using
UTF-8 validation function regressed by up to 12.5% in performance (measured in
CI on EC2).
The difference is that LLVM 6.0 unrolls (peels) the first trip through the
innermost SSE2 ASCII validation loop, whereas LLVM 4.0 left the loop rolled and
compiled it to the most obvious form.
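For readers who do not have the test case open, here is a minimal sketch of the kind of loop under discussion. It is not the code from the linked repository: the function name `ascii_valid_up_to` and the scalar tail handling are assumptions made for illustration. It loads 16 bytes at a time, collects the high bits with the intrinsic that compiles to `pmovmskb`, and stops at the first chunk containing a non-ASCII byte.

```rust
/// Returns the index of the first non-ASCII byte, or `bytes.len()` if the
/// whole slice is ASCII. Hypothetical reconstruction for illustration only.
pub fn ascii_valid_up_to(bytes: &[u8]) -> usize {
    let len = bytes.len();
    let mut offset = 0usize;

    // SSE2 is part of the x86_64 baseline, so no runtime detection is needed.
    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

        if len >= 16 {
            let last_full = len - 16;
            while offset <= last_full {
                // SAFETY: offset + 16 <= len, so the unaligned 16-byte load
                // stays inside the slice.
                let mask = unsafe {
                    let chunk =
                        _mm_loadu_si128(bytes.as_ptr().add(offset) as *const __m128i);
                    _mm_movemask_epi8(chunk)
                };
                if mask != 0 {
                    // The lowest set bit marks the first byte with its high
                    // bit set (this is what `bsfl` computes in the listing).
                    return offset + mask.trailing_zeros() as usize;
                }
                offset += 16;
            }
        }
    }

    // Scalar tail (and the whole input on other architectures).
    while offset < len {
        if bytes[offset] >= 0x80 {
            return offset;
        }
        offset += 1;
    }
    len
}
```

Calling `ascii_valid_up_to(b"hello")` should return 5, and a buffer whose byte at index 17 is `0xC3` should return 17.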
The basic block of the loop as produced by LLVM 4.0:
.LBB0_6:
        movdqu  (%rdi,%rax), %xmm0
        pmovmskb        %xmm0, %edx
        testl   %edx, %edx
        jne     .LBB0_7
        addq    $16, %rax
        cmpq    %rcx, %rax
        jbe     .LBB0_6
        jmp     .LBB0_2
The unrolled part and the actual loop as produced by LLVM 6.0:
        .cfi_startproc
        cmpq    $16, %rsi
        jb      .LBB0_1
        movdqu  (%rdi), %xmm0
        pmovmskb        %xmm0, %ecx
        testl   %ecx, %ecx
        je      .LBB0_10
        xorl    %esi, %esi
        testl   %ecx, %ecx
        je      .LBB0_7
.LBB0_8:
        bsfl    %ecx, %eax
        jmp     .LBB0_9
.LBB0_1:
        xorl    %eax, %eax
        cmpq    %rsi, %rax
        jb      .LBB0_13
.LBB0_15:
        movq    %rsi, %rax
        retq
.LBB0_10:
        leaq    -16(%rsi), %rdx
        movl    $16, %eax
        .p2align        4, 0x90
.LBB0_11:
        cmpq    %rdx, %rax
        ja      .LBB0_12
        movdqu  (%rdi,%rax), %xmm0
        pmovmskb        %xmm0, %ecx
        addq    $16, %rax
        testl   %ecx, %ecx
        je      .LBB0_11
        addq    $-16, %rax
        movq    %rax, %rsi
        testl   %ecx, %ecx
        jne     .LBB0_8
The unrolling is already visible at the LLVM IR level. Both cases were compiled
with opt-level 2, which is what Firefox ships with.
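To make the shape of the transformation concrete, the following is a rough source-level sketch of a rolled loop next to a hand-peeled equivalent. It is only an analogy for what the optimizer does at the IR level. All names here are hypothetical, `high_bit_mask` is a scalar stand-in for `pmovmskb`, and both functions deliberately ignore any tail shorter than 16 bytes.

```rust
/// Scalar stand-in for `pmovmskb` (assumption for illustration): bit i of
/// the result is set when byte i of the chunk has its high bit set.
fn high_bit_mask(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .enumerate()
        .fold(0, |mask, (i, &b)| mask | (((b >> 7) as u32) << i))
}

/// Rolled form: every 16-byte chunk, including the first, goes through the
/// same loop body. This corresponds in shape to the LLVM 4.0 output.
pub fn first_non_ascii_rolled(bytes: &[u8]) -> Option<usize> {
    let mut offset = 0;
    while offset + 16 <= bytes.len() {
        let mask = high_bit_mask(&bytes[offset..offset + 16]);
        if mask != 0 {
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 16;
    }
    None
}

/// Peeled form: the first chunk is tested before the loop is entered, and
/// the loop proper starts at offset 16. This corresponds in shape to the
/// LLVM 6.0 output.
pub fn first_non_ascii_peeled(bytes: &[u8]) -> Option<usize> {
    if bytes.len() < 16 {
        return None;
    }
    let first = high_bit_mask(&bytes[..16]);
    if first != 0 {
        return Some(first.trailing_zeros() as usize);
    }
    let mut offset = 16;
    while offset + 16 <= bytes.len() {
        let mask = high_bit_mask(&bytes[offset..offset + 16]);
        if mask != 0 {
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 16;
    }
    None
}
```

Both functions return the same result for every input. The peeled form only duplicates the loop body ahead of the loop, which is the extra code visible in the LLVM 6.0 listing.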
# Benchmarking the minimized test case looks like this to me:
(Rust 1.24.0 is built on LLVM 4.0; Rust 1.25.0 is built on LLVM 6.0.)
x86_64 code running on Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (Broadwell-EP)
Results are similar with both the powersave and performance governors.
$ rustup default 1.24.0
$ ./bench.sh
[...]
test bench ... bench: 1,539,341 ns/iter (+/- 216,985)
$ rustup default 1.25.0
$ ./bench.sh
[...]
test bench ... bench: 1,865,801 ns/iter (+/- 22,297)
x86_64 code running on Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell-DT)
$ rustup default 1.24.0
$ ./bench.sh
[...]
test bench ... bench: 1,491,560 ns/iter (+/- 65,163)
$ rustup default 1.25.0
$ ./bench.sh
[...]
test bench ... bench: 1,673,239 ns/iter (+/- 15,355)
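The relative slowdown implied by the figures above can be worked out with a small helper. The function name `slowdown_percent` is made up for this sketch, and it uses only the reported ns/iter means, ignoring the +/- spread.

```rust
/// Relative slowdown of `after_ns` compared with `before_ns`, in percent.
/// Hypothetical helper for illustration; inputs are mean ns/iter values.
pub fn slowdown_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns / before_ns - 1.0) * 100.0
}

/// Mean ns/iter reported above: (Rust 1.24.0, Rust 1.25.0).
pub const BROADWELL_EP: (f64, f64) = (1_539_341.0, 1_865_801.0);
pub const HASWELL_DT: (f64, f64) = (1_491_560.0, 1_673_239.0);
```

With these inputs, `slowdown_percent(BROADWELL_EP.0, BROADWELL_EP.1)` comes out at roughly 21.2%, and `slowdown_percent(HASWELL_DT.0, HASWELL_DT.1)` at roughly 12.2%.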
# Links to other bug databases
Rust: https://github.com/rust-lang/rust/issues/49873
Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1451703