[llvm-bugs] [Bug 46773] New: Performance regression for sqrt, sqrtf, sqrtl in for loop
via llvm-bugs
llvm-bugs at lists.llvm.org
Sat Jul 18 09:41:59 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=46773
Bug ID: 46773
Summary: Performance regression for sqrt, sqrtf, sqrtl in for
loop
Product: new-bugs
Version: 10.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: joshstockin at gmail.com
CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
This might actually be a twofer.
Consider the following function, which calculates the number of primes up to
10,000,000:
void primes() {
uint32_t NUM = 10000000;
uint32_t total = 1;
for (uint32_t N = 3; N < NUM; N += 2) {
uint32_t prime = 1;
for (uint32_t f = 3; f <= (uint32_t)sqrtf(N); f += 2) {
if (N % f == 0) {
prime = 0;
break;
}
}
total += prime;
}
printf("Total primes: %d\n", total);
}
Note the for loop with sqrt (this also applies to sqrtf and sqrtl):
for (uint32_t f = 3; f <= (uint32_t)sqrt(N); f += 2)
First optimization bug: LLVM doesn't seem to recognize sqrt* as pure functions,
and so doesn't pull them out of the condition--instead, the sqrt operation is
called repeatedly for every comparison, tremendously slowing down the program.
(This was compiled with -O3.)
; This source compiled including the sqrt() function
; There are similar results for other sqrt functions
.L19:
mov eax, ebx
xor edx, edx
div r12d
test edx, edx ; N % f == 0
je .L4 ; true, jump
movsd xmm1, QWORD PTR [rsp+8] ; false, continue loop
add r12d, 2 ; f += 2
.L5:
movapd xmm0, xmm1 ; Why do this block again?
movsd QWORD PTR [rsp+8], xmm1 ; N hasn't changed, and sqrt*
call sqrt ; are pure functions.
pxor xmm2, xmm2 ; This makes the executable
cvttsd2si rax, xmm0 ; ~2x slower than gcc -O3's.
cmp r12d, eax
jbe .L19
You may have noticed the second optimization bug: the sqrt* call here isn't
optimized down to the processor's sqrt instruction. (These are sqrtsd for
sqrt() and sqrtss for sqrtf(). It doesn't look like there's an instruction for
sqrtl(), but if there is, LLVM should optimize to it.) Also note that there are
extraneous sqrt* calls/instructions in the full compiled output, for whatever
reasons LLVM's optimization engine has, but they're inconsistent. In the same
compiled output there are mixed function calls and instructions for the same
operation.
Some speed benchmarks for emphasis:
$ time ./primes-clangO3 => 0m5.298s real
Total primes: 664579
$ time ./primes-clangO0 => 0m5.701s "
Total primes: 644579
$ time ./primes-gccO3 => 0m2.872s "
Total primes: 664579
$ time ./primes-gccO0 => 0m5.703s "
Total primes: 664579
I hope there's an easy fix here.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20200718/69a53277/attachment.html>
More information about the llvm-bugs
mailing list