<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Performance regression for sqrt, sqrtf, sqrtl in for loop"
href="https://bugs.llvm.org/show_bug.cgi?id=46773">46773</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Performance regression for sqrt, sqrtf, sqrtl in for loop
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>10.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>joshstockin@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
This might actually be a twofer.
Consider the following function, which calculates the number of primes up to
10,000,000:
void primes() {
uint32_t NUM = 10000000;
uint32_t total = 1;
for (uint32_t N = 3; N < NUM; N += 2) {
uint32_t prime = 1;
for (uint32_t f = 3; f <= (uint32_t)sqrtf(N); f += 2) {
if (N % f == 0) {
prime = 0;
break;
}
}
total += prime;
}
printf("Total primes: %d\n", total);
}
Note the for loop with sqrt (this also applies to sqrtf and sqrtl):
for (uint32_t f = 3; f <= (uint32_t)sqrt(N); f += 2)
First optimization bug: LLVM doesn't seem to recognize sqrt* as pure functions,
and so doesn't pull them out of the condition--instead, the sqrt operation is
called repeatedly for every comparison, tremendously slowing down the program.
(This was compiled with -O3.)
; This source compiled including the sqrt() function
; There are similar results for other sqrt functions
.L19:
mov eax, ebx
xor edx, edx
div r12d
test edx, edx ; N % f == 0
je .L4 ; true, jump
movsd xmm1, QWORD PTR [rsp+8] ; false, continue loop
add r12d, 2 ; f += 2
.L5:
movapd xmm0, xmm1 ; Why do this block again?
movsd QWORD PTR [rsp+8], xmm1 ; N hasn't changed, and sqrt*
call sqrt ; are pure functions.
pxor xmm2, xmm2 ; This makes the executable
cvttsd2si rax, xmm0 ; ~2x slower than gcc -O3's.
cmp r12d, eax
jbe .L19
You may have noticed the second optimization bug: the sqrt* call here isn't
optimized down to the processor's sqrt instruction. (These are sqrtsd for
sqrt() and sqrtss for sqrtf(). It doesn't look like there's an instruction for
sqrtl(), but if there is, LLVM should optimize to it.) Also note that there are
extraneous sqrt* calls/instructions in the full compiled output, for whatever
reasons LLVM's optimization engine has, but they're inconsistent. In the same
compiled output there are mixed function calls and instructions for the same
operation.
Some speed benchmarks for emphasis:
$ time ./primes-clangO3 => 0m5.298s real
Total primes: 664579
$ time ./primes-clangO0 => 0m5.701s "
Total primes: 644579
$ time ./primes-gccO3 => 0m2.872s "
Total primes: 664579
$ time ./primes-gccO0 => 0m5.703s "
Total primes: 664579
I hope there's an easy fix here.</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>