<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [x86] Failure to use both div and mod results of one IDIV in a prime-factor loop while(n%i==0) { n/=i; }"
href="https://bugs.llvm.org/show_bug.cgi?id=37101">37101</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[x86] Failure to use both div and mod results of one IDIV in a prime-factor loop while(n%i==0) { n/=i; }
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Keywords</th>
<td>performance
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>peter@cordes.ca
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Near-identical code-gen to <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85366">https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85366</a>
for this loop. From
<a href="https://codereview.stackexchange.com/questions/191792/find-prime-factors-in-c/191801#191801">https://codereview.stackexchange.com/questions/191792/find-prime-factors-in-c/191801#191801</a>,
simplified to use a pointer instead of returning std::vector<int>:
void find_prime_factors_ptr(int n, int *p)
{
// inefficient to test even numbers > 2, but that's a separate missed
optimization.
for (int i = 2; i <= n; i++) {
while (n % i == 0) {
*p++ = i;
n /= i; // reordering the loop body doesn't help
}
}
}
<a href="https://godbolt.org/g/GLycMM">https://godbolt.org/g/GLycMM</a>
clang version 7.0.0 (trunk 329780)
find_prime_factors_ptr(int, int*):
movl %edi, %ecx
cmpl $2, %edi
jl .LBB0_6
movl $2, %edi # why not keep n in EDI, and use ECX for
i?
.LBB0_2: # =>This Loop Header: Depth=1
movl %ecx, %eax
cltd
idivl %edi
testl %edx, %edx
jne .LBB0_5
.LBB0_3: # Parent Loop BB0_2 Depth=1
movl %edi, (%rsi)
movl %ecx, %eax
cltd
idivl %edi
movl %eax, %ecx
cltd
idivl %edi
addq $4, %rsi
testl %edx, %edx
je .LBB0_3
.LBB0_5: # in Loop: Header=BB0_2 Depth=1
leal 1(%rdi), %eax # this compare-latency optimization
cmpl %ecx, %edi # doesn't seem worth it
movl %eax, %edi # 4 uops instead of 2 from defeating
macro-fusion
jl .LBB0_2 # vs. inc %rdi / cmp %ecx,%edi / jle
.LBB0_6:
retq
Using idiv twice inside the loop body is totally pointless. At either way of
reaching .LBB0_3 (fall through or loop), n/i is in EAX, but LLVM ignores it.
The inner loop can be optimized by dropping movl %ecx, %eax / cltd / idiv with
no other changes.
.LBB0_3:
movl %edi, (%rsi)
addq $4, %rsi
movl %eax, %ecx # n = tmp
cltd
idivl %edi # eax = tmp = n/i
testl %edx, %edx
je .LBB0_3
Unlike for gcc, changing the inner loop like this didn't help:
int tmp;
while (tmp = n/i, n % i == 0) {
*p++ = i;
n = tmp;
}
(see the linked gcc bug report for an interesting version that jumps into the
inner loop instead of hoisting the run-zero-times check out of the loop before
the first iteration.)
I think one of the versions on the godbolt link did manage to optimize better
with clang, but I haven't investigated further. (I think it was one of the
versions using `vector.push_back` inside the loop.)</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>