<html>
<head>
<base href="http://llvm.org/bugs/" />
</head>
<body><span class="vcard"><a class="email" href="mailto:steven@uplinklabs.net" title="Steven Noonan <steven@uplinklabs.net>"> <span class="fn">Steven Noonan</span></a>
</span> changed
<a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - optimize reciprocals with fast-math (x86)"
href="http://llvm.org/bugs/show_bug.cgi?id=21385">bug 21385</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Status</td>
<td>RESOLVED
</td>
<td>REOPENED
</td>
</tr>
<tr>
<td style="text-align:right;">CC</td>
<td>
</td>
<td>steven@uplinklabs.net
</td>
</tr>
<tr>
<td style="text-align:right;">Resolution</td>
<td>FIXED
</td>
<td>---
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - optimize reciprocals with fast-math (x86)"
href="http://llvm.org/bugs/show_bug.cgi?id=21385#c23">Comment # 23</a>
on <a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - optimize reciprocals with fast-math (x86)"
href="http://llvm.org/bugs/show_bug.cgi?id=21385">bug 21385</a>
from <span class="vcard"><a class="email" href="mailto:steven@uplinklabs.net" title="Steven Noonan <steven@uplinklabs.net>"> <span class="fn">Steven Noonan</span></a>
</span></b>
<pre>I'd like to reopen this issue to ask why FeatureUseSqrtEst and
FeatureUseRecipEst were only enabled for Jaguar. They have demonstrable
benefits on many, if not all, of the other x86 microarchitectures. Is there a
reason they cannot be enabled more broadly?

I'm not sure what GCC's criteria are for enabling it, but I've seen the
reciprocal square root estimate enabled for every x86 -march= I know of
whenever -ffast-math is specified and SSE is available. For example, even as
far back as -march=pentium3, rsqrtss is used:
float rsqrtf(float f)
{
    return 1.0f / sqrtf(f);
}
$ gcc -m32 -O3 -ffast-math -mfpmath=sse -march=pentium3 -S -o - rsqrt.c
[...]
rsqrtf:
        subl    $4, %esp
        rsqrtss 8(%esp), %xmm1
        movss   8(%esp), %xmm0
        mulss   %xmm1, %xmm0
        mulss   %xmm1, %xmm0
        mulss   .LC1, %xmm1
        addss   .LC0, %xmm0
        mulss   %xmm1, %xmm0
        movss   %xmm0, (%esp)
        flds    (%esp)
        popl    %eax
        ret
.LC0:
        .long   3225419776
.LC1:
        .long   3204448256
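
Reading the listing: the two constants decode to -3.0f (.LC0) and -0.5f (.LC1),
so the arithmetic after rsqrtss looks like a single Newton-Raphson refinement
of the hardware estimate. A minimal C sketch of that step follows.
bits_to_float and refine_rsqrt are hypothetical names used only for
illustration, the estimate is passed in as a parameter instead of coming from
rsqrtss, and the punning helper assumes 32-bit unsigned int and float.

```c
/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754 float.
   Assumes unsigned int and float are both 32 bits wide. */
static float bits_to_float(unsigned int bits)
{
    union { unsigned int u; float f; } pun;
    pun.u = bits;
    return pun.f;
}

/* One Newton-Raphson step for 1/sqrt(f), arranged like the listing:
   (f * est * est + LC0) * (est * LC1).
   With LC0 = -3.0 and LC1 = -0.5 this equals 0.5 * est * (3 - f * est * est). */
static float refine_rsqrt(float f, float est)
{
    const float lc0 = bits_to_float(3225419776u); /* 0xC0400000, i.e. -3.0f */
    const float lc1 = bits_to_float(3204448256u); /* 0xBF000000, i.e. -0.5f */
    return (f * est * est + lc0) * (est * lc1);
}
```

For f = 4.0f, an exact estimate of 0.5f is left unchanged by the step, and a
slightly low estimate such as 0.49f is pulled closer to 0.5f.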
GCC does not, however, emit reciprocal estimates, even with -march=haswell.
It's possible that GCC does not implement any selection of RCPSS:
float recipf(float f)
{
    return 1.0f / f;
}
$ gcc -O3 -ffast-math -mfpmath=sse -march=haswell -S -o - recip.c
[...]
        vmovss  .LC0(%rip), %xmm1
        vdivss  %xmm0, %xmm1, %xmm0
        ret
.LC0:
        .long   1065353216
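
For comparison, a reciprocal estimate (RCPSS) would normally be paired with one
Newton-Raphson step of the form est * (2 - f * est) to recover precision. The
sketch below shows that step in C, under the assumption that this is the
refinement a compiler would emit. refine_recip is a hypothetical name, and the
estimate is a parameter rather than the output of RCPSS.

```c
/* One Newton-Raphson step for 1/f, starting from an approximate reciprocal.
   Hypothetical illustration only: est stands in for the RCPSS result. */
static float refine_recip(float f, float est)
{
    return est * (2.0f - f * est);
}
```

For f = 4.0f, an exact estimate of 0.25f is a fixed point of the step, and an
estimate of 0.24f moves to roughly 0.2496f.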

At the very least, I'd like to see reciprocal square root estimates enabled by
default on x86 under -ffast-math. How can we get such a change implemented? Is
it a matter of building confidence in the safety and benefit of such a change?</pre>
</div>
</p>
</body>
</html>