<html>
<head>
<base href="http://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - use fsel to avoid branch and compare"
href="http://llvm.org/bugs/show_bug.cgi?id=21231">21231</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>use fsel to avoid branch and compare
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: PowerPC
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>spatel+llvm@rotateright.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>This testcase is derived from llvm/test/CodeGen/PowerPC/recipest.ll.
With -ffast-math, we can convert a sqrt intrinsic into a reciprocal square
root multiplied by its argument ( X * X ** -0.5 = X ** 0.5 ), with one
problem: we can't let a '0.0f' input turn into a 'NaN' output. That would
happen because the reciprocal square root of 0.0 is infinite, and
0.0 * Inf is NaN.
The current PPC scalar codegen compares and branches around that:
$ cat sqrtf.ll
declare float @llvm.sqrt.f32(float)
define float @goo3(float %a) nounwind {
%r = call float @llvm.sqrt.f32(float %a)
ret float %r
}
$ ./llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math sqrtf.ll -o -
...
.L.goo3:
# BB#0:
addis 3, 2, .LCPI0_1@toc@ha
lfs 0, .LCPI0_1@toc@l(3)
fcmpu 0, 1, 0
beq 0, .LBB0_2
# BB#1:
frsqrtes 0, 1
addis 3, 2, .LCPI0_0@toc@ha
lfs 2, .LCPI0_0@toc@l(3)
fnmsubs 3, 1, 2, 1
fmuls 4, 0, 0
fmadds 2, 3, 4, 2
fmuls 0, 0, 2
fmuls 0, 1, 0
.LBB0_2:
fmr 1, 0
blr
------------------------------------------------------
An 'fsel' would probably be a better choice here for performance, since it
picks between the two results without a compare and branch. For the
vector PPC case, we do generate vcmpeqfp/vandc.
X86 scalar code (when enabled) will use a similar compare-and-mask pattern to
do the select:
vrsqrtss %xmm0, %xmm0, %xmm1
vmulss LCPI0_0(%rip), %xmm1, %xmm2
vmulss %xmm1, %xmm1, %xmm1
vmulss %xmm0, %xmm1, %xmm1
vaddss LCPI0_1(%rip), %xmm1, %xmm1
vmulss %xmm2, %xmm1, %xmm1
vxorps %xmm2, %xmm2, %xmm2
vmulss %xmm1, %xmm0, %xmm1
vcmpeqss %xmm2, %xmm0, %xmm0
vandnps %xmm1, %xmm0, %xmm0</pre>
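<pre>For illustration, the branch-free select can be modeled in a few lines of
Python. This is only a sketch of the semantics, not the backend change:
fsel() mirrors the PPC instruction (result = FRC if FRA >= 0.0, else FRB),
while rsqrt() and fast_sqrt() are hypothetical helper names, and an exact
reciprocal square root stands in for frsqrtes plus the refinement step.

```python
import math


def fsel(fra: float, frc: float, frb: float) -> float:
    """Model of PPC fsel: frc if fra >= 0.0, else frb (a NaN fra picks frb)."""
    return frc if fra >= 0.0 else frb


def rsqrt(x: float) -> float:
    """Exact stand-in for the hardware estimate; 0.0 maps to +Inf."""
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def fast_sqrt(x: float) -> float:
    """sqrt(x) as x * rsqrt(x), with fsel replacing the compare and branch."""
    product = x * rsqrt(x)  # NaN when x == 0.0, because 0.0 * Inf is NaN
    # -x >= 0.0 holds only for x == 0.0 (assuming x is not negative),
    # so the select returns 0.0 there and the product everywhere else.
    return fsel(-x, 0.0, product)
```

Negating the input lets the single '>= 0.0' test act as the zero check. That
relies on the fast-math assumption that the input is never negative.</pre>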
</div>
</p>
</body>
</html>