<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [x86] codegen for fcmp oeq is inconsistent"
href="https://bugs.llvm.org/show_bug.cgi?id=34563">34563</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[x86] codegen for fcmp oeq is inconsistent
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>spatel+llvm@rotateright.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>bool fcmp_oeq(double f1, double f2) {
return f1 == f2;
}
bool fcmp_oeq_twice(double f1, double f2, double f3, double f4) {
return f1 == f2 && f3 == f4;
}
Or as IR:
define i1 @fcmp_oeq(double %f1, double %f2) {
%cmp = fcmp oeq double %f1, %f2
ret i1 %cmp
}
define i1 @fcmp_oeq_twice(double %f1, double %f2, double %f3, double %f4) {
%cmp1 = fcmp oeq double %f1, %f2
%cmp2 = fcmp oeq double %f3, %f4
%and = and i1 %cmp1, %cmp2
ret i1 %and
}
----------------------------------------------------------------------------
$ ./llc -o - -mtriple=x86_64-unknown-unknown fcmps.ll
fcmp_oeq(double, double): # @fcmp_oeq(double, double)
cmpeqsd %xmm1, %xmm0
movq %xmm0, %rax
andl $1, %eax
retq
fcmp_oeq_twice(double, double, double, double): # @fcmp_oeq_twice(double, double, double, double)
ucomisd %xmm1, %xmm0
setnp %al
sete %cl
andb %al, %cl
ucomisd %xmm3, %xmm2
setnp %dl
sete %al
andb %dl, %al
andb %cl, %al
retq
-----------------------------------------------------------------------------
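x86 doesn't have a single 'setcc' for oeq (?!): 'ucomisd' reports an unordered
result by setting ZF, PF, and CF together, so ZF alone can't tell "equal" from
"unordered". If we're using 'ucomisd', we have to do an and-of-setcc
(sete + setnp) to generate that predicate. If we use 'cmpeqsd' as in the first
example, we incur a vector-to-scalar register move instead, and it's not clear
which of the two sequences is faster.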
The inconsistency here should be investigated. But it's also possible that
we're doing the wrong thing for both cases. In the 2nd example if we use
'cmpeqsd', then we could reduce the instruction count with something like:
cmpeqsd %xmm1, %xmm0
cmpeqsd %xmm3, %xmm2
andps %xmm2, %xmm0
movd %xmm0, %eax
andl $1, %eax</pre>
</div>
</p>
</body>
</html>