<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><span class="vcard"><a class="email" href="mailto:ahmed.bougacha@gmail.com" title="Ahmed Bougacha <ahmed.bougacha@gmail.com>"> <span class="fn">Ahmed Bougacha</span></a>
</span> changed
<a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"
href="https://llvm.org/bugs/show_bug.cgi?id=26110">bug 26110</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Status</td>
<td>RESOLVED
</td>
<td>REOPENED
</td>
</tr>
<tr>
<td style="text-align:right;">CC</td>
<td>
</td>
<td>ahmed.bougacha@gmail.com
</td>
</tr>
<tr>
<td style="text-align:right;">Component</td>
<td>LLVM Codegen
</td>
<td>Backend: X86
</td>
</tr>
<tr>
<td style="text-align:right;">Version</td>
<td>3.7
</td>
<td>trunk
</td>
</tr>
<tr>
<td style="text-align:right;">Resolution</td>
<td>FIXED
</td>
<td>---
</td>
</tr>
<tr>
<td style="text-align:right;">Assignee</td>
<td>unassignedclangbugs@nondot.org
</td>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<td style="text-align:right;">Product</td>
<td>clang
</td>
<td>libraries
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"
href="https://llvm.org/bugs/show_bug.cgi?id=26110#c3">Comment # 3</a>
on <a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"
href="https://llvm.org/bugs/show_bug.cgi?id=26110">bug 26110</a>
from <span class="vcard"><a class="email" href="mailto:ahmed.bougacha@gmail.com" title="Ahmed Bougacha <ahmed.bougacha@gmail.com>"> <span class="fn">Ahmed Bougacha</span></a>
</span></b>
<pre>So, I looked a little closer. Sanjay's bisect was correct. clang-700 is pretty
old now; I bisected to:
r229099 [SimplifyCFG] Be more aggressive
Sure enough, this still reproduces on trunk with -mllvm
-phi-node-folding-threshold=1.
Long story short: the problematic pattern is:
(c ? -v : v)
which we lower to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
roughly corresponding to this IR:
define <4 x i32> @t(<4 x i32> %v, <4 x i32> %c) {
%cl = shl <4 x i32> %c, <i32 31, i32 31, i32 31, i32 31>
%cs = ashr <4 x i32> %cl, <i32 31, i32 31, i32 31, i32 31>
%tmp2 = trunc <4 x i32> %cs to <4 x i1>
; ^ not as artificial as it looks: equivalent to a legalized vsetcc
%mv = sub nsw <4 x i32> zeroinitializer, %v
%r = select <4 x i1> %tmp2, <4 x i32> %v, <4 x i32> %mv
ret <4 x i32> %r
}
The SSE2 codegen is pretty straightforward:
xorps %xmm1, %xmm1
... # xmm6 <- %v
... # xmm3 <- %c
psubd %xmm6, %xmm1 # 0 - v # 0 - 5 -> -5
movaps %xmm3, %xmm0 # c # 0 -> 0
pandn %xmm6, %xmm0 # ~c & v # ~0 & 5 -> 5
pand %xmm3, %xmm1 # c & -v # -5 & 0 -> 0
por %xmm0, %xmm1 # (~c & v) | (c & -v) # 0 | 5 -> 5
However, when we have SSSE3 (the default on OS X), we try to match it to PSIGND,
and instead emit:
psignd %xmm3, %xmm1 # (c < 0 ? -v : (c > 0 ? v : 0))
# c is a mask, so (c > 0) == 0
# (c ? -v : 0)
# (0 ? -5 : 0)
# -> 0
Which is not equivalent; one does:
(c ? -v : 0)
the other:
(c ? -v : v)
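A lane-wise model makes the mismatch concrete. This is a minimal Python sketch,
not LLVM code: psignd_lane and blend_lane are hypothetical helper names, c is
assumed to be a proper mask (0 or -1), and 32-bit wraparound is ignored because
the sample values are small.

```python
def psignd_lane(x, s):
    """One 32-bit lane of PSIGND: negate, zero, or pass x through by the sign of s."""
    if s < 0:
        return -x
    if s == 0:
        return 0
    return x


def blend_lane(v, c):
    """One lane of (~c & v) | (c & -v), where c is a mask: 0 or -1 (all ones)."""
    return (~c & v) | (c & -v)


v = 5
for c in (0, -1):
    # Columns: mask, SSE2 blend result, PSIGND result.
    print(c, blend_lane(v, c), psignd_lane(v, c))
# prints:
# 0 5 0
# -1 -5 -5
```

The two agree when the mask lane is all ones and differ when it is zero, which
is the (c ? -v : 0) versus (c ? -v : v) difference described above.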
Now, this bug has existed since 2010. However, I think we never noticed it
because of operand canonicalization.
The PSIGN combine matches:
(or (and m, x), (pandn m, (0 - x)))
(or (and x, m), (pandn m, (0 - x)))
(or (pandn m, (0 - x)), (and m, x))
(or (pandn m, (0 - x)), (and x, m))
but not the variants of:
(or (and m, (0 - x)), (pandn m, x))
Which is what gets generated for the function above (the most obvious IR that I
could write).
I think this is pretty easy to fix: instead of using c directly as the PSIGN
control, set a non-sign bit in it, so that zero lanes become positive and
default to the 'v' case.
So, this should work:
por <1,1,1,1>, %xmm3 # c' = c | 1
psignd %xmm3, %xmm1 # (c' < 0 ? -v : (c' > 0 ? v : 0))
# c is a mask, so c' is either 1 or 0xff..f
# (c' == 0xff..f ? -v : (c' != 0 ? v : v))
# (c' == 0xff..f ? -v : v)
# (0 ? -5 : 5)
# -> 5
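The OR-with-1 idea can be checked with the same kind of lane-wise model. This
is a minimal Python sketch under assumptions: psignd_lane, fixed_lane and
select_lane are hypothetical names, c is assumed to be a proper mask (0 or -1),
and 32-bit wraparound is not modeled.

```python
def psignd_lane(x, s):
    """One 32-bit lane of PSIGND: negate, zero, or pass x through by the sign of s."""
    if s < 0:
        return -x
    if s == 0:
        return 0
    return x


def fixed_lane(v, c):
    """PSIGND with the control c' = c | 1, so a zero mask lane becomes +1."""
    return psignd_lane(v, c | 1)


def select_lane(v, c):
    """Reference semantics: (c ? -v : v) for a mask c."""
    return -v if c else v


# With c restricted to 0 or -1, c | 1 is 1 or -1, never 0,
# so the "zero" arm of PSIGND is unreachable.
for v in (5, -7, 0):
    for c in (0, -1):
        assert fixed_lane(v, c) == select_lane(v, c)
```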
Constant-pool loads are cheap, so this is probably still a win over the SSE2 codegen:
psrad $31, %xmm1
pxor %xmm2, %xmm2
psubd %xmm0, %xmm2
pand %xmm1, %xmm2
pandn %xmm0, %xmm1
por %xmm1, %xmm2
movdqa %xmm2, %xmm0
Note that I don't think the couple of PSIGN tests in trunk are correct either.
Consider test/CodeGen/X86/vec-sign.ll:
define <4 x i32> @signd(<4 x i32> %a, <4 x i32> %b) nounwind {
entry:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
}
If %b is zero, this simplifies to:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %1, %2
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.</pre>
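<pre>Evaluating the @signd IR lane by lane shows which inputs are affected. This is
a minimal Python sketch, not LLVM code: ashr31, signd_ir_lane and psignd_lane
are hypothetical names, and 32-bit wraparound is ignored for the small sample
values.

```python
def ashr31(b):
    """Arithmetic shift right by 31 of a signed 32-bit lane: -1 if negative, else 0."""
    return -1 if b < 0 else 0


def signd_ir_lane(a, b):
    """One lane of @signd as written in the IR: (a & ~lobit) | (lobit & -a)."""
    lobit = ashr31(b)
    return (a & ~lobit) | (lobit & -a)


def psignd_lane(x, s):
    """One 32-bit lane of PSIGND: negate, zero, or pass x through by the sign of s."""
    if s < 0:
        return -x
    if s == 0:
        return 0
    return x


a = 5
for b in (-3, 0, 3):
    # Columns: b, IR result, PSIGND result.
    print(b, signd_ir_lane(a, b), psignd_lane(a, b))
# prints:
# -3 -5 -5
# 0 5 0
# 3 5 5
```

In this model only the b == 0 lane disagrees: the IR yields a, PSIGND yields 0.</pre>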
</div>
</p>
</body>
</html>