[llvm-bugs] [Bug 26110] clang c compiler produces wrong result for the attached c code with -O2 optimization
llvm-bugs at lists.llvm.org
Wed Feb 10 17:15:04 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=26110
Ahmed Bougacha <ahmed.bougacha at gmail.com> changed:
What|Removed|Added
----------------------------------------------------------------------------
Status|RESOLVED|REOPENED
CC||ahmed.bougacha at gmail.com
Component|LLVM Codegen|Backend: X86
Version|3.7|trunk
Resolution|FIXED|---
Assignee|unassignedclangbugs at nondot.org|unassignedbugs at nondot.org
Product|clang|libraries
--- Comment #3 from Ahmed Bougacha <ahmed.bougacha at gmail.com> ---
So, I looked a little closer. Sanjay's bisect was correct, but clang-700 is pretty
old now; I bisected to:
r229099 [SimplifyCFG] Be more aggressive
Sure enough, this still reproduces on trunk with -mllvm
-phi-node-folding-threshold=1.
Long story short: the problematic pattern is:
(c ? -v : v)
which we lower to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
roughly corresponding to this IR:
define <4 x i32> @t(<4 x i32> %v, <4 x i32> %c) {
%cl = shl <4 x i32> %c, <i32 31, i32 31, i32 31, i32 31>
%cs = ashr <4 x i32> %cl, <i32 31, i32 31, i32 31, i32 31>
%tmp2 = trunc <4 x i32> %cs to <4 x i1>
; ^ not as artificial as it looks: equivalent to a legalized vsetcc
%mv = sub nsw <4 x i32> zeroinitializer, %v
%r = select <4 x i1> %tmp2, <4 x i32> %mv, <4 x i32> %v
ret <4 x i32> %r
}
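The intended per-lane semantics of @t can be sketched in Python. This is a minimal model, not LLVM code. It assumes 32-bit lanes held as Python ints, and the helper names (to_i32, bit0_to_mask, select_neg) are hypothetical. The shl/ashr pair is modelled as sign-extending bit 0 of %c, which is what the "legalized vsetcc" comment suggests, and the select is modelled as the (c ? -v : v) pattern:

```python
# Hypothetical per-lane model: each vector lane is a Python int kept
# in the signed 32-bit range.

def to_i32(x: int) -> int:
    """Wrap an arbitrary int to a signed 32-bit lane value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bit0_to_mask(c: int) -> int:
    """shl by 31 then ashr by 31: sign-extend bit 0 of c to 0 or -1."""
    shifted = to_i32(c << 31)          # shl: bit 0 lands in the sign bit
    return -1 if shifted < 0 else 0    # ashr 31: replicate the sign bit


def select_neg(v: list[int], c: list[int]) -> list[int]:
    """Lane-wise (c ? -v : v), with each condition taken from bit 0 of c."""
    out = []
    for vi, ci in zip(v, c):
        cond = bit0_to_mask(ci) != 0   # trunc of a 0/-1 mask to i1
        out.append(to_i32(-vi) if cond else vi)
    return out


print(select_neg([5, 5, -7, 0], [0, 1, 1, 2]))  # [5, -5, 7, 0]
```

A lane with c = 0 (or any even c) must come back as v unchanged. This is the case the PSIGND lowering below gets wrong.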
The SSE2 codegen is pretty straightforward (the comments trace one lane with v = 5, c = 0):
xorps %xmm1, %xmm1
... # xmm6 <- %v
... # xmm3 <- %c
psubd %xmm6, %xmm1 # 0 - v # 0 - 5 -> -5
movaps %xmm3, %xmm0 # c # 0 -> 0
pandn %xmm6, %xmm0 # ~c & v # ~0 & 5 -> 5
pand %xmm3, %xmm1 # c & -v # -5 & 0 -> 0
por %xmm0, %xmm1 # (~c & v) | (c & -v) # 0 | 5 -> 5
However, when we have SSSE3 (the default on OS X), we try to match it to PSIGND
and instead generate:
psignd %xmm3, %xmm1 # (c < 0 ? -v : (c > 0 ? v : 0))
# c is a mask, so (c > 0) == 0
# (c ? -v : 0)
# (0 ? -5 : 0)
# -> 0
These are not equivalent; one computes:
(c ? -v : 0)
and the other:
(c ? -v : v)
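The difference can be checked with a small per-lane model. This is a sketch under stated assumptions: lanes are signed 32-bit values held as Python ints, the function names are hypothetical, and psignd_lane follows PSIGND's documented behaviour (negate if the control lane is negative, zero if it is zero, keep otherwise):

```python
# Hypothetical per-lane models; lanes are signed 32-bit values held as ints.

def to_i32(x: int) -> int:
    """Wrap an arbitrary int to a signed 32-bit lane value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def psignd_lane(v: int, c: int) -> int:
    """PSIGND: negate v if c < 0, zero it if c == 0, keep it if c > 0."""
    if c < 0:
        return to_i32(-v)
    if c == 0:
        return 0
    return v


def blend_lane(v: int, m: int) -> int:
    """The intended (m ? -v : v) blend for a mask m that is 0 or -1."""
    return to_i32((~m & v) | (m & to_i32(-v)))


for m in (0, -1):
    print(m, psignd_lane(5, m), blend_lane(5, m))
# 0 0 5
# -1 -5 -5
```

The two agree when the mask is all ones and disagree when it is zero, which is exactly the (c ? -v : 0) versus (c ? -v : v) split.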
Now, this bug has existed since 2010. I think it went unnoticed because of
operand canonicalization.
The PSIGN combine matches:
(or (and m, x), (pandn m, (0 - x)))
(or (and x, m), (pandn m, (0 - x)))
(or (pandn m, (0 - x)), (and m, x))
(or (pandn m, (0 - x)), (and x, m))
but not the variants of:
(or (and m, (0 - x)), (pandn m, x))
which is what gets generated for the function above (the most obvious IR that I
could write).
I think this is pretty easy to fix: instead of using c directly as the PSIGN
mask, set any non-sign bit in it, so that lanes where c is 0 default to the 'v' case.
So, this should work:
por <1,1,1,1>, %xmm3 # c' = c | 1
psignd %xmm3, %xmm1 # (c' < 0 ? -v : (c' > 0 ? v : 0))
# c is a mask, so c' is either 1 or 0xff..f
# (c' == 0xff..f ? -v : (c' != 0 ? v : v))
# (c' == 0xff..f ? -v : v)
# (0 ? -5 : 5)
# -> 5
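The proposed fix can be checked against the reference blend with a short Python sketch. Lanes are assumed to be signed 32-bit values held as ints, the names are hypothetical, and the OR with 1 stands in for the `por <1,1,1,1>` constant-pool operand:

```python
# Hypothetical per-lane model of "por <1,1,1,1>" followed by PSIGND.

def to_i32(x: int) -> int:
    """Wrap an arbitrary int to a signed 32-bit lane value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def psignd_lane(v: int, c: int) -> int:
    """PSIGND: negate v if c < 0, zero it if c == 0, keep it if c > 0."""
    if c < 0:
        return to_i32(-v)
    if c == 0:
        return 0
    return v


def blend_lane(v: int, m: int) -> int:
    """Reference (m ? -v : v) for a mask m that is 0 or -1."""
    return to_i32((~m & v) | (m & to_i32(-v)))


def fixed_lane(v: int, m: int) -> int:
    """Proposed fix: OR in a non-sign bit (here 1), then apply PSIGND."""
    m_prime = to_i32(m | 1)   # 0 -> 1 (positive), -1 stays -1 (negative)
    return psignd_lane(v, m_prime)


samples = [0, 1, 5, -5, 2**31 - 1, -2**31]
print(all(fixed_lane(v, m) == blend_lane(v, m) for v in samples for m in (0, -1)))
# True
```

Because c' is never zero, PSIGND's "zero the lane" case can no longer fire, and the remaining two cases line up with the blend.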
Constant-pool (CP) loads are cheap, so this is probably still a win over the SSE2 codegen:
psrad $31, %xmm1
pxor %xmm2, %xmm2
psubd %xmm0, %xmm2
pand %xmm1, %xmm2
pandn %xmm0, %xmm1
por %xmm1, %xmm2
movdqa %xmm2, %xmm0
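For comparison, the SSE2 sequence can be modelled one lane at a time. This sketch rests on an assumption, since the listing does not say: %xmm1 enters holding a value whose sign bit is the condition (for example c after the shl by 31), and %xmm0 holds v. The function name is hypothetical:

```python
# Hypothetical per-lane model of the seven-instruction SSE2 sequence.

def to_i32(x: int) -> int:
    """Wrap an arbitrary int to a signed 32-bit lane value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sse2_lane(v: int, c: int) -> int:
    """One lane of the SSE2 sequence; c's sign bit carries the condition."""
    m = -1 if to_i32(c) < 0 else 0     # psrad $31, %xmm1
    neg_v = to_i32(0 - v)              # pxor %xmm2, %xmm2 ; psubd %xmm0, %xmm2
    taken = m & neg_v                  # pand %xmm1, %xmm2   -> m & -v
    kept = ~m & v                      # pandn %xmm0, %xmm1  -> ~m & v
    return to_i32(taken | kept)        # por %xmm1, %xmm2 ; movdqa %xmm2, %xmm0


print(sse2_lane(5, 0), sse2_lane(5, -2**31))  # 5 -5
```

This sequence is correct for both mask values but takes seven instructions, where the fixed SSSE3 version needs a `por` with a constant-pool operand plus a `psignd`.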
Note that I don't think the couple of PSIGN tests in trunk are correct either.
Consider test/CodeGen/X86/vec-sign.ll:
define <4 x i32> @signd(<4 x i32> %a, <4 x i32> %b) nounwind {
entry:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
}
If %b is zero, the function evaluates to:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %1, %2
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, since %xmm1 is 0.
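The mismatch is confined to lanes where %b is exactly zero. The Python sketch below compares one lane of @signd with PSIGND's documented per-lane behaviour. The helper names are hypothetical, lanes are signed 32-bit values held as ints, and the `nsw` subtraction is modelled as wrapping:

```python
# Hypothetical per-lane comparison of @signd (vec-sign.ll) and PSIGND.

def to_i32(x: int) -> int:
    """Wrap an arbitrary int to a signed 32-bit lane value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def signd_ir_lane(a: int, b: int) -> int:
    """One lane of @signd from test/CodeGen/X86/vec-sign.ll."""
    lobit = -1 if to_i32(b) < 0 else 0    # %b.lobit = ashr %b, 31
    sub = to_i32(0 - a)                   # %sub (wraps here; nsw makes INT_MIN poison in IR)
    not_lobit = lobit ^ -1                # %0 = xor %b.lobit, -1
    return to_i32((a & not_lobit) | (lobit & sub))  # %cond = or %1, %2


def psignd_lane(v: int, c: int) -> int:
    """PSIGND: negate v if c < 0, zero it if c == 0, keep it if c > 0."""
    if c < 0:
        return to_i32(-v)
    if c == 0:
        return 0
    return v


for b in (3, 0, -3):
    print(b, signd_ir_lane(5, b), psignd_lane(5, b))
# 3 5 5
# 0 5 0
# -3 -5 -5
```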