<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><span class="vcard"><a class="email" href="mailto:ahmed.bougacha@gmail.com" title="Ahmed Bougacha <ahmed.bougacha@gmail.com>"> <span class="fn">Ahmed Bougacha</span></a>

</span> changed

              <a class="bz_bug_link 

          bz_status_REOPENED "

   title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"

   href="https://llvm.org/bugs/show_bug.cgi?id=26110">bug 26110</a>

        <br>

             <table border="1" cellspacing="0" cellpadding="8">

          <tr>

            <th>What</th>

            <th>Removed</th>

            <th>Added</th>

          </tr>

         <tr>

           <td style="text-align:right;">Status</td>

           <td>RESOLVED

           </td>

           <td>REOPENED

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">CC</td>

           <td>

           </td>

           <td>ahmed.bougacha@gmail.com

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Component</td>

           <td>LLVM Codegen

           </td>

           <td>Backend: X86

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Version</td>

           <td>3.7

           </td>

           <td>trunk

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Resolution</td>

           <td>FIXED

           </td>

           <td>---

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Assignee</td>

           <td>unassignedclangbugs@nondot.org

           </td>

           <td>unassignedbugs@nondot.org

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Product</td>

           <td>clang

           </td>

           <td>libraries

           </td>

         </tr></table>

      <p>

        <div>

            <b><a class="bz_bug_link 

          bz_status_REOPENED "

   title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"

   href="https://llvm.org/bugs/show_bug.cgi?id=26110#c3">Comment # 3</a>

              on <a class="bz_bug_link 

          bz_status_REOPENED "

   title="REOPENED --- - clang c compiler produces wrong result for the attached c code with -O2 optimzation"

   href="https://llvm.org/bugs/show_bug.cgi?id=26110">bug 26110</a>

              from <span class="vcard"><a class="email" href="mailto:ahmed.bougacha@gmail.com" title="Ahmed Bougacha <ahmed.bougacha@gmail.com>"> <span class="fn">Ahmed Bougacha</span></a>

</span></b>

        <pre>So, I looked a little closer. Sanjay's bisect was correct. clang-700 is pretty

old now; I bisected to:

  r229099 [SimplifyCFG] Be more aggressive

Sure enough, this still reproduces on trunk with -mllvm

-phi-node-folding-threshold=1.

Long story short: the problematic pattern is:

  (c ? -v : v)

which we lower to (because "c" is <4 x i1>, lowered as a vector mask):

  (~c & v) | (c & -v)

roughly corresponding to this IR:

  define <4 x i32> @t(<4 x i32> %v, <4 x i32> %c) {

    %cl = shl <4 x i32> %c, <i32 31, i32 31, i32 31, i32 31>

    %cs = ashr <4 x i32> %cl, <i32 31, i32 31, i32 31, i32 31>

    %tmp2 = trunc <4 x i32> %cs to <4 x i1>

    ; ^ not as artificial as it looks: equivalent to a legalized vsetcc

    %mv = sub nsw <4 x i32> zeroinitializer, %v

    %r = select <4 x i1> %tmp2, <4 x i32> %mv, <4 x i32> %v

    ret <4 x i32> %r

  }

The SSE2 codegen is pretty straightforward:

    xorps  %xmm1, %xmm1

    ...                   # xmm6 <- %v

    ...                   # xmm3 <- %c

    psubd  %xmm6, %xmm1   # 0 - v                # 0 - 5 -> -5

    movaps %xmm3, %xmm0   # c                    # 0 -> 0

    pandn  %xmm6, %xmm0   # ~c & v               # ~0 & 5 -> 5

    pand   %xmm3, %xmm1   # c & -v               # -5 & 0 -> 0

    por    %xmm0, %xmm1   # (~c & v) | (c & -v)  # 0 | 5 -> 5
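
To make the lane arithmetic above concrete, here is a rough Python model of the
blend on a single 32-bit lane. It is only a sketch: the helper names (neg32,
blend_lane) are made up for illustration and are not LLVM code, and it assumes
c is an all-zeros or all-ones lane mask.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane

def neg32(v):
    """Two's-complement negate of a 32-bit lane (models psubd from zero)."""
    return (-v) & MASK32

def blend_lane(c, v):
    """Compute (~c & v) | (c & -v) on one lane, as pandn/pand/por do."""
    not_c_and_v = (~c & MASK32) & v   # pandn: ~c & v
    c_and_neg_v = c & neg32(v)        # pand:  c & -v
    return not_c_and_v | c_and_neg_v  # por

print(hex(blend_lane(0x00000000, 5)))  # false lane keeps v
print(hex(blend_lane(MASK32, 5)))      # true lane becomes -v
```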

However, when we have SSSE3 (the default on OS X), we try to match this pattern

to PSIGND, and instead emit:

    psignd    %xmm3, %xmm1    # (c < 0 ? -v : (c > 0 ? v : 0))

                              #   c is a mask, so (c > 0) == 0

                              # (c ? -v : 0)

                              # (0 ? -5 : 0)

                              #   -> 0

These are not equivalent. PSIGND computes:

  (c ? -v : 0)

whereas the original select is:

  (c ? -v : v)
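
The mismatch can be checked with a small Python model of one 32-bit lane. This
is a sketch under assumptions: psignd_lane follows the documented PSIGND
behaviour (negate if the control lane is negative, zero if it is zero, pass
through otherwise), and both function names are hypothetical, not LLVM code.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane

def to_signed(x):
    """Reinterpret a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def psignd_lane(ctrl, v):
    """One lane of PSIGND: the sign of ctrl picks -v, 0 or v."""
    s = to_signed(ctrl)
    if s < 0:
        return (-v) & MASK32
    if s > 0:
        return v & MASK32
    return 0

def select_lane(c, v):
    """One lane of the intended select: c ? -v : v."""
    return (-v) & MASK32 if c else v & MASK32

# With a false (all-zeros) mask lane the two disagree.
print(select_lane(0, 5), psignd_lane(0, 5))
# With a true (all-ones) mask lane they agree.
print(hex(select_lane(MASK32, 5)), hex(psignd_lane(MASK32, 5)))
```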

Now, this bug has existed since 2010. However, I think we never noticed it

because of operand canonicalization.

The PSIGN combine matches:

  (or (and m, x), (pandn m, (0 - x)))

  (or (and x, m), (pandn m, (0 - x)))

  (or (pandn m, (0 - x)), (and m, x))

  (or (pandn m, (0 - x)), (and x, m))

but not the variants of:

  (or (and m, (0 - x)), (pandn m, x))

Which is what gets generated for the function above (the most obvious IR that I

could write).

I think this is pretty easy to fix: instead of passing c to PSIGN as-is, set a

non-sign bit in it, so that the zero (false) lanes become positive and default

to the 'v' case.

So, this should work:

    por       <1,1,1,1>, %xmm3 # c' = c | 1

    psignd    %xmm3, %xmm1     # (c' < 0 ? -v : (c' > 0 ? v : 0))

                               #   c is a mask, so c' is either 1 or 0xff..f

                               # (c' == 0xff..f ? -v : (c' != 0 ? v : v))

                               # (c' == 0xff..f ? -v : v)

                               # (0 ? -5 : 5)

                               #   -> 5
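
As a sanity check of the c | 1 idea, the same kind of single-lane Python model
can compare it against the intended select. Again this is only a sketch: the
function names are hypothetical, c is assumed to be an all-zeros or all-ones
mask, and nothing here is the actual LLVM change.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane

def to_signed(x):
    """Reinterpret a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def psignd_lane(ctrl, v):
    """One lane of PSIGND: the sign of ctrl picks -v, 0 or v."""
    s = to_signed(ctrl)
    if s < 0:
        return (-v) & MASK32
    if s > 0:
        return v & MASK32
    return 0

def fixed_lane(c, v):
    """Proposed sequence: por with 1, then psignd."""
    return psignd_lane(c | 1, v)

def select_lane(c, v):
    """Intended result: c ? -v : v."""
    return (-v) & MASK32 if c else v & MASK32

# Compare on both mask values and a few interesting lane values.
samples = [0, 1, 5, 0x7FFFFFFF, 0x80000000, MASK32]
mismatches = [(c, v) for c in (0, MASK32) for v in samples
              if fixed_lane(c, v) != select_lane(c, v)]
print(mismatches)  # an empty list means the two agree on every sample
```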

Constant-pool loads are cheap, so the por + psignd sequence is probably still a

win over the SSE2 codegen:

    psrad    $31, %xmm1

    pxor    %xmm2, %xmm2

    psubd    %xmm0, %xmm2

    pand    %xmm1, %xmm2

    pandn    %xmm0, %xmm1

    por    %xmm1, %xmm2

    movdqa    %xmm2, %xmm0

Note that I don't think the couple of PSIGN tests in trunk are correct either.

Consider test/CodeGen/X86/vec-sign.ll:

define <4 x i32> @signd(<4 x i32> %a, <4 x i32> %b) nounwind {

entry:

  %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>

  %sub = sub nsw <4 x i32> zeroinitializer, %a

  %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>

  %1 = and <4 x i32> %a, %0

  %2 = and <4 x i32> %b.lobit, %sub

  %cond = or <4 x i32> %1, %2

  ret <4 x i32> %cond

}

If %b is zero, this reduces to:

  %b.lobit = <4 x i32> zeroinitializer

  %sub = sub nsw <4 x i32> zeroinitializer, %a

  %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>

  %1 = <4 x i32> %a

  %2 = <4 x i32> zeroinitializer

  %cond = or <4 x i32> %1, %2

  ret <4 x i32> %a

}

whereas we currently generate:

  psignd %xmm1, %xmm0

  retq

which returns 0, as %xmm1 is 0.</pre>

        </div>

      </p>


    </body>

</html>