<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [X86] float/double -> unsigned long conversion slow when inputs are predictable"

   href="https://llvm.org/bugs/show_bug.cgi?id=31602">31602</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[X86] float/double -> unsigned long conversion slow when inputs are predictable

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>mkuper@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>SSE and AVX (up until AVX512) don't have instructions that convert from FP

(either float or double) to unsigned long. So these conversions have to be

emulated using FP -> signed long conversions.

GCC lowers this:

unsigned long foo(double x) {

  return x;

}

as:

foo(double):

        movsd   .LC0(%rip), %xmm1

        ucomisd %xmm1, %xmm0

        jnb     .L2

        cvttsd2siq      %xmm0, %rax

        ret

.L2:

        subsd   %xmm1, %xmm0

        movabsq $-9223372036854775808, %rdx

        cvttsd2siq      %xmm0, %rax

        xorq    %rdx, %rax

        ret

.LC0:

        .long   0

        .long   1138753536

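That is: check whether the value fits in the signed range (i.e. is below 2^63).

If it does, convert directly. If not, subtract 2^63 to force it into range,

convert, and flip the sign bit of the result to correct the value.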
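
For reference, the branchy sequence above corresponds roughly to the C sketch
below. The function name is made up for illustration, and it assumes a 64-bit
unsigned long (LP64, as on x86-64 Linux) and an input in [0, 2^64); anything
else is undefined behaviour in C.

```c
/* Hypothetical helper, not taken from GCC or LLVM sources. */
unsigned long fp_to_u64_branchy(double x) {
    const double two63 = 9223372036854775808.0; /* 2^63, same value as .LC0 */

    if (x >= two63) {
        /* Too large for a signed convert: shift down by 2^63, convert,
           then put the top bit back with an xor. */
        return (unsigned long)(long)(x - two63) ^ 0x8000000000000000UL;
    }
    /* Fits in the signed range: a single signed convert is enough. */
    return (unsigned long)(long)x;
}
```

On the common "small input" path only one convert is executed.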

What we do, on the other hand, is:

.LCPI0_0:

        .quad   4890909195324358656     # double 9.2233720368547758E+18

foo(double):

        movsd   .LCPI0_0(%rip), %xmm1

        movapd  %xmm0, %xmm2

        subsd   %xmm1, %xmm2

        cvttsd2si       %xmm2, %rax

        movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000

        xorq    %rax, %rcx

        cvttsd2si       %xmm0, %rax

        ucomisd %xmm1, %xmm0

        cmovaeq %rcx, %rax

        retq

Which is basically an if-converted version of the GCC code.

Since cvttsd2si has a fairly long latency, the GCC version is much faster when

the branch is well-predicted, and slower when it's not.

But it seems like in most cases this branch should be well-predicted - e.g. if

all inputs are "small", and actually fit into the signed range.

A few additional notes:

1) Our current version is problematic in the presence of FP exceptions, see

PR17686.

2) I tried playing around with selecting on the input instead of the output,

but that doesn't really improve the situation, since we then need to adjust the

sign bit of the output of one of the converts.

There are two options here - (1) adjusting and selecting again between the

original and the adjusted version, or (2) fudging the adjustment so that it's a

nop for the right convert. ICC generates code which is basically (2). This

avoids the problem in PR17686, but both options appear to be even slower than

what we have now.</pre>
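<pre>One possible reading of option (2), as a C sketch: select the offset and the
sign-bit fixup based on the input, so that both are no-ops for small values
and only one convert is needed. This is an illustration only, not ICC's actual
output. The function name is made up, and it assumes a 64-bit unsigned long
and an input in [0, 2^64).

```c
/* Hypothetical helper illustrating "select on the input". */
unsigned long fp_to_u64_select_input(double x) {
    const double two63 = 9223372036854775808.0; /* 2^63 */
    int big = x >= two63;                        /* false for NaN */

    /* For small inputs both adjustments are no-ops. */
    double offset = big ? two63 : 0.0;
    unsigned long fixup = big ? 0x8000000000000000UL : 0UL;

    /* The single convert always sees an in-range value, so it should
       not raise a spurious invalid-operation exception. */
    return (unsigned long)(long)(x - offset) ^ fixup;
}
```
</pre>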

        </div>

      </p>


    </body>

</html>