[llvm-bugs] [Bug 31602] New: [X86] float/double -> unsigned long conversion slow when inputs are predictable
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Jan 10 18:54:44 PST 2017
https://llvm.org/bugs/show_bug.cgi?id=31602
Bug ID: 31602
Summary: [X86] float/double -> unsigned long conversion slow
when inputs are predictable
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: mkuper at google.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
SSE and AVX (up until AVX512) don't have instructions that convert from FP
(either float or double) to unsigned long. So, these conversions have to be
emulated using FP -> signed long conversions.
GCC lowers this:
unsigned long foo(double x) {
return x;
}
as:
foo(double):
movsd .LC0(%rip), %xmm1
ucomisd %xmm1, %xmm0
jnb .L2
cvttsd2siq %xmm0, %rax
ret
.L2:
subsd %xmm1, %xmm0
movabsq $-9223372036854775808, %rdx
cvttsd2siq %xmm0, %rax
xorq %rdx, %rax
ret
.LC0:
.long 0
.long 1138753536
That is: check whether the value fits in the signed range (below 2^63). If it
does not, force it into range by subtracting 2^63, convert, and correct the
result by flipping the sign bit.
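For reference, the branchy sequence can be sketched in C roughly as follows. This is only an illustration of the lowering, not GCC's actual implementation; the function name fp_to_u64_branchy is made up for this sketch, and it assumes x is in [0, 2^64) (anything else is undefined behavior in C, as for the plain cast).

```c
#include <stdint.h>

/* Hypothetical helper name; mirrors the branchy sequence shown above.
   Assumes 0 <= x < 2^64. */
uint64_t fp_to_u64_branchy(double x) {
    const double two63 = 9223372036854775808.0; /* 2^63, the .LC0 constant */

    if (x < two63) {
        /* Fits the signed range: one signed conversion is enough. */
        return (uint64_t)(int64_t)x;
    }
    /* Too large for a signed conversion: shift down by 2^63, convert,
       then put the top bit back with an xor. */
    return (uint64_t)(int64_t)(x - two63) ^ UINT64_C(0x8000000000000000);
}
```

When all inputs are below 2^63, only the first path ever runs, which is why a well-predicted branch makes this cheap.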
What we do, on the other hand, is:
.LCPI0_0:
.quad 4890909195324358656 # double 9.2233720368547758E+18
foo(double):
movsd .LCPI0_0(%rip), %xmm1
movapd %xmm0, %xmm2
subsd %xmm1, %xmm2
cvttsd2si %xmm2, %rax
movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
xorq %rax, %rcx
cvttsd2si %xmm0, %rax
ucomisd %xmm1, %xmm0
cmovaeq %rcx, %rax
retq
This is basically an if-converted version of the GCC code: both conversions
always execute, and a cmov selects the result.
Since cvttsd2si has a fairly long latency, the GCC version is much faster when
the branch is well-predicted, and slower when it's not.
But it seems like in most cases this branch should be well-predicted - e.g. if
all inputs are "small", and actually fit into the signed range.
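To make the comparison concrete, here is a rough C sketch of the if-converted sequence using SSE2 intrinsics. It assumes an x86-64 target; the function name fp_to_u64_select is made up for this sketch, and a compiler is free to turn the final select into either a cmov or a branch.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86-64 target */
#include <stdint.h>

/* Hypothetical helper name; mirrors the cmov-based sequence shown above. */
uint64_t fp_to_u64_select(double x) {
    const __m128d vx = _mm_set_sd(x);
    const __m128d vlimit = _mm_set_sd(9223372036854775808.0); /* 2^63 */

    /* Both conversions always execute, whatever the input is. */
    int64_t shifted = _mm_cvttsd_si64(_mm_sub_sd(vx, vlimit));
    uint64_t high = (uint64_t)shifted ^ UINT64_C(0x8000000000000000);

    /* For x >= 2^63 this convert is out of range: the hardware returns
       0x8000000000000000 and raises the invalid-operation flag, even
       though the result is then discarded. */
    uint64_t low = (uint64_t)_mm_cvttsd_si64(vx);

    /* Select on the comparison (ucomisd in the assembly above). */
    return _mm_ucomige_sd(vx, vlimit) ? high : low;
}
```

Both converts sit on the critical path here, whereas the branchy form executes only one of them per call.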
A few additional notes:
1) Our current version is problematic in the presence of FP exceptions, see
PR17686.
2) I tried playing around with selecting on the input instead of the output,
but that doesn't really improve the situation, since we then need to adjust the
sign bit of the output of one of the converts.
There are two options here - (1) adjusting and selecting again between the
original and the adjusted version, or (2) fudging the adjustment so that it's a
nop for the right convert. ICC generates code which is basically (2). This
avoids the problem in PR17686, but both options appear to be even slower than
what we have now.
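A rough C sketch of option (2), selecting on the input and making the sign-bit adjustment a no-op for in-range values. This is my reading of the idea, not ICC's actual output; the function name fp_to_u64_select_input is made up for this sketch, and it assumes x is in [0, 2^64).

```c
#include <stdint.h>

/* Hypothetical helper name; assumes 0 <= x < 2^64. */
uint64_t fp_to_u64_select_input(double x) {
    const double two63 = 9223372036854775808.0; /* 2^63 */
    const int big = x >= two63;

    /* Select on the input: subtract 2^63 only when it is needed. */
    const double offset = big ? two63 : 0.0;

    /* The fixup is 0 for in-range inputs, so the xor below is a no-op. */
    const uint64_t fixup = (uint64_t)big << 63;

    /* A single convert, always on a value that fits the signed range. */
    return (uint64_t)(int64_t)(x - offset) ^ fixup;
}
```

Only one convert is executed and its input always fits the signed range, so no spurious FP exception is raised, but the compare, select, and subtract all sit in front of the convert on the dependency chain.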