[llvm-dev] FE_INEXACT being set for an exact conversion from float to i64
Michael Clark via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 19 10:06:58 PDT 2017
Confirmed: the issue is in the LLVM target layer.
Here is the test case: https://godbolt.org/g/kApSxe
$ g++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 exact
1 inexact
$ clang++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 inexact
1 inexact
$ cat fcvt.cc
#include <cstdio>
#include <cstdint>
#include <cmath>
#include <limits>
#include <fenv.h>
typedef signed int s32;
typedef unsigned int u32;
typedef signed long long s64;
typedef unsigned long long u64;
__attribute__ ((noinline)) s32 fcvt_wu(float f) { return s32(u32(f)); }
__attribute__ ((noinline)) s64 fcvt_lu(float f) { return s64(u64(f)); }
void test_fcvt_wu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt_wu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}
void test_fcvt_lu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%lld ", fcvt_lu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}
int main()
{
    fesetround(FE_TONEAREST);
    test_fcvt_wu(1.0f);
    test_fcvt_wu(1.1f);
    test_fcvt_lu(1.0f);
    test_fcvt_lu(1.1f);
}
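For reference, the branch-based sequence in the GCC output quoted below can be written out in C++. This is only a sketch under assumptions: fptoui_branch and kTwo63 are names made up for illustration, and because the optimisers are not aware of the exception flags (see Tim's comment below), nothing stops a compiler from turning the branch back into a select. The source form by itself therefore does not guarantee correct flags.

```cpp
#include <cassert>
#include <cstdint>

// 2^63 is exactly representable as a float.
static const float kTwo63 = 9223372036854775808.0f;

// Hypothetical helper mirroring the branch-based sequence: compare against
// 2^63 first, then run exactly one signed conversion.
// Precondition: 0 <= f < 2^64; the sketch does not handle anything else.
std::uint64_t fptoui_branch(float f)
{
    assert(f >= 0.0f && f < 18446744073709551616.0f);
    if (f < kTwo63) {
        // Small path: the value already fits in a signed 64-bit integer.
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(f));
    }
    // Large path: shift into signed range, convert, then restore the top bit.
    std::int64_t low = static_cast<std::int64_t>(f - kTwo63);
    return static_cast<std::uint64_t>(low) ^ (1ULL << 63);
}
```

With an input such as 1.0f only the small path is written to run, so the subtraction on the large path is never reached in the source. Whether that survives optimisation is exactly the question in this thread.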
> On 20 Apr 2017, at 5:01 AM, Michael Clark <michaeljclark at mac.com> wrote:
>
> Changing the list from cfe-dev to llvm-dev
>
>> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
>>
>> I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>>
>> I found a comment about the rationale for using a conditional move versus a branch. I believe the conditional-move sequence causes INEXACT to be set by the path that is not selected: the lowered x86_64 code executes both conversions, whereas GCC uses a branch and executes only one. That seems to be the difference.
>>
>> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.
>>
>>
>> $ more llvm/lib/Target/X86//README-X86-64.txt
>> …
>> Are we better off using branches instead of cmove to implement FP to
>> unsigned i64?
>>
>> _conv:
>> ucomiss LC0(%rip), %xmm0
>> cvttss2siq %xmm0, %rdx
>> jb L3
>> subss LC0(%rip), %xmm0
>> movabsq $-9223372036854775808, %rax
>> cvttss2siq %xmm0, %rdx
>> xorq %rax, %rdx
>> L3:
>> movq %rdx, %rax
>> ret
>>
>> instead of
>>
>> _conv:
>> movss LCPI1_0(%rip), %xmm1
>> cvttss2siq %xmm0, %rcx
>> movaps %xmm0, %xmm2
>> subss %xmm1, %xmm2
>> cvttss2siq %xmm2, %rax
>> movabsq $-9223372036854775808, %rdx
>> xorq %rdx, %rax
>> ucomiss %xmm1, %xmm0
>> cmovb %rcx, %rax
>> ret
>>
>>
>>> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:
>>>
>>>
>>>> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>>>>
>>>> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>> The only way towards completing a milestone is via fixing a number of small issues along
>>>>> the way…
>>>>
>>>> I believe there's more to it than that. None of LLVM's optimizations
>>>> are aware of this extra side-channel of information (with possible
>>>> exceptions like avoiding speculating fdiv because of unavoidable
>>>> exceptions).
>>>>
>>>> From what I remember, the real proposal is to replace all
>>>> floating-point IR with intrinsics when FENV_ACCESS is on, which the
>>>> optimizers by default won't have a clue about and will treat
>>>> conservatively (essentially like they're modifying external memory).
>>>>
>>>> So be careful with drawing conclusions from small snippets; you're
>>>> probably not seeing the full range of LLVM's behaviour.
>>>
>>>
>>> Yes. I’m sure.
>>>
>>> It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2
>>>
>>> It appears to be in the LLVM lowering of the fptoui instruction, so it must be in the MC layer optimisations.
>>>
>>> ; Function Attrs: noinline nounwind uwtable
>>> define i64 @_Z7fcvt_luf(float %f) #0 {
>>> %1 = alloca float, align 4
>>> store float %f, float* %1, align 4
>>> %2 = load float, float* %1, align 4
>>> %3 = fptoui float %2 to i64
>>> ret i64 %3
>>> }
>>>
>>> GCC performs a comparison with ucomiss and branches whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions obviously is setting the INEXACT MXCSR flag.
>>>
>>> Clang lowering (inexact set when result is exact):
>>>
>>> fcvt_lu(float):
>>> movss xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
>>> movaps xmm2, xmm0
>>> subss xmm2, xmm1
>>> cvttss2si rax, xmm2
>>> movabs rcx, -9223372036854775808
>>> xor rcx, rax
>>> cvttss2si rax, xmm0
>>> ucomiss xmm0, xmm1
>>> cmovae rax, rcx
>>> ret
>>>
>>> GCC lowering (sets flags correctly):
>>>
>>> fcvt_lu(float):
>>> ucomiss xmm0, DWORD PTR .LC0[rip]
>>> jnb .L4
>>> cvttss2si rax, xmm0
>>> ret
>>> .L4:
>>> subss xmm0, DWORD PTR .LC0[rip]
>>> movabs rdx, -9223372036854775808
>>> cvttss2si rax, xmm0
>>> xor rax, rdx
>>> ret
>>
>
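On the question of which instruction in the conditional-move sequence raises INEXACT for an input of 1.0f: one way to narrow it down is to run each step of the large-value path on its own. Note that 1.0f - 2^63 is not representable in a float (24-bit significand), so the subtraction itself has to round, while the truncating conversion of 1.0f is exact. The sketch below is a rough probe, not a definitive test. It assumes float arithmetic is performed at run time in hardware (for example SSE on x86-64), and sub_raises_inexact and cvt_raises_inexact are hypothetical helper names.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// Hypothetical probe: performs f - 2^63 at run time and reports whether the
// subtraction raised FE_INEXACT. The volatile accesses keep the compiler from
// folding the arithmetic away at compile time.
bool sub_raises_inexact(float f)
{
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float out = in - 9223372036854775808.0f;
    return std::fetestexcept(FE_INEXACT) != 0;
}

// Hypothetical probe: performs only the truncating float -> signed 64-bit
// conversion and reports whether it raised FE_INEXACT.
// Precondition: -2^63 <= f < 2^63, so the conversion is well defined.
bool cvt_raises_inexact(float f)
{
    assert(f >= -9223372036854775808.0f && f < 9223372036854775808.0f);
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile std::int64_t out = static_cast<std::int64_t>(in);
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

On an x86-64 build I would expect sub_raises_inexact(1.0f) to be true and cvt_raises_inexact(1.0f) to be false, which would point at the speculated subss rather than at either cvttss2si. For 1.1f the conversion itself should report inexact, which is the correct result.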