[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Michael Clark via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 19 10:01:31 PDT 2017
Changing the list from cfe-dev to llvm-dev
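
For anyone picking this up on llvm-dev, here is a minimal sketch of the kind of reproduction being discussed. It is not the exact source behind the godbolt link quoted below. fcvt_lu matches the function name in the quoted IR, while raises_inexact is a hypothetical helper added for illustration. It assumes GCC or Clang (for the noinline attribute) and that the compiler does not move the conversion across the <cfenv> calls, which is not strictly guaranteed without FENV_ACCESS support.

```cpp
#include <cfenv>

// Same shape as the fcvt_lu(float) in the quoted IR. noinline keeps the
// cast from being constant-folded into the caller (GCC/Clang attribute).
__attribute__((noinline)) unsigned long long fcvt_lu(float f) {
    return static_cast<unsigned long long>(f);
}

// Hypothetical helper: clear the floating-point exception flags, perform
// the conversion, and report whether FE_INEXACT was raised along the way.
bool raises_inexact(float f, unsigned long long* out) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float input = f;  // stop the compiler seeing a constant
    *out = fcvt_lu(input);
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

Calling raises_inexact(1.0f, &v) stores 1 in v, which is an exact conversion. Whether it returns true depends on how the compiler lowers the cast, which is the behaviour in question here.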
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
>
> I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>
> I found a comment on the rationale for using a conditional move versus a branch. I believe the conditional-move lowering is what sets INEXACT: the lowered x86_64 code executes both conversion paths and then selects one result, so the flag is raised by the path whose result is discarded. GCC uses a branch and only executes one path. That seems to be the difference.
>
> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.
>
>
> $ more llvm/lib/Target/X86/README-X86-64.txt
> …
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
> _conv:
> ucomiss LC0(%rip), %xmm0
> cvttss2siq %xmm0, %rdx
> jb L3
> subss LC0(%rip), %xmm0
> movabsq $-9223372036854775808, %rax
> cvttss2siq %xmm0, %rdx
> xorq %rax, %rdx
> L3:
> movq %rdx, %rax
> ret
>
> instead of
>
> _conv:
> movss LCPI1_0(%rip), %xmm1
> cvttss2siq %xmm0, %rcx
> movaps %xmm0, %xmm2
> subss %xmm1, %xmm2
> cvttss2siq %xmm2, %rax
> movabsq $-9223372036854775808, %rdx
> xorq %rdx, %rax
> ucomiss %xmm1, %xmm0
> cmovb %rcx, %rax
> ret
>
>
>> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:
>>
>>
>>> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>>>
>>> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>> The only way towards completing a milestone is via fixing a number of small issues along
>>>> the way…
>>>
>>> I believe there's more to it than that. None of LLVM's optimizations
>>> are aware of this extra side-channel of information (with possible
>>> exceptions like avoiding speculating fdiv because of unavoidable
>>> exceptions).
>>>
>>> From what I remember, the real proposal is to replace all
>>> floating-point IR with intrinsics when FENV_ACCESS is on, which the
>>> optimizers by default won't have a clue about and will treat
>>> conservatively (essentially like they're modifying external memory).
>>>
>>> So be careful with drawing conclusions from small snippets; you're
>>> probably not seeing the full range of LLVM's behaviour.
>>
>>
>> Yes. I’m sure.
>>
>> It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2
>>
>> It appears to be in the LLVM lowering of the fptoui instruction, so it must be in the MC layer optimisations.
>>
>> ; Function Attrs: noinline nounwind uwtable
>> define i64 @_Z7fcvt_luf(float %f) #0 {
>> %1 = alloca float, align 4
>> store float %f, float* %1, align 4
>> %2 = load float, float* %1, align 4
>> %3 = fptoui float %2 to i64
>> ret i64 %3
>> }
>>
>> GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and selects the result with a conditional move. One of the two paths is evidently setting the INEXACT flag in MXCSR.
>>
>> Clang lowering (inexact set when result is exact):
>>
>> fcvt_lu(float):
>> movss xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
>> movaps xmm2, xmm0
>> subss xmm2, xmm1
>> cvttss2si rax, xmm2
>> movabs rcx, -9223372036854775808
>> xor rcx, rax
>> cvttss2si rax, xmm0
>> ucomiss xmm0, xmm1
>> cmovae rax, rcx
>> ret
>>
>> GCC lowering (sets flags correctly):
>>
>> fcvt_lu(float):
>> ucomiss xmm0, DWORD PTR .LC0[rip]
>> jnb .L4
>> cvttss2si rax, xmm0
>> ret
>> .L4:
>> subss xmm0, DWORD PTR .LC0[rip]
>> movabs rdx, -9223372036854775808
>> cvttss2si rax, xmm0
>> xor rax, rdx
>> ret
>
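
To make the two paths concrete, here is a rough C++ rendering of the branch-based sequence from the GCC output quoted above. It is a sketch only: fptoui_branch is a hypothetical name, and the precondition (0 <= f < 2^64, no NaN) is an assumption added so that every cast is well defined. It also hints at where the flag may come from. For f = 1.0f the subtraction f - 2^63 is itself inexact in single precision (the result rounds to -2^63), so as far as I can tell the subss in the unconditional path can raise INEXACT even when both cvttss2si results are exact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the quoted GCC sequence: compare against
// 2^63 first, then run only the conversion path that applies.
std::uint64_t fptoui_branch(float f) {
    // Assumed precondition: f is in [0, 2^64) and is not NaN.
    assert(f >= 0.0f && f < 18446744073709551616.0f);

    const float two63 = 9223372036854775808.0f;  // 2^63, exact as a float

    if (f < two63) {
        // Small path: a signed conversion already gives the right bits.
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(f));
    }

    // Large path: f - 2^63 is exact for f in [2^63, 2^64), so convert the
    // remainder as signed and put the top bit back with an XOR.
    const std::int64_t low = static_cast<std::int64_t>(f - two63);
    return static_cast<std::uint64_t>(low) ^ 0x8000000000000000ULL;
}
```

Writing the branch in source does not by itself guarantee that the compiler keeps it. Without FENV_ACCESS support the optimizer may still turn it back into a select, so this illustrates the logic rather than a fix.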