[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

Wed Apr 19 09:52:54 PDT 2017

I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.

I found a comment about the rationale for using a conditional move versus a branch. I believe the conditional-move form is what causes INEXACT to be set: the lowered x86_64 code executes both conversions, so the flag can be raised by the side of the predicate whose result is discarded, whereas GCC uses a branch and executes only one side. That seems to be the difference.
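
One detail worth checking against that hypothesis: for small inputs the stray flag may come from the subss that feeds the second conversion rather than from cvttss2si itself, because f - 2^63 is not representable in single precision when f is small, so the subtraction has to round. The following is a minimal probe, assuming x86-64 with GCC or Clang and the MXCSR helpers from <xmmintrin.h>. The function name is made up for illustration, and the volatile variables are only there to keep the compiler from folding or moving the subtraction.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_EXCEPT_* (x86 only) */

/* Hypothetical helper: returns 1 if computing f - 2^63 in single precision
   raises the sticky MXCSR inexact flag, 0 otherwise. */
int bias_subtract_is_inexact(float f) {
    volatile float in = f;
    volatile float bias = 9223372036854775808.0f;  /* 2^63, exact in float */
    volatile float out;

    /* Clear all six sticky exception flags in MXCSR. */
    _mm_setcsr(_mm_getcsr() & ~_MM_EXCEPT_MASK);

    out = in - bias;   /* the subtraction the cmov sequence always performs */
    (void)out;

    return (_mm_getcsr() & _MM_EXCEPT_INEXACT) != 0;
}
```

Called with 1.0f this should report the flag, since 1 - 2^63 rounds to -2^63, while an input of exactly 2^63 subtracts to zero without rounding. That would be consistent with the branching form staying clean for small values, because it never executes the subtraction on that path.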

I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.

$ more llvm/lib/Target/X86/README-X86-64.txt
…
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

_conv:
        ucomiss LC0(%rip), %xmm0
        cvttss2siq      %xmm0, %rdx
        jb      L3
        subss   LC0(%rip), %xmm0
        movabsq $-9223372036854775808, %rax
        cvttss2siq      %xmm0, %rdx
        xorq    %rax, %rdx
L3:
        movq    %rdx, %rax
        ret

instead of

_conv:
        movss LCPI1_0(%rip), %xmm1
        cvttss2siq %xmm0, %rcx
        movaps %xmm0, %xmm2
        subss %xmm1, %xmm2
        cvttss2siq %xmm2, %rax
        movabsq $-9223372036854775808, %rdx
        xorq %rdx, %rax
        ucomiss %xmm1, %xmm0
        cmovb %rcx, %rax
        ret
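
For reference, the two shapes in the README excerpt correspond roughly to the C below. This is only a sketch: the function names are hypothetical, and _mm_cvttss_si64 is used so that out-of-range inputs yield the hardware's 0x8000000000000000 result instead of C undefined behaviour. The compiler is free to if-convert either function, so the generated code has to be checked in the disassembly before drawing any conclusion about flags.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_set_ss, _mm_cvttss_si64 (x86-64 only) */

#define SIGN_BIT 0x8000000000000000ULL
#define TWO63    9223372036854775808.0f   /* 2^63, exact in float */

/* cvttss2si on the low lane: truncating float -> signed 64-bit. */
static uint64_t cvtt(float f) {
    return (uint64_t)_mm_cvttss_si64(_mm_set_ss(f));
}

/* Branching shape: only the work on the taken path is requested. */
uint64_t fptoui_branch(float f) {
    if (f < TWO63)
        return cvtt(f);
    return cvtt(f - TWO63) ^ SIGN_BIT;
}

/* Select shape: both conversions and the subtraction are requested
   unconditionally, then one result is picked. */
uint64_t fptoui_select(float f) {
    uint64_t lo = cvtt(f);
    uint64_t hi = cvtt(f - TWO63) ^ SIGN_BIT;
    return (f < TWO63) ? lo : hi;
}
```

Both functions return the same values for inputs in [0, 2^64). They differ only in which floating-point operations are asked for on a given input, and so in which MXCSR flags can end up set.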

> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:
> 
> 
>> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>> 
>> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>> The only way towards completing a milestone is via fixing a number of small issues along
>>> the way…
>> 
>> I believe there's more to it than that. None of LLVM's optimizations
>> are aware of this extra side-channel of information (with possible
>> exceptions like avoiding speculating fdiv because of unavoidable
>> exceptions).
>> 
>> From what I remember, the real proposal is to replace all
>> floating-point IR with intrinsics when FENV_ACCESS is on, which the
>> optimizers by default won't have a clue about and will treat
>> conservatively (essentially like they're modifying external memory).
>> 
>> So be careful with drawing conclusions from small snippets; you're
>> probably not seeing the full range of LLVM's behaviour.
> 
> 
> Yes. I’m sure.
> 
> It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2
> 
> It appears to be in the LLVM lowering of the fptoui instruction, so it must be MC layer optimisations.
> 
> ; Function Attrs: noinline nounwind uwtable
> define i64 @_Z7fcvt_luf(float %f) #0 {
>   %1 = alloca float, align 4
>   store float %f, float* %1, align 4
>   %2 = load float, float* %1, align 4
>   %3 = fptoui float %2 to i64
>   ret i64 %3
> }
> 
> GCC performs a comparison with ucomiss and branches whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions obviously is setting the INEXACT MXCSR flag.
> 
> Clang lowering (inexact set when result is exact):
> 
> fcvt_lu(float):
>         movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
>         movaps  xmm2, xmm0
>         subss   xmm2, xmm1
>         cvttss2si       rax, xmm2
>         movabs  rcx, -9223372036854775808
>         xor     rcx, rax
>         cvttss2si       rax, xmm0
>         ucomiss xmm0, xmm1
>         cmovae  rax, rcx
>         ret
> 
> GCC lowering (sets flags correctly):
> 
> fcvt_lu(float):
>         ucomiss xmm0, DWORD PTR .LC0[rip]
>         jnb     .L4
>         cvttss2si       rax, xmm0
>         ret
> .L4:
>         subss   xmm0, DWORD PTR .LC0[rip]
>         movabs  rdx, -9223372036854775808
>         cvttss2si       rax, xmm0
>         xor     rax, rdx
>         ret
