<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">I’m getting close. I think it may be an issue with an individual instruction. I’m looking for the X86 lowering of Instruction::FPToUI.</div><div class=""><br class=""></div><div class="">I found a comment about the rationale for using a conditional move versus a branch. I believe the conditional-move sequence is causing INEXACT to be set by the conversion on the unselected side of the predicate, because the lowered x86_64 code executes both conversions, whereas GCC uses a branch and executes only one. That seems to be the difference.</div><div class=""><br class=""></div><div class="">I can’t find FPToUI in llvm/lib/Target/X86, so I’m trying to figure out what the cast is renamed to in the target layer in order to find where the sequence is emitted.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">$ more llvm/lib/Target/X86//README-X86-64.txt</div><div class="">…</div><div class="">Are we better off using branches instead of cmove to implement FP to<br class="">unsigned i64?<br class=""><br class="">_conv:<br class=""> ucomiss LC0(%rip), %xmm0<br class=""> cvttss2siq %xmm0, %rdx<br class=""> jb L3<br class=""> subss LC0(%rip), %xmm0<br class=""> movabsq $-9223372036854775808, %rax<br class=""> cvttss2siq %xmm0, %rdx<br class=""> xorq %rax, %rdx<br class="">L3:<br class=""> movq %rdx, %rax<br class=""> ret<br class=""><br class="">instead of<br class=""><br class="">_conv:<br class=""> movss LCPI1_0(%rip), %xmm1<br class=""> cvttss2siq %xmm0, %rcx<br class=""> movaps %xmm0, %xmm2<br class=""> subss %xmm1, %xmm2<br class=""> cvttss2siq %xmm2, %rax<br class=""> movabsq $-9223372036854775808, %rdx<br class=""> xorq %rdx, %rax<br class=""> ucomiss %xmm1, %xmm0<br class=""> cmovb %rcx, %rax<br class=""> ret<br class=""><br class=""></div><br class=""><div><blockquote 
type="cite" class=""><div class="">On 19 Apr 2017, at 2:10 PM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 19 Apr 2017, at 1:14 PM, Tim Northover <<a href="mailto:t.p.northover@gmail.com" class="">t.p.northover@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">On 18 April 2017 at 15:54, Michael Clark via cfe-dev<br class=""><<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""><blockquote type="cite" class="">The only way towards completing a milestone is via fixing a number of small issues along<br class="">the way…<br class=""></blockquote><br class="">I believe there's more to it than that. None of LLVM's optimizations<br class="">are aware of this extra side-channel of information (with possible<br class="">exceptions like avoiding speculating fdiv because of unavoidable<br class="">exceptions).<br class=""><br class="">From what I remember, the real proposal is to replace all<br class="">floating-point IR with intrinsics when FENV_ACCESS is on, which the<br class="">optimizers by default won't have a clue about and will treat<br class="">conservatively (essentially like they're modifying external memory).<br class=""><br class="">So be careful with drawing conclusions from small snippets; you're<br class="">probably not seeing the full range of LLVM's behaviour.<br class=""></div></div></blockquote></div><br class=""><div class=""><br class=""></div><div class=""><div class="">Yes. 
I’m sure.</div><div class=""><br class=""></div><div class="">It reproduces with just the cast on its own: <a href="https://godbolt.org/g/myUoL2" class="">https://godbolt.org/g/myUoL2</a></div><div class=""><br class=""></div><div class="">It appears to be in the LLVM lowering of the fptoui instruction, so it must be MC layer optimisations.</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><div class=""><font face="Courier" class="">; Function Attrs: noinline nounwind uwtable</font></div><div class=""><font face="Courier" class="">define i64 @_Z7fcvt_luf(float %f) #0 {</font></div><div class=""><font face="Courier" class=""> %1 = alloca float, align 4</font></div><div class=""><font face="Courier" class=""> store float %f, float* %1, align 4</font></div><div class=""><font face="Courier" class=""> %2 = load float, float* %1, align 4</font></div><div class=""><font face="Courier" class=""> %3 = fptoui float %2 to i64</font></div><div class=""><font face="Courier" class=""> ret i64 %3</font></div><div class=""><font face="Courier" class="">}</font></div></blockquote><div class=""><br class=""></div><div class="">GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and predicates the result using a conditional move. 
One of the conversions obviously is setting the INEXACT MXCSR flag.</div><div class=""><br class=""></div><div class="">Clang lowering (inexact set when result is exact):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class=""> movss xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero<br class=""> movaps xmm2, xmm0<br class=""> subss xmm2, xmm1<br class=""> cvttss2si rax, xmm2<br class=""> movabs rcx, -9223372036854775808<br class=""> xor rcx, rax<br class=""> cvttss2si rax, xmm0<br class=""> ucomiss xmm0, xmm1<br class=""> cmovae rax, rcx<br class=""> ret</font></blockquote><div class=""><br class=""></div><div class="">GCC lowering (sets flags correctly):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class=""> ucomiss xmm0, DWORD PTR .LC0[rip]<br class=""> jnb .L4<br class=""> cvttss2si rax, xmm0<br class=""> ret<br class="">.L4:<br class=""> subss xmm0, DWORD PTR .LC0[rip]<br class=""> movabs rdx, -9223372036854775808<br class=""> cvttss2si rax, xmm0<br class=""> xor rax, rdx<br class=""> ret</font></blockquote></div></div></div></blockquote></div><br class=""></body></html>