<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Confirmed it is in the target layer in LLVM.<div class=""><br class=""></div><div class="">Here is the test case: <a href="https://godbolt.org/g/kApSxe" class="">https://godbolt.org/g/kApSxe</a><div class=""><div class=""><br class=""></div><div class="">$ g++ -O3 -lm fcvt.cc <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 exact<br class="">1 inexact<br class=""><br class="">$ clang++ -O3 -lm fcvt.cc <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 inexact<br class="">1 inexact<br class=""><br class="">$ cat fcvt.cc<br class="">#include &lt;cstdio&gt;<br class="">#include &lt;cstdint&gt;<br class="">#include &lt;cmath&gt;<br class="">#include &lt;limits&gt;<br class="">#include &lt;fenv.h&gt;<br class=""><br class="">typedef signed int         s32;<br class="">typedef unsigned int       u32;<br class="">typedef signed long long   s64;<br class="">typedef unsigned long long u64;<br class=""><br class="">__attribute__ ((noinline)) s32 fcvt_wu(float f) { return s32(u32(f)); }<br class="">__attribute__ ((noinline)) s64 fcvt_lu(float f) { return s64(u64(f)); }<br class=""><br class="">void test_fcvt_wu(float a)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%d ", fcvt_wu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>printf("%s\n", fetestexcept(FE_INEXACT) ? 
"inexact" : "exact");<br class="">}<br class=""><br class="">void test_fcvt_lu(float a)<br class="">{       <br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%lld ", fcvt_lu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class="">int main()<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>fesetround(FE_TONEAREST);<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>test_fcvt_wu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_wu(1.1f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.1f);<br class="">}<br class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 20 Apr 2017, at 5:01 AM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Changing the list from cfe-dev to llvm-dev</div><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 20 Apr 2017, at 4:52 AM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: 
after-white-space;" class=""><div class="">I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.</div><div class=""><br class=""></div><div class="">I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic using a conditional move is causing INEXACT to be set from the other side of the predicate as the lowered x86_64 code executes both conversions whereas GCC uses a branch. That seems to be the difference.</div><div class=""><br class=""></div><div class="">I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">$ more llvm/lib/Target/X86//README-X86-64.txt</div><div class="">…</div><div class="">Are we better off using branches instead of cmove to implement FP to<br class="">unsigned i64?<br class=""><br class="">_conv:<br class="">        ucomiss LC0(%rip), %xmm0<br class="">        cvttss2siq      %xmm0, %rdx<br class="">        jb      L3<br class="">        subss   LC0(%rip), %xmm0<br class="">        movabsq $-9223372036854775808, %rax<br class="">        cvttss2siq      %xmm0, %rdx<br class="">        xorq    %rax, %rdx<br class="">L3:<br class="">        movq    %rdx, %rax<br class="">        ret<br class=""><br class="">instead of<br class=""><br class="">_conv:<br class="">        movss LCPI1_0(%rip), %xmm1<br class="">        cvttss2siq %xmm0, %rcx<br class="">        movaps %xmm0, %xmm2<br class="">        subss %xmm1, %xmm2<br class="">        cvttss2siq %xmm2, %rax<br class="">        movabsq $-9223372036854775808, %rdx<br class="">        xorq %rdx, %rax<br class="">        ucomiss %xmm1, %xmm0<br class="">        cmovb %rcx, %rax<br class="">        ret<br class=""><br class=""></div><br class=""><div class=""><blockquote 
type="cite" class=""><div class="">On 19 Apr 2017, at 2:10 PM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 19 Apr 2017, at 1:14 PM, Tim Northover <<a href="mailto:t.p.northover@gmail.com" class="">t.p.northover@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">On 18 April 2017 at 15:54, Michael Clark via cfe-dev<br class=""><<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""><blockquote type="cite" class="">The only way towards completing a milestone is via fixing a number of small issues along<br class="">the way…<br class=""></blockquote><br class="">I believe there's more to it than that. None of LLVM's optimizations<br class="">are aware of this extra side-channel of information (with possible<br class="">exceptions like avoiding speculating fdiv because of unavoidable<br class="">exceptions).<br class=""><br class="">From what I remember, the real proposal is to replace all<br class="">floating-point IR with intrinsics when FENV_ACCESS is on, which the<br class="">optimizers by default won't have a clue about and will treat<br class="">conservatively (essentially like they're modifying external memory).<br class=""><br class="">So be careful with drawing conclusions from small snippets; you're<br class="">probably not seeing the full range of LLVM's behaviour.<br class=""></div></div></blockquote></div><br class=""><div class=""><br class=""></div><div class=""><div class="">Yes. 
I’m sure.</div><div class=""><br class=""></div><div class="">It reproduces with just the cast on its own: <a href="https://godbolt.org/g/myUoL2" class="">https://godbolt.org/g/myUoL2</a></div><div class=""><br class=""></div><div class="">It appears to be in the LLVM lowering of the fptoui instruction, so it must be in the MC layer optimisations.</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><div class=""><font face="Courier" class="">; Function Attrs: noinline nounwind uwtable</font></div><div class=""><font face="Courier" class="">define i64 @_Z7fcvt_luf(float %f) #0 {</font></div><div class=""><font face="Courier" class="">  %1 = alloca float, align 4</font></div><div class=""><font face="Courier" class="">  store float %f, float* %1, align 4</font></div><div class=""><font face="Courier" class="">  %2 = load float, float* %1, align 4</font></div><div class=""><font face="Courier" class="">  %3 = fptoui float %2 to i64</font></div><div class=""><font face="Courier" class="">  ret i64 %3</font></div><div class=""><font face="Courier" class="">}</font></div></blockquote><div class=""><br class=""></div><div class="">GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and predicates the result using a conditional move. 
One of the conversions obviously is setting the INEXACT MXCSR flag.</div><div class=""><br class=""></div><div class="">Clang lowering (inexact set when result is exact):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class="">        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero<br class="">        movaps  xmm2, xmm0<br class="">        subss   xmm2, xmm1<br class="">        cvttss2si       rax, xmm2<br class="">        movabs  rcx, -9223372036854775808<br class="">        xor     rcx, rax<br class="">        cvttss2si       rax, xmm0<br class="">        ucomiss xmm0, xmm1<br class="">        cmovae  rax, rcx<br class="">        ret</font></blockquote><div class=""><br class=""></div><div class="">GCC lowering (sets flags correctly):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class="">        ucomiss xmm0, DWORD PTR .LC0[rip]<br class="">        jnb     .L4<br class="">        cvttss2si       rax, xmm0<br class="">        ret<br class="">.L4:<br class="">        subss   xmm0, DWORD PTR .LC0[rip]<br class="">        movabs  rdx, -9223372036854775808<br class="">        cvttss2si       rax, xmm0<br class="">        xor     rax, rdx<br class="">        ret</font></blockquote></div></div></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></div></div></body></html>