<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 19 Apr 2017, at 1:14 PM, Tim Northover <<a href="mailto:t.p.northover@gmail.com" class="">t.p.northover@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">On 18 April 2017 at 15:54, Michael Clark via cfe-dev<br class=""><<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""><blockquote type="cite" class="">The only way towards completing a milestone is via fixing a number of small issues along<br class="">the way…<br class=""></blockquote><br class="">I believe there's more to it than that. None of LLVM's optimizations<br class="">are aware of this extra side-channel of information (with possible<br class="">exceptions like avoiding speculating fdiv because of unavoidable<br class="">exceptions).<br class=""><br class="">From what I remember, the real proposal is to replace all<br class="">floating-point IR with intrinsics when FENV_ACCESS is on, which the<br class="">optimizers by default won't have a clue about and will treat<br class="">conservatively (essentially like they're modifying external memory).<br class=""><br class="">So be careful with drawing conclusions from small snippets; you're<br class="">probably not seeing the full range of LLVM's behaviour.<br class=""></div></div></blockquote></div><br class=""><div class=""><br class=""></div><div class=""><div class="">Yes. 
I’m sure.</div><div class=""><br class=""></div><div class="">It reproduces with just the cast on its own: <a href="https://godbolt.org/g/myUoL2" class="">https://godbolt.org/g/myUoL2</a></div><div class=""><br class=""></div><div class="">It appears to be in the LLVM lowering of the fptoui instruction, so it must be down to MC layer optimisations.</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><div class=""><font face="Courier" class="">; Function Attrs: noinline nounwind uwtable</font></div><div class=""><font face="Courier" class="">define i64 @_Z7fcvt_luf(float %f) #0 {</font></div><div class=""><font face="Courier" class=""> %1 = alloca float, align 4</font></div><div class=""><font face="Courier" class=""> store float %f, float* %1, align 4</font></div><div class=""><font face="Courier" class=""> %2 = load float, float* %1, align 4</font></div><div class=""><font face="Courier" class=""> %3 = fptoui float %2 to i64</font></div><div class=""><font face="Courier" class=""> ret i64 %3</font></div><div class=""><font face="Courier" class="">}</font></div></blockquote><div class=""><br class=""></div><div class="">GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and predicates the result using a conditional move. 
One of the two conversion sequences is evidently setting the INEXACT flag in MXCSR, even though its result is discarded.</div><div class=""><br class=""></div><div class="">Clang lowering (INEXACT set even when the result is exact):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class=""> movss xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero<br class=""> movaps xmm2, xmm0<br class=""> subss xmm2, xmm1<br class=""> cvttss2si rax, xmm2<br class=""> movabs rcx, -9223372036854775808<br class=""> xor rcx, rax<br class=""> cvttss2si rax, xmm0<br class=""> ucomiss xmm0, xmm1<br class=""> cmovae rax, rcx<br class=""> ret</font></blockquote><div class=""><br class=""></div><div class="">GCC lowering (sets the flags correctly):</div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><font face="Courier" class="">fcvt_lu(float):<br class=""> ucomiss xmm0, DWORD PTR .LC0[rip]<br class=""> jnb .L4<br class=""> cvttss2si rax, xmm0<br class=""> ret<br class="">.L4:<br class=""> subss xmm0, DWORD PTR .LC0[rip]<br class=""> movabs rdx, -9223372036854775808<br class=""> cvttss2si rax, xmm0<br class=""> xor rax, rdx<br class=""> ret</font></blockquote></div></body></html>