<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi,</div><div class=""><br class=""></div><div class="">I’ve reproduced my original issue: FE_INEXACT is set after an exact conversion from float to unsigned long long.</div><div class=""><br class=""></div><div class="">The prior issue was eager inlining and constant folding, which caused updates to the floating point accrued exception flags to go missing when optimisation was enabled.</div><div class=""><br class=""></div><div class="">This second issue appears not to be an eager optimisation or constant folding issue.</div><div class=""><br class=""></div><div class="">- float to unsigned int conversion appears to be okay. </div><div class="">- float to unsigned long long conversion appears to incorrectly update the accrued exception flags. </div><div class=""><br class=""></div><div class="">Note the code explicitly casts from float to unsigned and then to signed. The inner cast selects the float to unsigned conversion, and the outer cast indicates sign extension, as all RISC-V integers are canonically sign extended to the width of the widest type (unlike x86). Returning a signed type of a smaller width will automatically sign extend when assigned to a larger signed type (the code came from a template), which is why we have extra casts. 
While the sign extension is redundant on 64-bit it isn’t for u128 and s128 which we intend to support.</div><div class=""><br class=""></div><div class="">- <a href="https://godbolt.org/g/kvSm5J" class="">https://godbolt.org/g/kvSm5J</a></div><div class=""><br class=""></div><div class="">Any insight would be greatly appreciated.</div><div class=""><br class=""></div><div class="">Michael.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">$ g++ -O3 -lm <a href="http://fcvt.cc" class="">fcvt.cc</a> <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 exact<br class="">1 inexact<br class=""><br class=""></div><div class=""><br class=""></div><div class="">$ clang++ -O3 -lm <a href="http://fcvt.cc" class="">fcvt.cc</a> <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 inexact<br class="">1 inexact<br class=""><br class=""></div><div class=""><br class=""></div><div class="">$ cat <a href="http://fcvt.cc" class="">fcvt.cc</a></div><div class="">#include &lt;cstdio&gt;<br class="">#include &lt;cmath&gt;<br class="">#include &lt;cfenv&gt;<br class="">#include &lt;limits&gt;<br class=""><br class="">typedef signed int         s32;<br class="">typedef unsigned int       u32;<br class="">typedef signed long long   s64;<br class="">typedef unsigned long long u64;<br class=""><br class="">__attribute__ ((noinline)) s32 fcvt_wu(float f)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>return (std::isnan(f) | ((f &gt;= 0) &amp; std::isinf(f)))<br class=""><span class="Apple-tab-span" style="white-space:pre">             </span>? 
std::numeric_limits&lt;u32&gt;::max()<br class=""><span class="Apple-tab-span" style="white-space:pre">                </span>: s32(u32(f));<br class="">}<br class=""><br class="">__attribute__ ((noinline)) s64 fcvt_lu(float f)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>return (std::isnan(f) | ((f &gt;= 0) &amp; std::isinf(f)))<br class=""><span class="Apple-tab-span" style="white-space:pre">             </span>? std::numeric_limits&lt;u64&gt;::max()<br class=""><span class="Apple-tab-span" style="white-space:pre">                </span>: s64(u64(f));<br class="">}<br class=""><br class="">void test_fcvt_wu(float a)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%d ", fcvt_wu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class="">void test_fcvt_lu(float a)<br class="">{       <br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%lld ", fcvt_lu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>printf("%s\n", fetestexcept(FE_INEXACT) ? 
"inexact" : "exact");<br class="">}<br class=""><br class="">int main()<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>fesetround(FE_TONEAREST);<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>test_fcvt_wu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_wu(1.1f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.1f);<br class="">}<br class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On 18 Apr 2017, at 10:51 AM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 18 Apr 2017, at 1:08 AM, Stephen Canon <<a href="mailto:scanon@apple.com" class="">scanon@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi Michael —<br class=""><br class="">You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:<br class=""><br class="">1. If you want to read or set the floating-point environment, your code must contain:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>#pragma STDC FENV_ACCESS ON<br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">Yes, I tried that first and got the warning.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="">If you do not have this pragma, all bets are off. 
The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them altogether. See §7.6.1 of the C standard for more details, in particular, the following sentence:<br class=""><br class=""><blockquote type="cite" class="">If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.<br class=""></blockquote><br class=""><br class="">If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.<br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact I am working on a RISC-V simulator and binary translator, so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly. For now, however, I’m reversing the compiler asm output for the (working) conversions. I wanted the C cast based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="">2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):<br class=""><br class=""><blockquote type="cite" class="">a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.<br class=""></blockquote><br class="">In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. 
the floating-point state, even if you set FENV_ACCESS ON.<br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">I can modify the test to fetch the exception before the printf, but I don’t believe it will make any difference, as I am only printing an integer, not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines, as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers, so it should have no effect on the floating point accrued exception state.</div><div class=""><br class=""></div><div class="">Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type box arguments using a variadic template wrapper. It emits a fixed size stack frame and it buffers in std::string <<a href="https://github.com/michaeljclark/c-fmt/" class="">https://github.com/michaeljclark/c-fmt/</a>>. It relies on the wrapper being inlined. Note: the code is missing extern inline, and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c-fmt.</div><div class=""><br class=""></div><div class="">As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. 
It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:</div><div class=""><br class=""></div><div class=""><a href="https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110" class="">https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110</a></div><br class=""><blockquote type="cite" class=""><div class=""><div class="">OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt`, but doesn’t inline `test_fcvt` into `main`; clang inlines both, does constant propagation, and no flags are raised.<br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">I knew it was inlining, which is why I moved the code to a (default visibility extern) function, which gcc seems to handle, and I have been dumping asm output from both of the compilers. It would be interesting if there was a mode where default visibility extern functions were not inlined unless they were declared extern inline. I can understand static functions or template instantiation being inlined, but default visibility extern is a different issue. 
gcc seems to be more conservative with “non-static” functions.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div class=""><a href="http://godbolt.org/" class="">godbolt.org</a> is a good resource to see what’s going on here, though it won’t tell you *why*:<br class=""><a href="https://godbolt.org/g/Zb8Eoc" class="">https://godbolt.org/g/Zb8Eoc</a><br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">Yes, Matt Godbolt’s tool is very useful. I use objdump (and otool -tV on macos) a lot too, but I thought there might be a compiler flag for conservative handling of floating point to retain floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag, e.g. -fenv-ieee754, that somehow carries exception state even when inlining, or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.</div><div class=""><br class=""></div><div class="">- <a href="https://godbolt.org/g/PH60E3" class="">https://godbolt.org/g/PH60E3</a></div><div class=""><br class=""></div><div class="">I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="">Best,<br class="">– Steve<br class=""><br class=""><blockquote type="cite" class="">On Apr 15, 2017, at 5:51 PM, Michael Clark via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""><br class="">Hi,<br class=""><br class="">First, apologies if this is not the right place to post.<br class=""><br class="">I am seeing unexpected values in the floating point accrued exception 
flags with clang generated programs. My original issue is seeing FE_INEXACT after an exact float to unsigned int conversion within a ternary expression. This issue does not occur with gcc. In trying to isolate the problem I wrote a simple test program, which results in completely opposite behaviour. FE_INEXACT is not getting set for an inexact conversion when optimisation is enabled. <br class=""><br class="">Given I’m not yet seeing predictable results for accrued exception flags, I gave up trying to reproduce my original issue (FE_INEXACT for exact conversion) until I am certain which floating point optimisations are being enabled, and under what conditions floating point accrued exceptions are optimised away; otherwise I can’t be sure to isolate my first problem.<br class=""><br class="">I have two versions of a simple test program below, one of which even returns incorrect results in gcc. The tests below run on Linux using the Debian vendor build of clang 3.8.1 and on macos with the Xcode 8.3.1 vendor build of clang. I don’t have -ffast-math enabled so I would expect standards compliant behaviour. I would like to know what optimisations are preventing floating point accrued exceptions from being set and how to disable these optimisations so that I get deterministic results; then I can try to reproduce my first issue in isolation.<br class=""><br class="">- fcvt1.c triggers the same issue with gcc (FE_INEXACT not set for inexact conversion)<br class="">- fcvt2.c triggers the issue only with clang (FE_INEXACT not set for inexact conversion)<br class="">- no reproducer yet… (FE_INEXACT set after exact conversion)<br class=""><br class="">Happy Holidays,<br class=""><br class="">Michael.<br class=""><br class="">$ gcc --version<br class="">gcc (Debian 6.3.0-6) 6.3.0 20170205<br class="">Copyright (C) 2016 Free Software Foundation, Inc.<br class="">This is free software; see the source for copying conditions.  
There is NO<br class="">warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br class=""><br class="">$ gcc -O0 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ gcc -O3 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ gcc -O0 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ gcc -O3 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class=""><br class="">$ clang --version<br class="">clang version 3.8.1-16 (tags/RELEASE_381/final)<br class="">Target: x86_64-pc-linux-gnu<br class="">Thread model: posix<br class="">InstalledDir: /usr/bin<br class=""><br class="">$ clang -O0 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ clang -O3 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ clang -O0 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ clang -O3 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class=""><br class="">$ clang --version<br class="">Apple LLVM version 8.1.0 (clang-802.0.41)<br class="">Target: x86_64-apple-darwin16.5.0<br class="">Thread model: posix<br class="">InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin<br class=""><br class="">$ cc -O0 fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ cc -O3 fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ cc -O0 fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ cc -O3 fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class=""><br class=""><br class="">$ cat fcvt1.c <br class="">#include &lt;stdio.h&gt;<br class="">#include &lt;fenv.h&gt;<br class=""><br class="">unsigned fcvt(float a)<br class="">{<br 
class="">       return (unsigned)a;<br class="">}<br class=""><br class="">int main()<br class="">{<br class="">       fesetround(FE_TONEAREST);<br class=""><br class="">       feclearexcept(FE_ALL_EXCEPT);<br class="">       printf("%d ", fcvt(1.0f));<br class="">       printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class=""><br class="">       feclearexcept(FE_ALL_EXCEPT);<br class="">       printf("%d ", fcvt(1.1f));<br class="">       printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class=""><br class="">$ cat fcvt2.c<br class="">#include &lt;stdio.h&gt;<br class="">#include &lt;fenv.h&gt;<br class=""><br class="">unsigned fcvt(float a)<br class="">{<br class="">       return (unsigned)a;<br class="">}<br class=""><br class="">void test_fcvt(float a)<br class="">{<br class="">       feclearexcept(FE_ALL_EXCEPT);<br class="">       printf("%d ", fcvt(a));<br class="">       printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class="">int main()<br class="">{<br class="">       fesetround(FE_TONEAREST);<br class=""><br class="">       test_fcvt(1.0f);<br class="">       test_fcvt(1.1f);<br class="">}<br class=""><br class="">_______________________________________________<br class="">cfe-dev mailing list<br class=""><a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a><br class=""><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br class=""></blockquote><br class=""></div></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></body></html>