<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Forgot to reply all.<div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 19 Apr 2017, at 10:03 AM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class=""><br class=""><blockquote type="cite" class="">On 19 Apr 2017, at 9:52 AM, Stephen Canon <<a href="mailto:scanon@apple.com" class="">scanon@apple.com</a>> wrote:<br class=""><br class="">You’re hitting <a href="https://bugs.llvm.org/show_bug.cgi?id=17686" class="">https://bugs.llvm.org/show_bug.cgi?id=17686</a>.<br class=""><br class="">Which is actually precisely the same “compiler does not model FENV_ACCESS” bug, just in the compiler’s built-in lowering for fp-to-unsigned conversion instead of in your code (because x86—pre AVX-512F—does not have a native float-to-unsigned conversion).<br class=""></blockquote></div></div></blockquote><div><br class=""></div><div>I’m aware.</div><div><br class=""></div><div><a href="https://github.com/michaeljclark/riscv-meta/blob/ea306062bfd2f60a229daf6b04826cdeb2dfbe9d/meta/opcode-asm-i786#L155-L158" class="">https://github.com/michaeljclark/riscv-meta/blob/ea306062bfd2f60a229daf6b04826cdeb2dfbe9d/meta/opcode-asm-i786#L155-L158</a></div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><blockquote type="cite" class="">The real fix for all of these issues is to implement FENV_ACCESS.<br class=""></blockquote><br class="">I think I’ll need inline asm then.<br class=""><br class=""><blockquote type="cite" class="">FWIW the "std::isnan(f) | ((f >= 0) & std::isinf(f))) ? std::numeric_limits<u64>::max()” dance in the rest of your conversion gives me pause; what are you trying to do? 
It’s pretty odd to clamp nan and inf to u32::max but leave the result for all values between UINT64_MAX + 1 and infinity undefined.<br class=""></blockquote><br class="">The defined behaviour for RISC-V is to convert NaN and positive infinity to UINT_MAX, while the remainder is already handled within the behaviour of the intrinsic conversion i.e. the signed positive wraps around to produce the values between INT_MAX and UINT_MAX. It could be tightened up a little. Values below -1 are clamped to 0 (-1 can round up to 0). It’s actually a subset of the whole expression which I trimmed for the test case. It’s so that the conversion passes the RISC-V compliance test suite which has different defined behaviour, whereas the behaviour in C may be undefined.<br class=""><br class="">It’s going to be asm anyway… as we are writing a JIT.<br class=""><br class=""><blockquote type="cite" class="">– Steve<br class=""><br class=""><blockquote type="cite" class="">On Apr 18, 2017, at 4:56 PM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:<br class=""><br class="">Hi,<br class=""><br class="">I’ve reproduced my original issue. This issue is FE_INEXACT set for an exact conversion from float to unsigned long long.<br class=""><br class="">The prior issue was eager inlining and constant folding causing missing updates to the floating point accrued exception flags when optimisation was enabled.<br class=""><br class="">This second issue appears not to be an eager optimisation or constant folding issue.<br class=""><br class="">- float to unsigned int conversion appears to be okay. <br class="">- float to unsigned long long conversion appears to incorrectly update the accrued exception flags. <br class=""><br class="">Note the code explicitly casts from float to unsigned and then to signed. 
The first cast is to select float conversion to unsigned, and the outer cast is a sign extension indicator as all RISC-V integers are canonically sign extended to the width of the widest type (unlike x86). Returning a signed type of a smaller width will automatically sign extend when assigned to a larger signed type (the code came from a template) which is why we have extra casts. While the sign extension is redundant on 64-bit it isn’t for u128 and s128 which we intend to support.<br class=""><br class="">- <a href="https://godbolt.org/g/kvSm5J" class="">https://godbolt.org/g/kvSm5J</a><br class=""><br class="">Any insight would be greatly appreciated.<br class=""><br class="">Michael.<br class=""><br class=""><br class="">$ g++ -O3 -lm <a href="http://fcvt.cc" class="">fcvt.cc</a> <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 exact<br class="">1 inexact<br class=""><br class=""><br class="">$ clang++ -O3 -lm <a href="http://fcvt.cc" class="">fcvt.cc</a> <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">1 inexact<br class="">1 inexact<br class=""><br class=""><br class="">$ cat <a href="http://fcvt.cc" class="">fcvt.cc</a><br class="">#include <cstdio><br class="">#include <cmath><br class="">#include <cfenv><br class="">#include <limits><br class=""><br class="">typedef signed int         s32;<br class="">typedef unsigned int       u32;<br class="">typedef signed long long   s64;<br class="">typedef unsigned long long u64;<br class=""><br class="">__attribute__ ((noinline)) s32 fcvt_wu(float f)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>return (std::isnan(f) | ((f >= 0) & std::isinf(f)))<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>? 
std::numeric_limits<u32>::max()<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>: s32(u32(f));<br class="">}<br class=""><br class="">__attribute__ ((noinline)) s64 fcvt_lu(float f)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>return (std::isnan(f) | ((f >= 0) & std::isinf(f)))<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>? std::numeric_limits<u64>::max()<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>: s64(u64(f));<br class="">}<br class=""><br class="">void test_fcvt_wu(float a)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%d ", fcvt_wu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class="">void test_fcvt_lu(float a)<br class="">{       <br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>feclearexcept(FE_ALL_EXCEPT);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>printf("%lld ", fcvt_lu(a));<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>printf("%s\n", fetestexcept(FE_INEXACT) ? 
"inexact" : "exact");<br class="">}<br class=""><br class="">int main()<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>fesetround(FE_TONEAREST);<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>test_fcvt_wu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_wu(1.1f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.0f);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>test_fcvt_lu(1.1f);<br class="">}<br class=""><br class=""><br class=""><blockquote type="cite" class="">On 18 Apr 2017, at 10:51 AM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:<br class=""><br class=""><br class=""><blockquote type="cite" class="">On 18 Apr 2017, at 1:08 AM, Stephen Canon <<a href="mailto:scanon@apple.com" class="">scanon@apple.com</a>> wrote:<br class=""><br class="">Hi Michael —<br class=""><br class="">You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:<br class=""><br class="">1. If you want to read or set the floating-point environment, your code must contain:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>#pragma STDC FENV_ACCESS ON<br class=""></blockquote><br class="">Yes, I tried that first and got the warning.<br class=""><br class=""><blockquote type="cite" class="">If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them all together. 
See §7.6.1 of the C standard for more details, in particular, the following sentence:<br class=""><br class=""><blockquote type="cite" class="">If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.<br class=""></blockquote><br class=""><br class="">If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.<br class=""></blockquote><br class="">Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact, I am working on a RISC-V simulator and binary translator, so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly; however, I’m currently reversing the compiler asm output for the (working) conversions. I wanted the C cast based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.<br class=""><br class=""><blockquote type="cite" class="">2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):<br class=""><br class=""><blockquote type="cite" class="">a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.<br class=""></blockquote><br class="">In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. 
the floating-point state, even if you set FENV_ACCESS ON.<br class=""></blockquote><br class="">I can modify the test to fetch the exception before the printf but I don’t believe it will make any difference as I am only printing an integer, not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers so it should have no effect on the floating point accrued exception state.<br class=""><br class="">Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type box arguments using a variadic template wrapper. It emits a fixed size stack frame and it buffers in std::string <<a href="https://github.com/michaeljclark/c-fmt/" class="">https://github.com/michaeljclark/c-fmt/</a>>. It relies on the wrapper being inlined. Note: the code is missing extern inline and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c-fmt.<br class=""><br class="">As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. 
It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:<br class=""><br class=""><a href="https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110" class="">https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110</a><br class=""><br class=""><blockquote type="cite" class="">OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt`, but doesn’t inline `test_fcvt` into `main`; clang inlines both, does constant propagation, and no flags are raised.<br class=""></blockquote><br class="">I knew it was inlining, which is why I moved the code to a (default visibility extern) function, which gcc seems to handle, and I have been dumping asm output from both of the compilers. It would be interesting if there was a mode where default visibility extern functions were not inlined unless they were declared extern inline. I can understand static functions or template instantiation being inlined, but default visibility extern is a different issue. gcc seems to be more conservative with “non-static” functions.<br class=""><br class=""><blockquote type="cite" class="">godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:<br class="">https://godbolt.org/g/Zb8Eoc<br class=""></blockquote><br class="">Yes, Matt Godbolt’s tool is very useful. 
I use objdump (and otool -tV on macos) a lot too, but I thought there might be a compiler flag for conservative handling of floating point to retain floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag, e.g. -fenv-ieee754, that somehow carries exception state even when inlining, or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.<br class=""><br class="">- https://godbolt.org/g/PH60E3<br class=""><br class="">I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …<br class=""><br class="">Thanks,<br class="">Michael.<br class=""><br class=""><blockquote type="cite" class="">Best,<br class="">– Steve<br class=""><br class=""><blockquote type="cite" class="">On Apr 15, 2017, at 5:51 PM, Michael Clark via cfe-dev <cfe-dev@lists.llvm.org> wrote:<br class=""><br class="">Hi,<br class=""><br class="">First, apologies if this is not the right place to post.<br class=""><br class="">I am seeing unexpected values in the floating point accrued exception flags with clang generated programs. My original issue is seeing FE_INEXACT after an exact float to unsigned int conversion within a ternary expression. This issue does not occur with gcc. In trying to isolate the problem I wrote a simple test program, which results in the completely opposite behaviour: FE_INEXACT is not getting set for an inexact conversion when optimisation is enabled. 
<br class=""><br class="">Given I’m not yet seeing predictable results for accrued exception flags, I gave up trying to reproduce my original issue (FE_INEXACT for exact conversion) until I am certain which floating point optimisations are being enabled, and under what conditions floating point accrued exceptions are optimised away; otherwise I can’t be sure to isolate my first problem.<br class=""><br class="">I have two versions of a simple test program below, one of which even returns incorrect results in gcc. The tests below run on Linux using the Debian vendor build of clang 3.8.1 and on macos with the Xcode 8.3.1 vendor build of clang. I don’t have -ffast-math enabled so I would expect standards compliant behaviour. I would like to know what optimisations are preventing floating point accrued exceptions from being set and how to disable these optimisations so that I get deterministic results; then I can try to reproduce my first issue in isolation.<br class=""><br class="">- fcvt1.c triggers the same issue with gcc (FE_INEXACT not set for inexact conversion)<br class="">- fcvt2.c triggers the issue only with clang (FE_INEXACT not set for inexact conversion)<br class="">- no reproducer yet… (FE_INEXACT set after exact conversion)<br class=""><br class="">Happy Holidays,<br class=""><br class="">Michael.<br class=""><br class="">$ gcc --version<br class="">gcc (Debian 6.3.0-6) 6.3.0 20170205<br class="">Copyright (C) 2016 Free Software Foundation, Inc.<br class="">This is free software; see the source for copying conditions.  
There is NO<br class="">warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br class=""><br class="">$ gcc -O0 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ gcc -O3 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ gcc -O0 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ gcc -O3 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class=""><br class="">$ clang --version<br class="">clang version 3.8.1-16 (tags/RELEASE_381/final)<br class="">Target: x86_64-pc-linux-gnu<br class="">Thread model: posix<br class="">InstalledDir: /usr/bin<br class=""><br class="">$ clang -O0 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ clang -O3 -lm fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ clang -O0 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ clang -O3 -lm fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class=""><br class="">$ clang --version<br class="">Apple LLVM version 8.1.0 (clang-802.0.41)<br class="">Target: x86_64-apple-darwin16.5.0<br class="">Thread model: posix<br class="">InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin<br class=""><br class="">$ cc -O0 fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ cc -O3 fcvt1.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class="">$ cc -O0 fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 inexact<br class="">$ cc -O3 fcvt2.c <br class="">$ ./a.out <br class="">1 exact<br class="">1 exact<br class=""><br class=""><br class="">$ cat fcvt1.c <br class="">#include <stdio.h><br class="">#include <fenv.h><br class=""><br class="">unsigned fcvt(float a)<br class="">{<br 
class="">     return (unsigned)a;<br class="">}<br class=""><br class="">int main()<br class="">{<br class="">     fesetround(FE_TONEAREST);<br class=""><br class="">     feclearexcept(FE_ALL_EXCEPT);<br class="">     printf("%d ", fcvt(1.0f));<br class="">     printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class=""><br class="">     feclearexcept(FE_ALL_EXCEPT);<br class="">     printf("%d ", fcvt(1.1f));<br class="">     printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class=""><br class="">$ cat fcvt2.c<br class="">#include <stdio.h><br class="">#include <fenv.h><br class=""><br class="">unsigned fcvt(float a)<br class="">{<br class="">     return (unsigned)a;<br class="">}<br class=""><br class="">void test_fcvt(float a)<br class="">{<br class="">     feclearexcept(FE_ALL_EXCEPT);<br class="">     printf("%d ", fcvt(a));<br class="">     printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");<br class="">}<br class=""><br class="">int main()<br class="">{<br class="">     fesetround(FE_TONEAREST);<br class=""><br class="">     test_fcvt(1.0f);<br class="">     test_fcvt(1.1f);<br class="">}<br class=""><br class="">_______________________________________________<br class="">cfe-dev mailing list<br class="">cfe-dev@lists.llvm.org<br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></div></div></blockquote></div><br class=""></div></body></html>