<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">

Thank you very much, Sanjay, for this analysis.

<div class=""><br class="">

</div>

<div class="">I also think that vcmpe should not be produced for regular fcmp LLVM IR instructions.</div>

<div class="">Maybe it should be produced for constrained FP compare intrinsics, when those get introduced.</div>

<div class=""><br class="">

</div>

<div class="">I’ve put up a patch for review at <a href="https://reviews.llvm.org/D68463" class="">https://reviews.llvm.org/D68463</a> so that the Arm backend produces vcmp instead of vcmpe.</div>

<div class=""><br class="">

</div>

<div class="">Thanks!</div>

<div class=""><br class="">

</div>

<div class="">Kristof<br class="">

<div><br class="">

<blockquote type="cite" class="">

<div class="">On 1 Oct 2019, at 15:44, Sanjay Patel &lt;<a href="mailto:spatel@rotateright.com" class="">spatel@rotateright.com</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div dir="ltr" class="">

<div class="">Let's change the example to eliminate suspects:</div>

  #include &lt;math.h&gt;<br class="">

<div class="">  int is_nan(float x) {</div>

<div class="">    /* </div>

<div class="">      The following subclauses provide macros that are quiet (non floating-point exception raising)<br class="">

</div>

<div class="">      versions of the relational operators, and other comparison macros that facilitate writing<br class="">

</div>

<div class="">      efficient code that accounts for NaNs without suffering the ‘‘invalid’’ floating-point exception.<br class="">

</div>

    */<br class="">

    return isunordered(x, x);<br class="">

  }<br class="">

<div class=""><br class="">

</div>

<div class="">The comment text is from 7.12.14 of the C standard draft. I'm hoping to avoid any scenario under which it is ok to raise an exception in that code (eliminate any questions about the clang front-end behavior / FENV_ACCESS).<br class="">

</div>

<div class=""><br class="">

</div>

<div class="">As IR from clang with no optimization, this becomes a bunch of load/store with:</div>

  %cmp = fcmp uno double %conv, %conv1

<div class=""><br class="">

</div>

<div class="">Ok, so far? "fcmp uno" - <a href="http://llvm.org/docs/LangRef.html#fcmp-instruction" target="_blank" class="">

http://llvm.org/docs/LangRef.html#fcmp-instruction</a> :</div>

uno: yields true if either operand is a QNAN.

<div class=""><br class="">

</div>

<div class="">EarlyCSE/InstCombine reduce that fcmp to:</div>

<div class="">  %cmp = fcmp uno float %x, 0.000000e+00<br class="">

</div>

<div class=""><br class="">

</div>

<div class="">Still good? Same fcmp predicate, but we replaced a repeated use of "%x" with a zero constant to aid optimization.<br class="">

</div>

<div class=""><br class="">

</div>

<div class="">Now, send the optimized IR to codegen:</div>

<div class="">define i32 @is_nan(float %x) {<br class="">

  %cmp = fcmp uno float %x, 0.000000e+00<br class="">

  %r = zext i1 %cmp to i32<br class="">

  ret i32 %r<br class="">

}<br class="">

</div>

<div class=""><br class="">

</div>

<div class="">$ llc -o - fpexception.ll -mtriple=armv7a <br class="">

</div>

<div class="">  vmov s0, r0<br class="">

  mov r0, #0</div>

<div class="">  vcmpe.f32 s0, s0<br class="">

  vmrs APSR_nzcv, fpscr<br class="">

  movwvs r0, #1<br class="">

  bx lr<br class="">

</div>

<div class=""><br class="">

</div>

<div class="">We produced "vcmpe" for code that should never cause an FP exception. ARM codegen bug?<br class="">

</div>

</div>

<br class="">

<div class="gmail_quote">

<div dir="ltr" class="gmail_attr">On Tue, Oct 1, 2019 at 5:45 AM Kristof Beyls &lt;<a href="mailto:Kristof.Beyls@arm.com" target="_blank" class="">Kristof.Beyls@arm.com</a>&gt; wrote:<br class="">

</div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div class="">Hi,<br class="">

<br class="">

I’ve been investigating <a href="https://bugs.llvm.org/show_bug.cgi?id=43374" target="_blank" class="">https://bugs.llvm.org/show_bug.cgi?id=43374</a>, which is about clang/llvm producing code for ARM that triggers a floating point exception when x is NaN in the code example below.<br class="">

<br class="">

<div class="">int bar(float x) {</div>

<div class="">  return x!=x ? 0 : 1;</div>

<div class="">}</div>

<br class="">

<div class="">The C99 standard states in section 7.12.14:</div>

<div class=""><br class="">

</div>

<div class="">"""</div>

<div class="">The relational and equality operators support the usual mathematical relationships between numeric values. For any ordered pair of numeric values exactly one of the relationships — less, greater, and equal — is true. Relational operators may raise

 the ‘‘invalid’’ floating-point exception when argument values are NaNs.</div>

<div class="">"""</div>

<div class=""><br class="">

</div>

<div class="">My interpretation of that paragraph is that it's OK for &lt;, &lt;=, &gt; and &gt;= to raise an exception when argument values are NaNs. It is not OK for == and != to raise an exception when argument values are NaNs.</div>

<br class="">

<div class="">Therefore,</div>

<br class="">

<div class="">int bar(float x) {</div>

<div class="">  return x!=x ? 0 : 1;</div>

<div class="">}</div>

<br class="">

should not produce an exception when x is NaN, and hence a <font face="Menlo" class="">vcmp</font> rather than <font face="Menlo" class="">vcmpe</font> instruction should be produced when generating ARM code for this.

<div class=""><br class="">

</div>

<div class=""><a href="http://llvm.org/viewvc/llvm-project?rev=294945&view=rev" target="_blank" class="">http://llvm.org/viewvc/llvm-project?rev=294945&view=rev</a> introduced support for generating <font face="Menlo" class="">vcmp</font> instead of <font face="Menlo" class="">vcmpe</font>

 for equality comparisons. How come <font face="Menlo" class="">vcmpe</font> is generated for (x!=x)?</div>

<div class=""><br class="">

</div>

<div class="">The answer is that InstCombine transforms the equality comparison into an "ordered comparison". Before InstCombine:</div>

<div class=""><span style="font-family:Menlo" class="">define dso_local i32 @bar(float %x) local_unnamed_addr {</span></div>

<div class=""><font face="Menlo" class="">entry:</font></div>

<div class=""><font face="Menlo" class="">  %cmp = fcmp une float %x, %x</font></div>

<div class=""><font face="Menlo" class="">  %cond = select i1 %cmp, i32 0, i32 1</font></div>

<div class=""><font face="Menlo" class="">  ret i32 %cond</font></div>

<div class=""><font face="Menlo" class="">}</font></div>

<br class="">

After InstCombine:<br class="">

<div class=""><font face="Menlo" class="">define dso_local i32 @bar(float %x) local_unnamed_addr #0 {</font></div>

<div class=""><font face="Menlo" class="">entry:</font></div>

<div class=""><font face="Menlo" class="">  %cmp = fcmp ord float %x, 0.000000e+00</font></div>

<div class=""><font face="Menlo" class="">  %cond = zext i1 %cmp to i32</font></div>

<div class=""><font face="Menlo" class="">  ret i32 %cond</font></div>

<div class=""><font face="Menlo" class="">}</font></div>

<br class="">

Please note that on other backends like x86 or AArch64, this InstCombine transformation doesn’t trigger floating point exception behaviour, since those backends don’t seem to produce any instructions for fcmp that raise floating point exceptions on NaNs.<br class="">

<br class="">

My question here is: how to fix this behaviour? Or: which part in the compilation flow is wrong?<br class="">

Reading through various standards and specifications, I’m getting confused as to what the best fix would be:<br class="">

<br class="">

<ul class="">

<li class=""><a href="https://llvm.org/docs/LangRef.html#floating-point-environment" target="_blank" class="">https://llvm.org/docs/LangRef.html#floating-point-environment</a> states "<span style="font-variant-ligatures:normal;background-color:rgb(255,255,255)" class="">The

 default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt

 to create or preserve invalid operation (SNaN) or division-by-zero exceptions.</span><span style="font-variant-ligatures:normal;background-color:rgb(255,255,255)" class=""><font style="font-size:14px" face="Lucida Grande, Lucida Sans Unicode, Geneva, Verdana, sans-serif" class="">”</font><br class="">

This suggests that if we want to retain floating point exception behaviour in the compilation flow, we shouldn’t be using the “default LLVM floating-point environment”, but rather something else. Presumably the constrained intrinsics? However, when I look at

 the constrained intrinsics definition, it seems (</span><a href="http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics" target="_blank" class="">http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics</a>) there is no constrained

 intrinsic for the floating point comparison operation. Should there be one?</li><li class="">If the default floating-point environment assumes that floating-point instructions do not have side effects, why does the Arm backend lower floating point comparison to vcmpe rather than vcmp? The revision history suggests this has been this way

 since the initial creation of the ARM backend. Should this behaviour be changed and vcmp be produced rather than vcmpe? And only later, once the generation of constrained floating point intrinsics is implemented should backends start producing signalling floating

 point comparisons for floating point comparison constrained intrinsics (assuming they’ll exist by then)?</li><li class="">Or alternatively, there is a good reason to keep on producing vcmpe as is today, and instcombine just shouldn’t convert “fcmp une” into “fcmp ord”?</li><li class="">Or as yet another alternative, instcombine is just fine converting “fcmp une” into “fcmp ord”, and it’s the ARM backend that should produce vcmp rather than vcmpe also for “unordered” comparisons, next to equality comparisons?</li></ul>

<div class=""><br class="">

</div>

<div class="">Thanks,</div>

<div class=""><br class="">

</div>

<div class="">Kristof</div>

</div>

</blockquote>

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>