[llvm-dev] how to simplify FP ops with an undef operand?

Thu Mar 1 21:32:51 PST 2018

On Mar 1, 2018, at 10:07 AM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> So you don’t think sNaNs can just be treated as if they were qNaNs? I understand why we would want to ignore the signaling part of things, but the rules for operating on NaNs are pretty clear and reasonable to implement. The signaling aspect can, I think, be safely ignored when we are in the mode of assuming the default FP environment.
>  
> As for the distinction between IEEE and LLVM IR, I would think we would want to define LLVM IR in such a way that it is possible to create an IEEE-compliant compiler. I know we’re not there yet, but we’re working toward it.

There appears to be some confusion about the role of LLVM IR and its relation to undef and undefined behavior.  It isn’t the first time :-)

Let me try to clarify.  Many LLVM IR instructions are only defined on some inputs.  For inputs outside their domain, they have undefined behavior or produce undefined results.  This isn’t perfectly codified, and people are working on it, but there are some things we *know* based on how the operations are modeled and what the compiler does with them.

Hopefully uncontroversial points:

 - Floating point operations are represented in LLVM IR in two ways: the fdiv/fmul/fadd etc instructions, and the llvm.experimental.constrained.* intrinsic forms.

 - The instruction forms are modeled as having no side effects.  fdiv/frem trap on divide by zero, but are otherwise defined on the same set of inputs as fadd/fmul/etc.

- Because they have no side effects, these instructions can be reordered freely (though for fdiv/frem, see footnote [1] below).  For example, it is legal to transform this:

   foo(x,y)
   tmp = a+b

into:

   tmp = a+b
   foo(x,y)

This can occur for many reasons: for example, because the compiler decides it is profitable (e.g. hoisting a loop invariant computation out of a loop), as a side effect of instruction scheduling, selection dag not having chain nodes on the ISD nodes, etc.

- Because the instruction forms have no side effects and can be reordered, they are not ok to use in the face of a non-standard rounding mode or trapping flags.  Handling those cases is the reason the experimental intrinsic forms exist.

- The intrinsic forms are defined to allow explicit rounding mode control and other features, but are also defined as having side effects.  This allows them to be used in the face of rounding mode changes, but it also forces the optimizer to be much more conservative about speculating them.  These limits on speculation are why we don’t just give the instruction forms the same semantics as the intrinsics.

- C99/C++ say nothing about SNaNs, and there is some push to remove SNaNs from the IEEE 754 standard.  See, e.g., this page, which was one of the first hits I found online (I’m sure there are others): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1011.htm.  I’m not familiar with the state of the art in Java or other languages.

- The fact that C99 and C++ are undefined on SNaNs by default, and default to ignoring rounding modes, means that it is fine for clang to produce fadd/fmul/fdiv instructions in the normal mode.  It only needs to generate the intrinsic forms when the FENV_ACCESS pragma is set.

Potentially controversial points:

 - Because LLVM reorders and speculates the instruction forms, and because IEEE defines the corresponding IEEE operations as trapping on SNaNs, it is clear that SNaNs are outside of the domain of these LLVM operations.  Either speculation is ok or trapping on SNaN is ok, pick one…  (and we already did :) 

- Because the LLVM instructions are not defined on SNaNs, SNaNs are outside of their domain, and thus the LLVM instructions are undefined on these inputs.  As such, it would be perfectly reasonable to “constant fold” an “fadd SNaN, 42” instruction into unreachable and delete all the code after it, or turn it into a call to formatHardDrive().  [2]

- Because an ‘undef’ operand can be an arbitrary bit pattern representable by the type, and because the f32/f64 etc *types* can represent SNaNs, it is within the right of the compiler to constant fold “fadd undef, 42” into unreachable.  QED.

Summary and Recommendation:

I don’t see any way around this conclusion, and I thought this had always been the documented behavior in LangRef.  It seems that it was never documented, which has led to confusion on this thread.  I’d love to be surprised and find out that I’ve misinterpreted things (I’m no fan of UB!!!), but this is just logical behavior that flows from how the compiler works and how it has always worked.

All that said, in my opinion, while it is within the “right” of the compiler to constant fold these things to unreachable, I see no motivation to actually do so.  LLVM has gone out of its way to define some simple forms of UB like trivial TBAA violations, and I see no downside to being nicer here.  The code generator currently turns a floating point undef into a reference to some random FP register, which (at worst) causes an SNaN trap, but could just be a silent failure.

As such, my recommendation is to simply document these as having UB when presented with SNaN inputs, but make the constant folder/instcombine/... fold “fadd undef, X” into “undef” instead of “unreachable”.   

In theory we could go further and define a new class of UB concepts in LLVM IR along the lines of “produces an undetermined value or traps, but doesn’t cause arbitrary UB”, but that is a huge ball of wax with far-reaching implications.

-Chris

[1] IIRC, we are more conservative about speculating divide/rem instructions because of divide by zero.  If that is true, it is possible we could handle these better than described above.

[2] Of course, executing an ‘unreachable’ instruction *can* format your hard drive, if the unreachable is at the bottom of the current function, and if the function it falls through into formats your hard drive...
