[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

Mon Nov 19 10:47:07 PST 2018

andrew.w.kaylor added a comment.

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> A couple of comments on the previous discussion:
>
> 1. Instead of defining a new command line option, I'd prefer to use the existing options -frounding-math and -ftrapping-math to set the default behavior of math operations w.r.t. rounding modes and exception status.  (For compatibility with GCC if nothing else.)

I agree that it's preferable to re-use these existing options if possible. I am concerned that the partial implementation of -ftrapping-math already in place doesn't seem to be well aligned with the way fast-math flags are handled, so getting it to work as expected without breaking existing users might take some work. In general, though, these options seem like they should do what we need.

Regarding GCC compatibility, I notice that GCC defaults to trapping math being enabled, and I don't think that's what we want with clang. The GCC option also seems to imply something more than I think we need for constrained handling. For example, the GCC documentation says that -fno-trapping-math "can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions", so it sounds like, for GCC, it may also imply something like LLVM's "afn" fast-math flag.

So if we are going to use these options, I think we need to have a discussion about whether or not it's OK to diverge from GCC's interpretation of them.

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> 2. I also read the C standard to imply that it is a requirement of **user code** to reset the status flag to default before switching back to FENV_ACCESS OFF.  The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means **to inform the implementation** when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes."  There is no mention anywhere that using the pragma, on its own, will ever **change** those control modes.  The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state; there is no explicit statement that the implementation will do so.

I definitely agree with this interpretation of the standard. My understanding is that behavior is undefined if the user has not left the FP environment in the default state when transitioning to an FENV_ACCESS OFF region.

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> 3. I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones.  However, I'm still not convinced that the pragma itself must be the scheduling barrier.  It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered.  This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags.  So the real scheduling barrier would be any **function call within a FENV_ACCESS ON region**.  Those would have to be marked by the front-end in the IR, presumably using a function attribute.  The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).

I'm a bit confused by this. The constrained intrinsics will cause all calls to act as barriers to motion of the FP operations represented by the intrinsics (at least before instruction selection), so it's not clear to me what you are saying is needed here beyond that.

https://reviews.llvm.org/D53157