[cfe-dev] Combining fast math flags with constrained intrinsics

Fri Jan 17 16:30:07 PST 2020

Hi all,

A question came up in a code review (https://reviews.llvm.org/D72820) about whether or not to allow fast-math flags to be applied to constrained floating point intrinsics (http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). This has come up several times before, but I don't think we've ever made a decision about it.

By default, the optimizer assumes that floating point operations have no side effects and use the default rounding mode (round to nearest, ties to even). The constrained intrinsics are meant to prevent the optimizer from making these assumptions when the user wants to access the floating point environment -- to change the rounding mode, to check floating point status bits, or to unmask floating point exceptions. The intrinsics take two extra arguments. The first either specifies a rounding mode that may be assumed or specifies that the rounding mode is unknown (this argument is omitted if it doesn't apply to the operation). The second specifies whether the user wants precise exception semantics to be preserved, wants to prevent syntactically spurious exceptions from being raised, or doesn't care about floating point exceptions.
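To make the rounding-mode assumption concrete, here is a small sketch of my own (divideWithRounding is a hypothetical helper, not something from the patch). It performs the same division under different rounding modes. The volatile qualifiers are only there to keep the compiler from folding the division or moving it across the fesetround() calls, which is exactly the kind of transformation the default assumptions permit.

```cpp
#include <cfenv>

// Hypothetical helper for illustration: divides A by B with the requested
// rounding mode in effect, then restores the caller's rounding mode.
double divideWithRounding(double A, double B, int Mode) {
  // volatile forces the operands to be read and the result to be stored
  // between the two fesetround() calls, so the division really executes
  // while Mode is active.
  volatile double Num = A;
  volatile double Den = B;
  const int SavedMode = std::fegetround();
  std::fesetround(Mode);
  volatile double Quotient = Num / Den;
  std::fesetround(SavedMode);
  return Quotient;
}
```

Since 1/3 is not exactly representable, divideWithRounding(1.0, 3.0, FE_DOWNWARD) should come out strictly smaller than divideWithRounding(1.0, 3.0, FE_UPWARD), whereas an exact quotient such as 1.0/2.0 is unaffected by the mode.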

Because the constrained mode can be localized to a sub-region within a function, and every floating point operation in a function must use the constrained intrinsics if any of them do, we also need to support the case where a constrained intrinsic is used but the default behavior (default rounding mode, exceptions ignored) is requested. For this reason, I think our IR definition must allow fast math flags to be applied to constrained intrinsics. That makes this primarily a question about what combinations should be permitted by front ends and how constructs like pragmas should affect the various states. For example, I might have source code like this:

-=-=-=-=-=-=-=-=

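#include <cfenv>
#include <iostream>

double doSomethingPrecise(double X, double Y, double Z);

double doSomethingVague(double X, double Y, double Z) {
  // Some operation that doesn't need to be precise.
  if (X/Y > SomeThreshold)
    return doSomethingPrecise(X, Y, Z);
  else
    return Z;
}

#pragma STDC FENV_ACCESS ON
double doSomethingPrecise(double X, double Y, double Z) {
  // Save the caller's rounding mode, then round toward negative infinity.
  int SaveRM = fegetround();
  fesetround(FE_DOWNWARD);
  feclearexcept(FE_ALL_EXCEPT);
  double Temp = X * Y + Z;
  // Report any floating point exception raised by the computation above.
  if (fetestexcept(FE_ALL_EXCEPT))
    std::cerr << "Something happened.\n";
  fesetround(SaveRM);
  return Temp;
}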

-=-=-=-=-=-=-=-=

Now suppose I compile that with "-O2 -ffp-contract=fast". We will need to generate constrained intrinsics for the X*Y+Z expression in doSomethingPrecise. The question is, should clang (a) generate a constrained version of the llvm.fmuladd intrinsic, (b) generate separate constrained.fmul and constrained.fadd intrinsics with the contract fast math flag set, or (c) generate separate constrained.fmul and constrained.fadd intrinsics with no fast math flags? I would argue for (b), because I think it's reasonable for a user who cares about precision and FP exceptions to still want FMA, which is theoretically more precise because it rounds only once. I would not pick (a) because clang doesn't usually generate the fmuladd intrinsic with -ffp-contract=fast. On the other hand, if the code also contained an FP_CONTRACT pragma around doSomethingPrecise() I think clang should do (a).
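As a rough illustration of why contraction changes results, here is a sketch of my own (mulThenAdd and fusedMulAdd are hypothetical helpers, not part of clang or the patch). It contrasts a multiply and add that are rounded separately with std::fma, which rounds once.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only.

// Multiply and add as two separately rounded operations. The volatile
// temporary forces the product to be rounded to double before the add, so
// the compiler cannot contract the expression into an FMA.
double mulThenAdd(double X, double Y, double Z) {
  volatile double Product = X * Y;
  return Product + Z;
}

// Fused multiply-add: X * Y + Z computed with a single rounding at the end.
double fusedMulAdd(double X, double Y, double Z) {
  return std::fma(X, Y, Z);
}
```

With X = 1 + 2^-27, Y = 1 - 2^-27 and Z = -1 under the default rounding mode, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The separately rounded version therefore returns 0.0, while the fused version returns -2^-54. The two forms can also differ in which exception flags they raise, which is part of why the choice matters for constrained code.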

Supporting the FP_CONTRACT case is the point of the D72820 patch.

But let's make this more interesting. Suppose I compile with "-O2 -ffp-model=strict -ffp-contract=fast -fno-honor-nans -fno-honor-infinities -fno-signed-zeros" instead and the code does not contain the FENV_ACCESS pragma (I'll get back to that). Should the nnan, ninf, and nsz fast math flags be applied to the constrained intrinsics? I lean toward "yes". The way I see it, these command line options are a way for the user to tell the compiler that their data will not contain NaNs or infinities and that their algorithms do not depend on the sign of zero. These flags enable us to make some optimizations that will not affect rounding or exception semantics as long as the data is as the user claimed. This will be particularly useful for the strict exception semantics because there are cases where we have to execute additional instructions just to preserve the exception semantics in the case where one of the operands is a NaN. If the user knows that will never happen, we can produce better code.
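For readers less familiar with what nsz and nnan promise, here is a small sketch of my own (addZeroKeepsSign and equalsItself are hypothetical helpers). It shows two identities that fail only for the inputs those flags rule out, assuming the default rounding mode and a build without fast-math options.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only.

// True if replacing X + 0.0 with X would leave the sign bit unchanged.
// The nsz flag permits that rewrite without checking.
bool addZeroKeepsSign(double X) {
  volatile double Sum = X + 0.0;
  return std::signbit(Sum) == std::signbit(X);
}

// True if X compares equal to itself. The nnan flag lets the optimizer
// assume this always holds.
bool equalsItself(double X) {
  volatile double V = X;
  return V == V;
}
```

addZeroKeepsSign(-0.0) is false because -0.0 + 0.0 is +0.0 under round to nearest, and equalsItself(NaN) is false. For every other input both return true, so a user who guarantees no NaNs and no dependence on the sign of zero loses nothing by granting the flags.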

Now back to the reason I wanted to consider that without the pragma. Consider my code above with the pragma again, and now imagine I compile it with "-O2 -ffp-model=fast". In this case, the pragma almost certainly intends to remove some fast math flags. For instance, I don't think it makes sense to say you care about rounding mode but want to allow reassociation (because reassociation has a much bigger potential to change results than rounding mode). The flags discussed above could make sense, but we have no clear way to know what the user intended, so I think in this case we must clear all the fast math flags. This somewhat conflicts with what I said about the contract flag above, and I have mixed feelings about that. In the presence of a pragma, we should probably block the contract flag too, but I don't like having to do that for that specific case.
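To put a number on the reassociation point, here is a sketch of my own (sumLeftToRight and sumRightToLeft are hypothetical helpers). It adds the same three values in the two groupings that the reassoc flag would let the optimizer choose between.

```cpp
// Hypothetical helpers for illustration only. The volatile temporaries
// force each partial sum to be rounded to double before it is used.

// Computes (A + B) + C.
double sumLeftToRight(double A, double B, double C) {
  volatile double Partial = A + B;
  return Partial + C;
}

// Computes A + (B + C).
double sumRightToLeft(double A, double B, double C) {
  volatile double Partial = B + C;
  return A + Partial;
}
```

For A = 1e17, B = -1e17 and C = 1.0, the first grouping returns 1.0 and the second returns 0.0, because -1e17 + 1.0 rounds back to -1e17. A change of rounding mode, by comparison, moves the result of a single operation by at most one unit in the last place.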

There's my opinion. I'd like to hear what others think.

Thanks,
Andy