[PATCH] D49561: AMDGPU: Try to make isKnownNeverSNan more accurate

Fri Jul 20 11:39:12 PDT 2018

rampitec added a comment.

In https://reviews.llvm.org/D49561#1169653, @arsenm wrote:

> In https://reviews.llvm.org/D49561#1169015, @rampitec wrote:
>
> > In https://reviews.llvm.org/D49561#1168791, @arsenm wrote:
> >
> > > In https://reviews.llvm.org/D49561#1168643, @rampitec wrote:
> > >
> > > > I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.
> > >
> > >
> > > Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with
> >
> >
> > Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.
>
>
> Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.
>
> With ieee_mode enabled (which is currently the default) v_min_f32 et. al.  a qNaN is returned if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand as-if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

More general, we have several questions:

1. Should we care about sNaNs with FP exceptions disabled?
2. Should we care about sNaNs if a node is known never NaN?
3. Should we care about sNaNs if no-nans is enabled?
4. Should we care about sNaNs if fast-math is enabled?
5. Should we turn ieee_mode off?

Let’s see:

1. Description of setHasFloatingPointExceptions():

  /// Tells the code generator that this target supports floating point
  /// exceptions and cares about preserving floating point exception behavior.

I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

2. sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.
3. -enable-no-nans-fp-math and derivatives:

  /// NoNaNsFPMath - This flag is enabled when the
  /// -enable-no-nans-fp-math flag is specified on the command line. When
  /// this flag is off (the default), the code generator is not allowed to
  /// assume the FP arithmetic arguments and results are never NaNs.

So then if that flag is on we are allowed to assume not only FP arithmetic results are never NaN, but also arguments. Even if they are and even if they are loaded or constant sNaN/NaN.

4. UnsafeAlgebra includes NoNaNs and thus all of the above applies.
5. I do not believe we have to disable ieee_mode.
  - Even min/max handling was changed few times with and without ieee_mode bit in different generations of our GPU, so that is not a universal solution.
  - Other fp operations will not quiet sNaNs as required if you disable it.

I believe we were exploring the idea to disable ieee in the past and found we will have more problems without it. With ieee we only have min/max problem.

https://reviews.llvm.org/D49561