[PATCH] D91716: AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum

Wed Nov 18 11:45:10 PST 2020

arsenm added inline comments.

================
Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:498-499
+    case TargetOpcode::G_FMAXNUM: {
+      const TargetLowering *TLI = MF->getSubtarget().getTargetLowering();
+      return TLI->isKnownNeverNaNForMI(Val, MRI, SNaN);
+    }
----------------
Petar.Avramovic wrote:
> arsenm wrote:
> > Petar.Avramovic wrote:
> > > Can this then be `return true`?
> > No, it depends on the inputs. You can refer to the existing SelectionDAG::isKnownNeverNaN:
> > 
> > ```
> >     // Only one needs to be known not-nan, since it will be returned if the
> >     // other ends up being one.
> >     return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) ||
> >            isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1);
> > 
> > ```
> I don't think 
> ```
>     return isKnownNeverNaN(DefMI->getOperand(1).getReg(), MRI, SNaN) ||
>            isKnownNeverNaN(DefMI->getOperand(2).getReg(), MRI, SNaN);
> ```
> is enough for globalisel since it does not look at the whole picture. The formula above works when target naturally has FMINNUM instruction and knowing the inputs gives conclusion about the output (one input is not nan result is not nan). This would be generic formula (FIXME, use this formula in isKnownNeverNaNForMI instead of return false).
> But subtarget with IEEE=true does not have FMINNUM and has to lower it to FMINNUM_IEEE. This gives additional critical piece of information FMINNUM is never SNaN with IEEE=true.
> Are test changes correct?
> 
The single instruction is the whole picture and does not depend on the target. These have defined, universal NAN handling semantics regardless of however they are lowered. This patch should only aim to faithfully implement these semantics. Ensuring we don't have redundant quieting canonicalizes is an optimization beyond this patch.

As an optimization, the expansion to the IEEE version tries to avoid inserting a quiet, which mostly works well enough. We are missing some optimization hints. I have long wished that we had separate no-qnan and no-snan fast math flags (as well as renaming the IR intrinsics to match the real fmin/fmax semantics, and introduce a new pair for the IEEE-758 2008 semantics)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91716/new/

https://reviews.llvm.org/D91716