[PATCH] D90052: AMDGPU/GlobalISel: Add clamp combine for IEEE=false

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 3 08:00:43 PST 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:256-260
+    // It is safe to clamp for non-NaN input. For NaN input, fmed3(a, b, c) is
+    // equivalent to min(min(a, b), c). With IEEE=false, when one input is NaN
+    // min returns non-NaN input. Thus fmed3(NaN, 0.0, 1.0) evaluates to 0.0 for
+    // all operand permutations. This is safe to clamp only with DX10Clamp=true
+    // (when true clamps NaN to 0.0, otherwise lets NaN through).
----------------
Petar.Avramovic wrote:
> arsenm wrote:
> > Petar.Avramovic wrote:
> > > arsenm wrote:
> > > > I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode
> > > IEEE=0 does not treat SNaN differently and needs no SNaN checks. With IEEE=0 DX10Clamp=1 (shaders want all possible x2, x4, div2, and clamp modifiers) there is no need for any nan check. Other operation don't really mention that backend needs to check for special values. They do have different behavior when input is SNaN but correctness in not mentioned. What do we have to do force correct behavior for IEEE=0?
> > > The considering of SNaN only comes from compute kernel functions being in IEEE=1 mode and trying to lower fminnum to fminnum_ieee (If higher language used fminnum_ieee with IEEE=1 there would be no need for SNaN checks at all). 
> > > 
> > With IEEE=0, no operations quiet signalling nans. All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue. It would be a massive amount of work to actually implement the correct behavior and manually quiet snan inputs to implement the correct generic operation semantics, which we will probably never do.
> >All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue.
> Is this for llvm-ir? Would it then be acceptable to only perform combine if nnan flag (check using isKnownNeverNaN) is present on the instruction with IEEE=false?   
> Also when IEEE=false we should never use isKnownNeverSNaN.
> For IEEE=true we could then check for NaN input or DX10Clamp.
Yes. Officially the IR description pretends snans do not exist, but really they do and need to be quieted by FP canonicalizing FP operations as per IEEE754. I'm not sure it even makes sense to try to get correct snan behavior with IEEE=0 since it's never going to be implemented. The comment here absolutely needs to make clear the distinction between snan and qnan handling

Getting IEEE=true correct is far more interesting/important.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90052/new/

https://reviews.llvm.org/D90052



More information about the llvm-commits mailing list