[PATCH] D90052: AMDGPU/GlobalISel: Add clamp combine for IEEE=false

Tue Nov 3 05:52:37 PST 2020

Petar.Avramovic added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:235-246
+    // In llvm IR, clamp is often represented as intrinsic call to
+    // @llvm.amdgcn.fmed3.f32(%Val, 0.0, 1.0). Check for other operand orders.
+    unsigned ValIdx = 2, ZeroIdx = 3, OneIdx = 4;
+    // Find index of the operand with 0.0.
+    if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
+      std::swap(ZeroIdx, ValIdx);
+      if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
----------------
arsenm wrote:
> Petar.Avramovic wrote:
> > arsenm wrote:
> > > I think the flow of the DAG version is easier to handle, swapping the values and then checking isClampZeroToOne at the end
> > That swaps operands and might produce different result with IEEE=1
> Are you saying the DAG version is broken?
DAG version should also check for SNaN (This is for IEEE=true) alongside DX10Clamp.
result of fmed3 depends on operand order when one is SNaN 
swapping the operands and deciding to give up since input might be SNaN might create different result:
med3_f32 (0, 1, SNaN) = min(min(0,1), SNaN) = QNaN
med3_f32 (0, 1, SNaN) -(swap)-> med3_f32 (SNaN, 0, 1) = min(min(SNaN, 0), 1) = 1

swapping is fine for QNaN.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:256-260
+    // It is safe to clamp for non-NaN input. For NaN input, fmed3(a, b, c) is
+    // equivalent to min(min(a, b), c). With IEEE=false, when one input is NaN
+    // min returns non-NaN input. Thus fmed3(NaN, 0.0, 1.0) evaluates to 0.0 for
+    // all operand permutations. This is safe to clamp only with DX10Clamp=true
+    // (when true clamps NaN to 0.0, otherwise lets NaN through).
----------------
arsenm wrote:
> Petar.Avramovic wrote:
> > arsenm wrote:
> > > I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode
> > IEEE=0 does not treat SNaN differently and needs no SNaN checks. With IEEE=0 DX10Clamp=1 (shaders want all possible x2, x4, div2, and clamp modifiers) there is no need for any nan check. Other operation don't really mention that backend needs to check for special values. They do have different behavior when input is SNaN but correctness in not mentioned. What do we have to do force correct behavior for IEEE=0?
> > The considering of SNaN only comes from compute kernel functions being in IEEE=1 mode and trying to lower fminnum to fminnum_ieee (If higher language used fminnum_ieee with IEEE=1 there would be no need for SNaN checks at all). 
> > 
> With IEEE=0, no operations quiet signalling nans. All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue. It would be a massive amount of work to actually implement the correct behavior and manually quiet snan inputs to implement the correct generic operation semantics, which we will probably never do.
>All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue.
Is this for llvm-ir? Would it then be acceptable to only perform combine if nnan flag (check using isKnownNeverNaN) is present on the instruction with IEEE=false?   
Also when IEEE=false we should never use isKnownNeverSNaN.
For IEEE=true we could then check for NaN input or DX10Clamp.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90052/new/

https://reviews.llvm.org/D90052