[PATCH] D90052: AMDGPU/GlobalISel: Add clamp combine for IEEE=false

Fri Oct 23 11:30:02 PDT 2020

Petar.Avramovic added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:207-216
+static void applyMinMaxToMed3OrClamp(MachineInstr &MI,
+                                     MinMaxToMed3OrClampMatchInfo &MatchInfo) {
+  MachineIRBuilder B(MI);
+  if (MatchInfo.Opc == AMDGPU::G_AMDGPU_CLAMP)
+    B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {MatchInfo.Val0});
+  else
+    B.buildInstr(MatchInfo.Opc, {MI.getOperand(0)},
----------------
arsenm wrote:
> These look like two distinct apply actions to me
They share same match action. If we want to split them to matchMinMaxToClamp and matchMinMaxToMed3 we might end up matching min/max pattern twice.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:235-246
+    // In llvm IR, clamp is often represented as intrinsic call to
+    // @llvm.amdgcn.fmed3.f32(%Val, 0.0, 1.0). Check for other operand orders.
+    unsigned ValIdx = 2, ZeroIdx = 3, OneIdx = 4;
+    // Find index of the operand with 0.0.
+    if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
+      std::swap(ZeroIdx, ValIdx);
+      if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
----------------
arsenm wrote:
> I think the flow of the DAG version is easier to handle, swapping the values and then checking isClampZeroToOne at the end
That swaps operands and might produce different result with IEEE=1

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:256-260
+    // It is safe to clamp for non-NaN input. For NaN input, fmed3(a, b, c) is
+    // equivalent to min(min(a, b), c). With IEEE=false, when one input is NaN
+    // min returns non-NaN input. Thus fmed3(NaN, 0.0, 1.0) evaluates to 0.0 for
+    // all operand permutations. This is safe to clamp only with DX10Clamp=true
+    // (when true clamps NaN to 0.0, otherwise lets NaN through).
----------------
arsenm wrote:
> I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode
IEEE=0 does not treat SNaN differently and needs no SNaN checks. With IEEE=0 DX10Clamp=1 (shaders want all possible x2, x4, div2, and clamp modifiers) there is no need for any nan check. Other operation don't really mention that backend needs to check for special values. They do have different behavior when input is SNaN but correctness in not mentioned. What do we have to do force correct behavior for IEEE=0?
The considering of SNaN only comes from compute kernel functions being in IEEE=1 mode and trying to lower fminnum to fminnum_ieee (If higher language used fminnum_ieee with IEEE=1 there would be no need for SNaN checks at all). 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90052/new/

https://reviews.llvm.org/D90052