[PATCH] D90052: AMDGPU/GlobalISel: Add clamp combine for IEEE=false

Fri Oct 23 09:06:03 PDT 2020

arsenm added a comment.

Missing tests for f16 cases

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:187
+        Register InnerInstDef = InnerInst->getOperand(0).getReg();
+        if (Info->getMode().DX10Clamp || isKnownNeverNaN(InnerInstDef, MRI)) {
+          MatchInfo = {AMDGPU::G_AMDGPU_CLAMP, ValDef, 0, 0};
----------------
Check this first around the whole section?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:207-216
+static void applyMinMaxToMed3OrClamp(MachineInstr &MI,
+                                     MinMaxToMed3OrClampMatchInfo &MatchInfo) {
+  MachineIRBuilder B(MI);
+  if (MatchInfo.Opc == AMDGPU::G_AMDGPU_CLAMP)
+    B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {MatchInfo.Val0});
+  else
+    B.buildInstr(MatchInfo.Opc, {MI.getOperand(0)},
----------------
These look like two distinct apply actions to me

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:224-228
+  const MachineInstr *Inst = MRI.getVRegDef(MI.getOperand(Idx).getReg());
+  if (Inst->getOpcode() == AMDGPU::G_FCONSTANT &&
+      Inst->getOperand(1).getFPImm()->isExactlyValue(Val))
+    return true;
+  return false;
----------------
getConstantFPVRegVal

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:233
+                              FMed3ToClampMatchInfo &MatchInfo) {
+  if (MI.getOpcode() == AMDGPU::G_INTRINSIC &&
+      MI.getIntrinsicID() == Intrinsic::amdgcn_fmed3) {
----------------
This should be implied by the matcher opcode

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:234
+  if (MI.getOpcode() == AMDGPU::G_INTRINSIC &&
+      MI.getIntrinsicID() == Intrinsic::amdgcn_fmed3) {
+    // In llvm IR, clamp is often represented as intrinsic call to
----------------
Early return on intrinsic check

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:235-246
+    // In llvm IR, clamp is often represented as intrinsic call to
+    // @llvm.amdgcn.fmed3.f32(%Val, 0.0, 1.0). Check for other operand orders.
+    unsigned ValIdx = 2, ZeroIdx = 3, OneIdx = 4;
+    // Find index of the operand with 0.0.
+    if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
+      std::swap(ZeroIdx, ValIdx);
+      if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
----------------
I think the flow of the DAG version is easier to handle, swapping the values and then checking isClampZeroToOne at the end

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:247
+    }
+    // Find index of the operand with 1.0, remainig index is Val.
+    if (!isOperandExactlyValue(MI, OneIdx, MRI, 1.0)) {
----------------
Typo remainig

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:254
+
+    const MachineFunction *MF = MI.getMF();
+    const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
----------------
Should get this from the MachineIRBuilder

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:256-260
+    // It is safe to clamp for non-NaN input. For NaN input, fmed3(a, b, c) is
+    // equivalent to min(min(a, b), c). With IEEE=false, when one input is NaN
+    // min returns non-NaN input. Thus fmed3(NaN, 0.0, 1.0) evaluates to 0.0 for
+    // all operand permutations. This is safe to clamp only with DX10Clamp=true
+    // (when true clamps NaN to 0.0, otherwise lets NaN through).
----------------
I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:275
+                              FMed3ToClampMatchInfo &MatchInfo) {
   MachineIRBuilder B(MI);
+
----------------
Should not construct a new MachineIRBuilder, there should be one existing already in the combiner

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90052/new/

https://reviews.llvm.org/D90052