[PATCH] D90052: AMDGPU/GlobalISel: Add clamp combine for IEEE=false
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 23 09:06:03 PDT 2020
arsenm added a comment.
Missing tests for f16 cases
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:187
+ Register InnerInstDef = InnerInst->getOperand(0).getReg();
+ if (Info->getMode().DX10Clamp || isKnownNeverNaN(InnerInstDef, MRI)) {
+ MatchInfo = {AMDGPU::G_AMDGPU_CLAMP, ValDef, 0, 0};
----------------
Check this first around the whole section?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:207-216
+static void applyMinMaxToMed3OrClamp(MachineInstr &MI,
+ MinMaxToMed3OrClampMatchInfo &MatchInfo) {
+ MachineIRBuilder B(MI);
+ if (MatchInfo.Opc == AMDGPU::G_AMDGPU_CLAMP)
+ B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {MatchInfo.Val0});
+ else
+ B.buildInstr(MatchInfo.Opc, {MI.getOperand(0)},
----------------
These look like two distinct apply actions to me
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:224-228
+ const MachineInstr *Inst = MRI.getVRegDef(MI.getOperand(Idx).getReg());
+ if (Inst->getOpcode() == AMDGPU::G_FCONSTANT &&
+ Inst->getOperand(1).getFPImm()->isExactlyValue(Val))
+ return true;
+ return false;
----------------
getConstantFPVRegVal
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:233
+ FMed3ToClampMatchInfo &MatchInfo) {
+ if (MI.getOpcode() == AMDGPU::G_INTRINSIC &&
+ MI.getIntrinsicID() == Intrinsic::amdgcn_fmed3) {
----------------
This should be implied by the matcher opcode
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:234
+ if (MI.getOpcode() == AMDGPU::G_INTRINSIC &&
+ MI.getIntrinsicID() == Intrinsic::amdgcn_fmed3) {
+ // In llvm IR, clamp is often represented as intrinsic call to
----------------
Early return on intrinsic check
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:235-246
+ // In llvm IR, clamp is often represented as intrinsic call to
+ // @llvm.amdgcn.fmed3.f32(%Val, 0.0, 1.0). Check for other operand orders.
+ unsigned ValIdx = 2, ZeroIdx = 3, OneIdx = 4;
+ // Find index of the operand with 0.0.
+ if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
+ std::swap(ZeroIdx, ValIdx);
+ if (!isOperandExactlyValue(MI, ZeroIdx, MRI, 0.0)) {
----------------
I think the flow of the DAG version is easier to handle, swapping the values and then checking isClampZeroToOne at the end
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:247
+ }
+ // Find index of the operand with 1.0, remainig index is Val.
+ if (!isOperandExactlyValue(MI, OneIdx, MRI, 1.0)) {
----------------
Typo remainig
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:254
+
+ const MachineFunction *MF = MI.getMF();
+ const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
----------------
Should get this from the MachineIRBuilder
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:256-260
+ // It is safe to clamp for non-NaN input. For NaN input, fmed3(a, b, c) is
+ // equivalent to min(min(a, b), c). With IEEE=false, when one input is NaN
+ // min returns non-NaN input. Thus fmed3(NaN, 0.0, 1.0) evaluates to 0.0 for
+ // all operand permutations. This is safe to clamp only with DX10Clamp=true
+ // (when true clamps NaN to 0.0, otherwise lets NaN through).
----------------
I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:275
+ FMed3ToClampMatchInfo &MatchInfo) {
MachineIRBuilder B(MI);
+
----------------
Should not construct a new MachineIRBuilder, there should be one existing already in the combiner
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D90052/new/
https://reviews.llvm.org/D90052
More information about the llvm-commits
mailing list