[PATCH] D88572: AMDGPU/SelectionDAG Check for NaN, DX10Clamp and IEEE in fmed3 combine
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 13 05:15:40 PDT 2020
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9768
+ if (Op0->getFlags().hasNoNaNs() || DAG.getTarget().Options.NoNaNsFPMath)
+ return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Var, Op0->getFlags());
+
----------------
Is there a good reason to keep flags from the inner node, here, but not on line 9775?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9773-9774
+ if (DAG.isKnownNeverNaN(Var) ||
+ (Info->getMode().DX10Clamp &&
+ (!Info->getMode().IEEE || DAG.isKnownNeverSNaN(Var))))
+ return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Var);
----------------
You're assuming that Info->getMode().DX10Clamp affects the behaviour of AMDGPUISD::CLAMP which I think is OK, though it would be nice if that was clearly documented (I can't understand the comment about dx10_enable in AMDGPUISelLowering.h).
You're also assuming that Info->getMode().IEEE affects the behaviour of the Op0 node which could be ISD::FMAXNUM or ISD::FMAXNUM_IEEE or FMAX_LEGACY. This seems wrong to me.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9880
+ // NaNs. med3_f32(a, b, c) is equivalent to min_f32(min_f32(a,b), c) when one
+ // of the a, b or c in NaN. Result is sensitive to the order of inputs in IEEE
+ // mode when one of them is SNaN.
----------------
Typo "is NaN".
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9897
+ }
+ // Find index of the operand with 1.0, remainig index is X.
+ if (!isOperandExactlyValue(N, OneIdx, 1.0)) {
----------------
Typo "remaining".
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9907
+ if (N->getFlags().hasNoNaNs() || DAG.getTarget().Options.NoNaNsFPMath)
+ return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, X, N->getFlags());
----------------
Why do you keep flags here but not on line 9925?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9910
+ // It is safe to clamp when X is not NaN.
+ // Based on v_min_f32 behaviour, IEEE and DX10Clamp settings:
+ // X=SNaN and IEEE=true: min(SNaN, Val) -> QNaN, min(QNaN, Val) -> Val
----------------
You're assuming that the behaviour of AMDGPUISD::FMED3 depends on Info->getMode().IEEE, which seems reasonable.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9912
+ // X=SNaN and IEEE=true: min(SNaN, Val) -> QNaN, min(QNaN, Val) -> Val
+ // med3 returns either QNaN, 1.0 or 0.0 deppending on the inputs.
+ // DX10Clamp=true: clamps SNaN to 0.0, ok when med3's operand(2) is 0.0.
----------------
Typo "depending".
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9922
+ const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
+ if (DAG.isKnownNeverNaN(X) || isa<ConstantFPSDNode>(X) ||
+ (Info->getMode().DX10Clamp &&
----------------
A constant could be a NaN, but isKnownNeverNaN should already have handled the constant case. So I don't think you need the check for isa<ConstantFPSDNode>(X) at all.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D88572/new/
https://reviews.llvm.org/D88572
More information about the llvm-commits
mailing list