[PATCH] D48556: Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN

Mon Jun 25 12:51:19 PDT 2018

arsenm added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:3429-3432
+    // Checking for NaN before canonicalization provides better fidelity for
+    // mapping fclamp operations to fmed3. This will still not properly model
+    // NaN in fclamp if the intrinsic is canonicalized prior to the NaN being
+    // determined.
----------------
This comment is sort of confusing me. Is it saying there is still a problem? 

Probably shouldn't mention fclamp, as no operation called fclamp is involved here.

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:3447-3452
+    if (match(Src2, m_NaN()) || isa<UndefValue>(Src2)) {
+      CallInst *NewCall = Builder.CreateMaxNum(Src0, Src1);
+      NewCall->copyFastMathFlags(II);
+      NewCall->takeName(II);
+      return replaceInstUsesWith(*II, NewCall);
+    }
----------------
I don't really like repeating all of these so many times, but it's probably clearer than the alternatives. Should this used FastMathFlagGuard to avoid repeating the copyFastMathFlags at least?

================
Comment at: test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll:1337-1357
+; CHECK-LABEL: @fmed3_nan_0_1_f32(
+; CHECK: ret float 0.0
+define float @fmed3_nan_0_1_f32() {
+  %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0.0, float 1.0)
+  ret float %med3
+}
+
----------------
Since the undef value check is repeated so many times, better add versions of these with undef instead of nan to make sure those work

Repository:
  rL LLVM

https://reviews.llvm.org/D48556