[PATCH] D48556: Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN

Tue Jun 26 02:01:41 PDT 2018

arsenm added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:3429-3432
+    // Checking for NaN before canonicalization provides better fidelity for
+    // mapping fclamp operations to fmed3. This will still not properly model
+    // NaN in fclamp if the intrinsic is canonicalized prior to the NaN being
+    // determined.
----------------
alan-baker wrote:
> arsenm wrote:
> > This comment is sort of confusing me. Is it saying there is still a problem? 
> > 
> > Probably shouldn't mention fclamp, as no operation called fclamp is involved here.
> If fmed3(x,y,z) was used to model fclamp(x,y,z) (i.e. fmin(fmax(x,y), z)), and x or y are determined to be a constant prior to determining one of the operands is a NaN, then the canonicalization will swap operands around. If later an operand is determined to be a NaN then the evaluation of the fmed3 may no longer match the original fclamp operation. So there is a potential problem if not all the constants are determined prior to the invocation of inst combine, but are for subsequent invocations. This problem is specifically for modelling fclamp, which is why I mentioned it here. I can remove the comment if you prefer, but LLPC maps fclamp directly to fmed3 (and I'm not sure I see a performant alternative without introducing a fclamp intrinsic).
There should be a comment, it just needs to be reworded. As is I think it relies on knowing the history of you adding this code here before the existing code

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:3447-3452
+    if (match(Src2, m_NaN()) || isa<UndefValue>(Src2)) {
+      CallInst *NewCall = Builder.CreateMaxNum(Src0, Src1);
+      NewCall->copyFastMathFlags(II);
+      NewCall->takeName(II);
+      return replaceInstUsesWith(*II, NewCall);
+    }
----------------
alan-baker wrote:
> arsenm wrote:
> > I don't really like repeating all of these so many times, but it's probably clearer than the alternatives. Should this used FastMathFlagGuard to avoid repeating the copyFastMathFlags at least?
> Calls don't automatically inherit fast math flags from the builder so creating the guard doesn't work. If it's preferable, I could refactor to create the call instruction first and in a subsequent if take the name and copy the fast math flags.
I guess that's sort of a bug in the IRBuilder but I can sort of see why it is that way. I guess factoring that way may be better

Repository:
  rL LLVM

https://reviews.llvm.org/D48556