[PATCH] D137811: InstCombine: Port amdgcn.class intrinsic combines to is.fpclass

Wed Nov 30 09:01:09 PST 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:847
+  // fp_class (nnan x), qnan|snan|other -> fp_class (nnan x), other
+  if ((Mask & fcNan) && isKnownNeverNaN(Src0, &getTargetLibraryInfo())) {
+    return replaceOperand(II, 1,
----------------
foad wrote:
> jyknight wrote:
> > These removal of Mask bits should come before the `Mask == $X` tests, shouldn't they?
> I assume that this function will be called again to revisit the modified instrinsic.
Right, it's revisited. I don't expect the bit removal part to ever actually happen

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:856
+  // Clamp mask to used bits
+  if ((Mask & fcAllFlags) != Mask) {
+    CallInst *NewCall = Builder.CreateCall(
----------------
jyknight wrote:
> Why are unknown bits even accepted? ISTM it should be an error in Verifier::visitIntrinsicCall to pass invalid bits.
I don't know. Really this should be an i10 argument. I've been debating whether to add a verifier check, or make it an i10. I'd prefer to do that separately and clean up the bits here for now.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:824
+    // Equivalent of isnan. Replace with standard fcmp.
+    Value *FCmp = Builder.CreateFCmpUNO(Src0, Src0);
+    FCmp->takeName(&II);
----------------
jyknight wrote:
> arsenm wrote:
> > sepavloff wrote:
> > > Is it profitable to make such replacement early? Is there any advantage, at least hypothetical? 
> > I think an fcmp should be more canonical and better handled by existing optimizations, than class which is handled ~nowhere
> This seems OK.
> 
> I'm not sure it's the //best// choice -- if a CPU actually has an fpclassify instruction, is it really a good idea to canonicalize in generic code to fcmp? But I think that's fine to revisit later if it becomes a problem.
I've been thinking if we can do a classify in <= 2 IR instructions, fcmp+fabs is probably a better canonical form. If a class pattern is 3-4+ instructions, the class is probably better. FCmp + fneg + fabs are always going to be more broadly understood. Fcmp also supports fast math flags, unlike class (I guess we could fix that though)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137811/new/

https://reviews.llvm.org/D137811