[PATCH] D140846: [NVPTX] Fix NVPTX lowering of frem when denominator is infinite.

Tue Jan 3 11:38:06 PST 2023

tra added inline comments.

================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:604
+
+// FIXME: Missing slct
+
----------------
Nit: It's more of a TODO, IMO. :-)

I wonder whether the instruction actually provides any benefit over `cmp`+`selp` at the SASS level. I suspect it does not, and implementing it would just give us slightly nicer PTX without much effect on the actual GPU code.
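
For reference, a minimal C++ sketch of the two forms being compared, assuming the PTX ISA semantics of `slct` with an `.s32` selector (`d = (c >= 0) ? a : b`). The function names are hypothetical and only model the instructions; they are not LLVM or CUDA APIs, and the sketch says nothing about what SASS either form compiles to.

```cpp
#include <cstdint>

// Models `slct.s32.s32 d, a, b, c`: pick a when c is non-negative, else b.
std::int32_t slct_s32(std::int32_t a, std::int32_t b, std::int32_t c) {
    return (c >= 0) ? a : b;
}

// Models the two-instruction form: a compare that produces a predicate,
// followed by a `selp` that chooses between a and b on that predicate.
std::int32_t cmp_selp_s32(std::int32_t a, std::int32_t b, std::int32_t c) {
    bool p = (c >= 0);  // compare step (setp-style predicate)
    return p ? a : b;   // selp step
}
```

Both functions compute the same value for every input, so any advantage of `slct` would have to come from instruction count or scheduling rather than from semantics.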

================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:1285
 def : Pat<(frem Float64Regs:$x, Float64Regs:$y),
-          (FSUBf64rr Float64Regs:$x, (FMULf64rr (CVT_f64_f64
-            (FDIV64rr Float64Regs:$x, Float64Regs:$y), CvtRZI),
-             Float64Regs:$y))>;
+          (SELP_f64rr Float64Regs:$x,
+            (FSUBf64rr Float64Regs:$x, (FMULf64rr (CVT_f64_f64
----------------
This would add selp+testinf unconditionally to all `frem` lowerings. While it is correct, I wonder if we may want to avoid that in fast-math mode, where we only care about finite math.
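
To illustrate why the select is needed only for non-finite inputs, here is a hedged C++ sketch of the arithmetic in the two patterns. It assumes `CvtRZI` behaves like `std::trunc` (round toward zero to an integral value) and that the added test checks whether the denominator is infinite; `frem_naive` and `frem_guarded` are hypothetical names, not code from the patch.

```cpp
#include <cmath>

// Old pattern: x - trunc(x / y) * y.
// With y infinite and x finite, x / y is 0, and 0 * inf is NaN,
// so the result is NaN instead of x.
double frem_naive(double x, double y) {
    return x - std::trunc(x / y) * y;
}

// New pattern: select x directly when the denominator is infinite,
// otherwise fall back to the arithmetic expansion.
double frem_guarded(double x, double y) {
    return std::isinf(y) ? x : frem_naive(x, y);
}
```

For finite `x` and infinite `y`, `frem_guarded` agrees with `std::fmod`, while `frem_naive` returns NaN. When both operands are finite the two functions are identical, which is why the extra test and select could be skipped if infinities are assumed not to occur.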

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140846/new/

https://reviews.llvm.org/D140846