[PATCH] D140846: [NVPTX] Fix NVPTX lowering of frem when denominator is infinite.

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 3 11:38:06 PST 2023


tra added inline comments.


================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:604
+
+// FIXME: Missing slct
+
----------------
Nit: It's more of a TODO, IMO. :-)

I wonder whether the instruction actually provides any benefit over `cmp`+`selp` at the SASS level. I suspect it does not, and implementing it would just give us slightly nicer PTX without much effect on the actual GPU code.
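
For readers unfamiliar with the instruction, here is a rough C++ model of why `slct` and a compare followed by `selp` should be interchangeable. It assumes the PTX ISA semantics of the signed-integer form of `slct` (pick the first operand when the selector is >= 0). The function names are made up for illustration and are not LLVM or PTX APIs, and the sketch says nothing about what SASS is actually generated.

```cpp
#include <cstdint>

// Hypothetical model of `slct.f64.s32 d, a, b, c`: d = (c >= 0) ? a : b.
double slct_model(double a, double b, std::int32_t c) {
    return c >= 0 ? a : b;
}

// Hypothetical model of the two-instruction form: a compare that
// produces a predicate, then `selp` choosing on that predicate.
bool cmp_ge_zero_model(std::int32_t c) {
    return c >= 0;
}

double selp_model(double a, double b, bool p) {
    return p ? a : b;
}

// The compare+select pair computes the same value as the single slct.
double cmp_selp_model(double a, double b, std::int32_t c) {
    return selp_model(a, b, cmp_ge_zero_model(c));
}
```

Under this model the two forms differ only in instruction count at the PTX level, which matches the suspicion above that the gain would be mostly cosmetic.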


================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:1285
 def : Pat<(frem Float64Regs:$x, Float64Regs:$y),
-          (FSUBf64rr Float64Regs:$x, (FMULf64rr (CVT_f64_f64
-            (FDIV64rr Float64Regs:$x, Float64Regs:$y), CvtRZI),
-             Float64Regs:$y))>;
+          (SELP_f64rr Float64Regs:$x,
+            (FSUBf64rr Float64Regs:$x, (FMULf64rr (CVT_f64_f64
----------------
This adds selp+testinf unconditionally to all `frem` lowerings. While that is correct, I wonder whether we want to avoid it in fast-math mode, where we only care about finite math.
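
As a rough illustration of what the two lowerings compute, here is a C++ sketch. It assumes the old pattern is `x - trunc(x / y) * y` (reading `CvtRZI` as round-toward-zero to integer) and that the new `selp` picks `x` when `y` is infinite. The function names are hypothetical, and the `selp` operand order is inferred from the visible part of the diff rather than confirmed.

```cpp
#include <cmath>

// Sketch of the old lowering: x - trunc(x / y) * y.
// For y = +/-inf and finite x: x / y == 0, trunc gives 0, and
// 0 * inf is NaN, so the result is NaN instead of x.
double frem_unguarded(double x, double y) {
    return x - std::trunc(x / y) * y;
}

// Sketch of the patched lowering: test y for infinity and select x
// in that case, otherwise fall back to the arithmetic expression.
double frem_guarded(double x, double y) {
    return std::isinf(y) ? x : frem_unguarded(x, y);
}
```

With finite inputs both functions agree, which is why the extra test and select could be skipped when only finite math matters.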


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140846/new/

https://reviews.llvm.org/D140846


