[llvm] [NVPTX] Add patterns for fma.relu.{f16|bf16} (PR #114977)

Tue Nov 5 10:30:56 PST 2024

================
@@ -3917,3 +3917,22 @@ def atomic_thread_fence_seq_cst_cta :
 def atomic_thread_fence_acq_rel_cta :
   NVPTXInst<(outs), (ins), "fence.acq_rel.cta;", []>,
   Requires<[hasPTX<60>, hasSM<70>]>;
+
+def fpimm0 : FPImmLeaf<fAny, [{
+  return Imm.isExactlyValue(+0.0);
+}]>;
+
+def FMARELU_F16 :
+  NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$a, Int16Regs:$b, Int16Regs:$c),
+            "fma.rn.relu.f16 \t$dst, $a, $b, $c;", []>;
----------------
Artem-B wrote:

I think applying the constraint to the instruction itself is the right thing to do. We do not want these instructions to be emitted unintentionally, even if nothing emits them that way today.

I do not know whether the constraint propagates to the pattern that selects the instruction, but I think it may, so applying it here should do the job. That is easy enough to check by running the tests while targeting an older GPU.
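A possible shape for this, as a sketch only. The predicate list is an assumption: per the PTX ISA, the `.relu` forms of `fma` need sm_80 and PTX ISA 7.0, and `useFP16Math` is the predicate other NVPTX f16 instructions use. The selection pattern is hypothetical, because the PR's actual pattern is not part of this hunk. Repeating the same `Requires<>` on the `Pat` keeps the guard in place whether or not the instruction's predicates carry over to a separately defined pattern.

```
// Sketch: predicates assumed from the PTX ISA requirements for fma.relu
// (sm_80, PTX ISA 7.0); useFP16Math mirrors other f16 instructions.
def FMARELU_F16 :
  NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$a, Int16Regs:$b, Int16Regs:$c),
            "fma.rn.relu.f16 \t$dst, $a, $b, $c;", []>,
  Requires<[useFP16Math, hasPTX<70>, hasSM<80>]>;

// Hypothetical pattern: fold max(fma(a, b, c), +0.0) into fma.rn.relu.f16.
// The same Requires<> is attached here so selection stays guarded even if
// the instruction's predicates are not inherited by a standalone Pat.
def : Pat<(f16 (fmaxnum (fma Int16Regs:$a, Int16Regs:$b, Int16Regs:$c), fpimm0)),
          (FMARELU_F16 Int16Regs:$a, Int16Regs:$b, Int16Regs:$c)>,
      Requires<[useFP16Math, hasPTX<70>, hasSM<80>]>;
```

Running the fma.relu tests with an older target such as sm_70 should then show whether the fold is correctly suppressed.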

https://github.com/llvm/llvm-project/pull/114977