[llvm] [NVPTX] Add patterns for fma.relu.{f16|bf16} (PR #114977)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 6 10:23:40 PST 2024
================
@@ -3917,3 +3917,40 @@ def atomic_thread_fence_seq_cst_cta :
def atomic_thread_fence_acq_rel_cta :
NVPTXInst<(outs), (ins), "fence.acq_rel.cta;", []>,
Requires<[hasPTX<60>, hasSM<70>]>;
+
+def fpimm0 : FPImmLeaf<fAny, [{
+ return Imm.isExactlyValue(+0.0);
+}]>;
+
+def FMARELU_F16 :
+ NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$a, Int16Regs:$b, Int16Regs:$c),
+ "fma.rn.relu.f16 \t$dst, $a, $b, $c;", []>,
+ Requires<[useFP16Math, hasPTX<70>, hasSM<80>]>;
+def FMARELU_BF16 :
+ NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$a, Int16Regs:$b, Int16Regs:$c),
+ "fma.rn.relu.bf16 \t$dst, $a, $b, $c;", []>,
+ Requires<[hasBF16Math, hasPTX<70>, hasSM<80>]>;
+def FMARELU_F16_FTZ :
+ NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$a, Int16Regs:$b, Int16Regs:$c),
+ "fma.rn.relu.ftz.f16 \t$dst, $a, $b, $c;", []>,
----------------
Artem-B wrote:
The `relu` and `ftz` modifiers of the instruction appear in a different order from the one given in the PTX manual:
```
Syntax
fma.rnd{.ftz}.relu.f16 d, a, b, c;
fma.rnd{.ftz}.relu.f16x2 d, a, b, c;
```
Does ptxas accept the generated instruction? In any case, to avoid unnecessary divergence from the manual, it may be worth matching the official syntax, even if ptxas happens to accept `fma.rn.relu.ftz.f16`.
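
For illustration, a sketch of the asm string for `FMARELU_F16_FTZ` with the modifiers in the manual's order. This assumes only the string changes and the rest of the definition stays as in the patch; it has not been checked against ptxas:
```
// Modifier order follows the manual's fma.rnd{.ftz}.relu.f16 syntax.
"fma.rn.ftz.relu.f16 \t$dst, $a, $b, $c;"
```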
https://github.com/llvm/llvm-project/pull/114977