[Mlir-commits] [mlir] [NvGpu Dialect] add rcp approxe op (PR #100965)

Sun Jul 28 23:27:08 PDT 2024

================
@@ -802,4 +803,16 @@ def NVGPU_WarpgroupMmaInitAccumulatorOp : NVGPU_Op<"warpgroup.mma.init.accumulat
   let hasVerifier = 1;
 }
 
+def NVGPU_RcpApproxOp : NVGPU_Op<"rcp_approx", [
----------------
grypp wrote:

`rcp.approx` is quite narrow in scope for an NVGPU op. Looking at [the PTX instruction](https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-rcp), I see many flavors, and I think a single op can cover them all. What do you think?

Could we rename the operation to `nvgpu.rcp` and add enumerators for `approx`, `rnd`, and `ftz`? It's okay if NVVM doesn't support all of them yet; you can emit an error message for the unsupported cases for the time being.
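
As a rough illustration of the suggestion (not code from the PR), the sketch below models a single op carrying a rounding-mode enumerator plus an `ftz` flag, with a verifier-style check that rejects combinations that cannot be lowered yet. It is standalone C++ rather than MLIR code: `RcpRoundingMode`, `RcpConfig`, and `verifyRcp` are hypothetical names, and the choice of which combinations are rejected is an assumption made for the example, not a statement about what NVVM supports.

```cpp
#include <optional>
#include <string>

// Hypothetical model of the modifiers a single `nvgpu.rcp` op could carry.
// In PTX, `rnd` stands for one of the rounding modifiers .rn, .rz, .rm, .rp.
enum class RcpRoundingMode { Approx, RN, RZ, RM, RP };

struct RcpConfig {
  RcpRoundingMode rounding = RcpRoundingMode::Approx;
  bool ftz = false; // flush subnormal inputs and results to zero
};

// Verifier-style check: returns an error message for configurations that
// are not handled yet, or std::nullopt when the configuration is accepted.
// ASSUMPTION (for illustration only): just `approx` combined with `ftz`
// can be lowered; everything else is reported as unsupported.
std::optional<std::string> verifyRcp(const RcpConfig &cfg) {
  if (cfg.rounding != RcpRoundingMode::Approx)
    return "rcp: only the approx variant is supported for now";
  if (!cfg.ftz)
    return "rcp: the ftz modifier is required for now";
  return std::nullopt;
}
```

With this shape, adding support for another flavor later only means relaxing the check, while the op name and its enumerators stay the same.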

https://github.com/llvm/llvm-project/pull/100965