[llvm] [NVPTX] Add float to tf32 conversion intrinsic (PR #121507)

Sun Jan 5 19:22:32 PST 2025

================
@@ -1466,6 +1466,15 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
       Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
 
+// Convert Float to TF32
+def int_nvvm_cvt_float_to_tf32 : Intrinsic<[llvm_i32_ty],
+    [llvm_float_ty, // Input float
+     llvm_i8_ty,    // Flag for Rounding Modes
----------------
andykaylor wrote:

Yeah, there are a few patches in progress now to convert things to use operand bundles. The motivation there is to be able to attach constrained behavior to any intrinsic without requiring a constrained-specific form of the intrinsic.

The use of rounding mode here is a bit different than in the constrained intrinsics. In the constrained intrinsics, the rounding mode is meant to indicate the rounding mode that may be assumed. While it is permissible to lower the constrained intrinsics to instructions that directly encode the rounding mode, that's not required. Here, by contrast, the rounding mode seems to be more active: it is required to be encoded in the instruction.

In several other target-specific intrinsics that include an explicit rounding mode, the rounding mode is specified as an immediate argument, but I think that's usually because it corresponds to a value in the target instruction encoding. Since you need a rounding mode printer that performs a translation here, I don't think there's any advantage to using an immediate, which would be a bit of a magic number in the IR.
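
To illustrate the translation step mentioned above, here is a rough sketch of what such a printer has to do. It is not code from the patch: the `RoundingMode` enum, its numeric values, and the `roundingModeSuffix` helper are hypothetical names chosen for illustration, and the `.rz`/`.rn`/`.rna` strings are assumed to be the PTX `cvt` modifiers of interest. The point is only that the value in the IR does not match the instruction encoding, so a mapping is needed either way.

```cpp
#include <cstdint>
#include <string>

// Hypothetical rounding-mode values as they might appear in the IR.
// These are illustrative only, not the encoding used by the patch.
enum class RoundingMode : uint8_t {
  TowardZero = 0,
  NearestTiesToEven = 1,
  NearestTiesToAway = 4,
};

// Translate the abstract rounding mode into an assembly modifier string.
// The value in the IR is not the instruction encoding, so the printer
// must map one to the other.
std::string roundingModeSuffix(RoundingMode RM) {
  switch (RM) {
  case RoundingMode::TowardZero:
    return ".rz";
  case RoundingMode::NearestTiesToEven:
    return ".rn";
  case RoundingMode::NearestTiesToAway:
    return ".rna";
  }
  return ""; // Unknown mode: emit no modifier in this sketch.
}
```

With a mapping like this in place, a bare `i8` immediate in the IR carries no more information than a named rounding mode would, and is harder to read.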

https://github.com/llvm/llvm-project/pull/121507