[flang-commits] [flang] [flang] Use saturated intrinsic for floating point conversions (PR #130686)

Tue Mar 11 12:25:27 PDT 2025

================
@@ -835,10 +835,20 @@ struct ConvertOpConversion : public fir::FIROpConversion<fir::ConvertOp> {
         return mlir::success();
       }
       if (mlir::isa<mlir::IntegerType>(toTy)) {
-        if (toTy.isUnsignedInteger())
-          rewriter.replaceOpWithNewOp<mlir::LLVM::FPToUIOp>(convert, toTy, op0);
-        else
-          rewriter.replaceOpWithNewOp<mlir::LLVM::FPToSIOp>(convert, toTy, op0);
+        // NOTE: We are checking the fir type here because toTy is an LLVM type
+        // which is signless, and we need to use the intrinsic that matches the
+        // sign of the output in fir.
+        if (toFirTy.isUnsignedInteger()) {
+          auto intrinsicName =
+              mlir::StringAttr::get(convert.getContext(), "llvm.fptoui.sat");
+          rewriter.replaceOpWithNewOp<mlir::LLVM::CallIntrinsicOp>(
+              convert, toTy, intrinsicName, op0);
+        } else {
+          auto intrinsicName =
+              mlir::StringAttr::get(convert.getContext(), "llvm.fptosi.sat");
+          rewriter.replaceOpWithNewOp<mlir::LLVM::CallIntrinsicOp>(
+              convert, toTy, intrinsicName, op0);
+        }
----------------
ashermancinelli wrote:

The saturated intrinsics produce more instructions on x86 when they cannot be const-folded away ([x86 godbolt link, more instructions](https://godbolt.org/z/z6KKf8Yao), [aarch64 godbolt link, both using `fcvtzs`](https://godbolt.org/z/7coacsjPd)), so someone converting reals to integers in a hot loop might see worse performance. However, I was unable to find a difference in the performance tests that I ran. I'll be watching performance numbers after this is merged in case something comes up.

> Would it be possible to use the saturation intrinsic only when necessary?

As long as we want the correct semantics for values only known at runtime, I don't think so. However, especially if performance issues come up, I think it would make sense to use the fptosi/fptoui instructions under some flag, maybe enabled by default above some optimization level. Do you think using the instructions instead of the saturated intrinsics under (for example) `-ffast-math` would be a good compromise if performance issues show up? 
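
For reference, here is a rough C++ model of the runtime semantics in question, following how the LLVM Language Reference describes `llvm.fptosi.sat` and `llvm.fptoui.sat`: NaN maps to 0, out-of-range values clamp to the nearest representable integer, and everything else truncates toward zero. The helper names `fptosi_sat_i32` and `fptoui_sat_u32` are hypothetical and exist only for this sketch. They are not part of flang or LLVM, and the sketch covers only the `double` to 32-bit case.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model of llvm.fptosi.sat.i32.f64 (name is illustrative only).
inline std::int32_t fptosi_sat_i32(double x) {
  constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
  constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
  if (std::isnan(x))
    return 0; // NaN saturates to zero
  // Both bounds are exactly representable as double, so the comparisons are exact.
  if (x <= static_cast<double>(kMin))
    return kMin; // clamp negative overflow
  if (x >= static_cast<double>(kMax))
    return kMax; // clamp positive overflow
  return static_cast<std::int32_t>(x); // in range: truncate toward zero
}

// Hypothetical model of llvm.fptoui.sat.i32.f64 (name is illustrative only).
inline std::uint32_t fptoui_sat_u32(double x) {
  constexpr std::uint32_t kMax = std::numeric_limits<std::uint32_t>::max();
  if (std::isnan(x))
    return 0; // NaN saturates to zero
  if (x <= 0.0)
    return 0; // negative inputs clamp to the unsigned minimum
  if (x >= static_cast<double>(kMax))
    return kMax; // clamp positive overflow
  return static_cast<std::uint32_t>(x); // in range: truncate toward zero
}
```

The extra comparisons and selects here roughly correspond to the additional x86 instructions in the godbolt link above. The plain `fptosi`/`fptoui` instructions skip those checks, and their result is poison when the value does not fit in the destination type.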

https://github.com/llvm/llvm-project/pull/130686