[llvm] AMDGPU: Make v2f64 -> v2f16 conversion Legal only when unsafe fast math is set (PR #134738)

Thu Apr 17 11:05:31 PDT 2025

================
@@ -2741,6 +2743,29 @@ bool AMDGPULegalizerInfo::legalizeMinNumMaxNum(LegalizerHelper &Helper,
   return Helper.lowerFMinNumMaxNum(MI) == LegalizerHelper::Legalized;
 }
 
+bool AMDGPULegalizerInfo::legalizeFPTrunc(LegalizerHelper &Helper,
+                                          MachineInstr &MI,
+                                          MachineRegisterInfo &MRI) const {
+  // TODO: We should only use fast math flag. But the global option is
+  // still used here to be consistent, especially when the fast math flag is
+  // not working for FP_ROUND on the SelectDAG path at this moment.
+  MachineFunction &MF = Helper.MIRBuilder.getMF();
+  bool AllowInaccurateFPTRUNC = MI.getFlag(MachineInstr::FmAfn) ||
+                                MF.getTarget().Options.UnsafeFPMath;
+
+  if (AllowInaccurateFPTRUNC) {
+    // Use the tablegen pattern to select native instructions.
+    return true;
+  }
+
+  Register DstReg = MI.getOperand(0).getReg();
+  LLT DstTy = MRI.getType(DstReg);
+
+  // Scalarize the vector and fall through to lower f64 -> f16.
+  return Helper.fewerElementsVector(MI, 0, DstTy.getElementType()) ==
+         LegalizerHelper::Legalized;
----------------
changpeng wrote:

> This seems like it should be a combine and not a modification of legalization rules

I am looking at instcombine to generate v_cvt_pk_f16_f32 instructions.  Should we implement in AMDGPU because it is target dependent? Thanks.

https://github.com/llvm/llvm-project/pull/134738