[llvm] AMDGPU: Make v2f32 -> v2f16 legal when target supports v_cvt_pk_f16_f32 (PR #139956)

Wed May 14 14:55:09 PDT 2025

================
@@ -6899,10 +6902,16 @@ SDValue SITargetLowering::getFPExtOrFPRound(SelectionDAG &DAG, SDValue Op,
 SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
   SDValue Src = Op.getOperand(0);
   EVT SrcVT = Src.getValueType();
+  EVT DstVT = Op.getValueType();
+
+  if (DstVT == MVT::v2f16) {
+    assert(Subtarget->hasCvtPkF16F32Inst() && "support v_cvt_pk_f16_f32");
----------------
changpeng wrote:

> The same should apply to all vector types.

I don't think so. You can not make v3f32 and v4f32 legal because there is no corresponding instructions. Instead, we have to "Expand" (split) for wider vector types. It is another subject how to split the wider vector to take advantage of the packed conversion.

https://github.com/llvm/llvm-project/pull/139956