[llvm] AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (PR #141883)

Thu May 29 00:43:19 PDT 2025

================
@@ -6900,14 +6902,44 @@ SDValue SITargetLowering::getFPExtOrFPRound(SelectionDAG &DAG, SDValue Op,
                            DAG.getTargetConstant(0, DL, MVT::i32));
 }
 
+SDValue SITargetLowering::SplitFP_ROUNDVectorToPacks(SDValue Op,
----------------
changpeng wrote:

> This would be shorter if it followed splitUnaryVectorOp, except this one needs to carry forward the rounding flag operand

splitUnaryVectorOp was designed to exclude FP casts because it assumes Src and Dst have the same type. The rounding flag is another exception.
The initial intention of this function was to handle an arbitrary number of elements. I later gave up on the cases with an odd number of elements because they are not legal and cannot be customized. If we limit this further to powers of 2, we could end up with something similar to splitUnaryVectorOp, but I am not fully sure whether the recursive call is the best approach.
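
To make the trade-off concrete, here is a rough standalone sketch of the two splitting strategies. It is not the code in this patch and uses no SelectionDAG API: it only models which element index ranges would end up in each two-element pack. The names `Range`, `splitIntoPairsIterative`, and `splitIntoPairsRecursive` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open element index range [first, second) covered by one pack.
// Hypothetical helper type for this sketch only.
using Range = std::pair<std::size_t, std::size_t>;

// Iterative strategy: walk the vector in steps of two.
// Accepts any even element count (2, 4, 6, 8, ...).
std::vector<Range> splitIntoPairsIterative(std::size_t NumElts) {
  assert(NumElts >= 2 && NumElts % 2 == 0 &&
         "odd element counts are not handled");
  std::vector<Range> Packs;
  for (std::size_t I = 0; I < NumElts; I += 2)
    Packs.push_back({I, I + 2});
  return Packs;
}

// Recursive strategy: split into low and high halves until a half has
// two elements. This only terminates cleanly for power-of-2 counts.
void splitIntoPairsRecursive(std::size_t Begin, std::size_t End,
                             std::vector<Range> &Packs) {
  std::size_t N = End - Begin;
  assert(N >= 2 && (N & (N - 1)) == 0 &&
         "halving needs a power-of-2 element count");
  if (N == 2) {
    Packs.push_back({Begin, End});
    return;
  }
  std::size_t Mid = Begin + N / 2;
  splitIntoPairsRecursive(Begin, Mid, Packs); // low half
  splitIntoPairsRecursive(Mid, End, Packs);   // high half
}
```

For a power-of-2 count such as 8, both functions produce the same four ranges in the same order. For an even but non-power-of-2 count such as 6, the iterative walk still yields three packs, while the halving version would reach a three-element half and trip its assertion, which is the restriction mentioned above.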

https://github.com/llvm/llvm-project/pull/141883