[llvm] AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (PR #141883)

Thu May 29 11:38:16 PDT 2025

================
@@ -12,18 +12,124 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) {
   ret <2 x half> %res
 }
 
-define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) {
-; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract:
+define <4 x half> @v_test_cvt_v4f32_v4f16(<4 x float> %src) {
+; GFX950-LABEL: v_test_cvt_v4f32_v4f16:
 ; GFX950:       ; %bb.0:
 ; GFX950-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX950-NEXT:    v_cvt_pk_f16_f32 v0, v0, v1
-; GFX950-NEXT:    v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX950-NEXT:    v_cvt_pk_f16_f32 v1, v2, v3
+; GFX950-NEXT:    s_setpc_b64 s[30:31]
+  %res = fptrunc <4 x float> %src to <4 x half>
+  ret <4 x half> %res
+}
+
----------------
changpeng wrote:

> 3x is not marked as custom lowering in this PR

+    addRegisterClass(MVT::v3f16, &AMDGPU::SReg_64RegClass);

We have to add the above line if we want to custom lower v3f16. I am afraid that making v3f16 legal this way will affect other cases that do not custom lower v3f16. So it is better not to custom lower v3f16 for the f32 -> f16 fptrunc for now (it is just an optimization to generate v_cvt_pk).

https://github.com/llvm/llvm-project/pull/141883