[llvm] AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (PR #141883)
Changpeng Fang via llvm-commits
llvm-commits at lists.llvm.org
Thu May 29 11:38:16 PDT 2025
================
@@ -12,18 +12,124 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) {
ret <2 x half> %res
}
-define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) {
-; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract:
+define <4 x half> @v_test_cvt_v4f32_v4f16(<4 x float> %src) {
+; GFX950-LABEL: v_test_cvt_v4f32_v4f16:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_cvt_pk_f16_f32 v0, v0, v1
-; GFX950-NEXT: v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX950-NEXT: v_cvt_pk_f16_f32 v1, v2, v3
+; GFX950-NEXT: s_setpc_b64 s[30:31]
+ %res = fptrunc <4 x float> %src to <4 x half>
+ ret <4 x half> %res
+}
+
----------------
changpeng wrote:
> 3x is not marked as custom lowering in this PR
+ addRegisterClass(MVT::v3f16, &AMDGPU::SReg_64RegClass);
We would have to add the line above if we wanted to custom lower v3f16. I am afraid that making v3f16 legal this way would affect other cases that do not custom lower v3f16. So it is better not to custom lower v3f16 for the f32 -> f16 fptrunc for now; the custom lowering is only an optimization to generate v_cvt_pk.
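
As a rough illustration of the pairing behind the v_cvt_pk output in the test above, here is a minimal standalone C++ sketch. It is not LLVM code: `PackedCvt`, `planPackedFPTrunc`, and `toAsm` are hypothetical names, and it assumes lane i of the source sits in register v<i>, as in the v4f32 test.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not LLVM code: one packed conversion that reads two
// f32 source registers and writes one 32-bit register holding two f16 values.
struct PackedCvt {
  unsigned dst;
  unsigned src0;
  unsigned src1;
};

// Plan the packed conversions for an fptrunc of numElts f32 lanes, assuming
// lane i lives in register v<i>. Returns an empty plan for zero or odd lane
// counts (such as 3), which this sketch leaves to the default lowering.
std::vector<PackedCvt> planPackedFPTrunc(unsigned numElts) {
  std::vector<PackedCvt> plan;
  if (numElts == 0 || numElts % 2 != 0)
    return plan;
  for (unsigned i = 0; i < numElts; i += 2)
    plan.push_back({i / 2, i, i + 1});
  // Every pair of source lanes maps to exactly one packed conversion.
  assert(plan.size() * 2 == numElts);
  return plan;
}

// Render one planned conversion in the assembly form used by the test checks.
std::string toAsm(const PackedCvt &c) {
  return "v_cvt_pk_f16_f32 v" + std::to_string(c.dst) + ", v" +
         std::to_string(c.src0) + ", v" + std::to_string(c.src1);
}
```

For 4 lanes the plan holds two conversions, matching the two v_cvt_pk_f16_f32 lines in the GFX950 checks; for 3 lanes it is empty, mirroring the decision not to custom lower v3f16 here.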
https://github.com/llvm/llvm-project/pull/141883