[llvm] AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (PR #141883)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu May 29 12:55:09 PDT 2025
================
@@ -12,18 +12,124 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) {
ret <2 x half> %res
}
-define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) {
-; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract:
+define <4 x half> @v_test_cvt_v4f32_v4f16(<4 x float> %src) {
+; GFX950-LABEL: v_test_cvt_v4f32_v4f16:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_cvt_pk_f16_f32 v0, v0, v1
-; GFX950-NEXT: v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX950-NEXT: v_cvt_pk_f16_f32 v1, v2, v3
+; GFX950-NEXT: s_setpc_b64 s[30:31]
+ %res = fptrunc <4 x float> %src to <4 x half>
+ ret <4 x half> %res
+}
+
----------------
arsenm wrote:
Whether or not it is custom lowered in this PR, it should be tested. You probably do not need to do anything to handle it correctly, but it needs the test.
You also do not need to make the type legal in order to custom lower it: custom lowering an operation on an illegal type goes through ReplaceNodeResults.
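
For illustration only, a test of the kind requested might look like the sketch below. The vector type under discussion is not shown in this excerpt, so `<3 x float>` and the function name `v_test_cvt_v3f32_v3f16` are assumptions chosen to mirror the existing tests, not the actual case from the PR. No check lines are given here; they would normally be regenerated with `update_llc_test_checks.py` rather than written by hand.

```llvm
; Hypothetical test: the element count (3) is an assumed example, not taken from the PR.
define <3 x half> @v_test_cvt_v3f32_v3f16(<3 x float> %src) {
  %res = fptrunc <3 x float> %src to <3 x half>
  ret <3 x half> %res
}
```

A test like this shows what the backend emits for the type regardless of whether it is marked custom, which is the point of the request above.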
https://github.com/llvm/llvm-project/pull/141883