[llvm] AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (PR #141883)
Shilei Tian via llvm-commits
llvm-commits at lists.llvm.org
Thu May 29 06:48:47 PDT 2025
================
@@ -12,18 +12,124 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) {
ret <2 x half> %res
}
-define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) {
-; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract:
+define <4 x half> @v_test_cvt_v4f32_v4f16(<4 x float> %src) {
+; GFX950-LABEL: v_test_cvt_v4f32_v4f16:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_cvt_pk_f16_f32 v0, v0, v1
-; GFX950-NEXT: v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX950-NEXT: v_cvt_pk_f16_f32 v1, v2, v3
+; GFX950-NEXT: s_setpc_b64 s[30:31]
+ %res = fptrunc <4 x float> %src to <4 x half>
+ ret <4 x half> %res
+}
+
----------------
shiltian wrote:
3x is not marked as custom lowering in this PR
https://github.com/llvm/llvm-project/pull/141883
More information about the llvm-commits
mailing list