[llvm] [AMDGPU] Prevent FMINIMUM and FMAXIMUM being fully scalarized (PR #91378)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue May 7 11:42:44 PDT 2024
================
@@ -148,23 +148,35 @@ define amdgpu_ps <2 x half> @test_fmaximum_v2f16_ss(<2 x half> inreg %a, <2 x ha
}
define amdgpu_ps <3 x half> @test_fmaximum_v3f16_vv(<3 x half> %a, <3 x half> %b) {
-; GCN-LABEL: test_fmaximum_v3f16_vv:
-; GCN: ; %bb.0:
-; GCN-NEXT: v_pk_maximum_f16 v0, v0, v2
-; GCN-NEXT: v_maximum_f16 v1, v1, v3
-; GCN-NEXT: ; return to shader part epilog
+; GFX12-SDAG-LABEL: test_fmaximum_v3f16_vv:
+; GFX12-SDAG: ; %bb.0:
+; GFX12-SDAG-NEXT: v_pk_maximum_f16 v0, v0, v2
+; GFX12-SDAG-NEXT: v_pk_maximum_f16 v1, v1, v3
----------------
arsenm wrote:
We must also be missing the combine that eliminates the undef high half of the vector operation.
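
For reference, a minimal sketch of the kind of input involved. The function signature is taken from the hunk above; the body is not shown there, so the single `llvm.maximum.v3f16` call is an assumption about what the test does, not a copy of it.

```llvm
; Assumed body: the hunk only shows the signature of this test.
define amdgpu_ps <3 x half> @test_fmaximum_v3f16_vv(<3 x half> %a, <3 x half> %b) {
  ; Elements 0 and 1 fill one packed 32-bit register; element 2 occupies
  ; only the low half of a second register, leaving its high half undefined.
  %val = call <3 x half> @llvm.maximum.v3f16(<3 x half> %a, <3 x half> %b)
  ret <3 x half> %val
}

declare <3 x half> @llvm.maximum.v3f16(<3 x half>, <3 x half>)
```

With only the third element live in `v1`/`v3`, the second `v_pk_maximum_f16` computes a high half that nothing reads. The old check lines used the scalar `v_maximum_f16 v1, v1, v3` for that element.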
https://github.com/llvm/llvm-project/pull/91378