[llvm] [AMDGPU] Prevent FMINIMUM and FMAXIMUM beeing fully scalarized (PR #91378)

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Tue May 7 11:42:44 PDT 2024


================
@@ -148,23 +148,35 @@ define amdgpu_ps <2 x half> @test_fmaximum_v2f16_ss(<2 x half> inreg %a, <2 x ha
 }
 
 define amdgpu_ps <3 x half> @test_fmaximum_v3f16_vv(<3 x half> %a, <3 x half> %b) {
-; GCN-LABEL: test_fmaximum_v3f16_vv:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    v_pk_maximum_f16 v0, v0, v2
-; GCN-NEXT:    v_maximum_f16 v1, v1, v3
-; GCN-NEXT:    ; return to shader part epilog
+; GFX12-SDAG-LABEL: test_fmaximum_v3f16_vv:
+; GFX12-SDAG:       ; %bb.0:
+; GFX12-SDAG-NEXT:    v_pk_maximum_f16 v0, v0, v2
+; GFX12-SDAG-NEXT:    v_pk_maximum_f16 v1, v1, v3
----------------
arsenm wrote:

We must be missing the combine to eliminate the undef high half of the vector operation too 

https://github.com/llvm/llvm-project/pull/91378


More information about the llvm-commits mailing list