[llvm] [AMDGPU][GFX12] Default component broadcast store (PR #76212)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 22 02:19:41 PST 2023


================
@@ -23,7 +23,8 @@ define amdgpu_ps void @image_store_1d_store_insert_zeros_at_end(<8 x i32> inreg
 ; GCN-NEXT:    ret void
 ;
 ; GFX12-LABEL: @image_store_1d_store_insert_zeros_at_end(
-; GFX12-NEXT:    call void @llvm.amdgcn.image.store.1d.f32.i32(float [[VDATA1:%.*]], i32 1, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+; GFX12-NEXT:    [[NEWVDATA4:%.*]] = insertelement <4 x float> <float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, float [[VDATA1:%.*]], i64 0
+; GFX12-NEXT:    call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> [[NEWVDATA4]], i32 15, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
----------------
jayfoad wrote:

Right. The behaviour changed in GFX12, so the old trailing-zeros optimization no longer applies (which is what this diff shows), but the new broadcast optimization applies instead.
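
For illustration, here is a toy Python model of the two rules, not the actual InstCombine code. It assumes that on older targets the components omitted from an image store default to zero, and that on GFX12 they default to a copy of component 0 (inferred from the PR title). The name `shrink_store` and its `broadcast` flag are hypothetical, and the model assumes the incoming dmask enables every component of the vector.

```python
def shrink_store(vdata, broadcast):
    """Return (shortened vdata, dmask) for a toy image store.

    broadcast=False: omitted components are assumed to default to 0.0
                     (the older "trailing zeros" rule).
    broadcast=True:  omitted components are assumed to default to
                     component 0 (the assumed GFX12 "broadcast" rule).
    """
    n = len(vdata)
    default = vdata[0] if broadcast else 0.0
    # Drop trailing components that the assumed default would supply anyway,
    # but always keep component 0.
    while n > 1 and vdata[n - 1] == default:
        n -= 1
    # The remaining components form a prefix, so the dmask is a prefix mask.
    return vdata[:n], (1 << n) - 1


# The case from the diff: <x, 0, 0, 0> with dmask 15.
print(shrink_store([2.5, 0.0, 0.0, 0.0], broadcast=False))  # shrinks to one component
print(shrink_store([2.5, 0.0, 0.0, 0.0], broadcast=True))   # stays at four components
# A case the broadcast rule can shrink instead: <x, x, x, x>.
print(shrink_store([2.5, 2.5, 2.5, 2.5], broadcast=True))
```

Under these assumptions the `<x, 0, 0, 0>` store in the test above keeps all four components and dmask 15 on GFX12, which matches the new CHECK lines, while a store whose trailing components repeat the first one could be narrowed.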

https://github.com/llvm/llvm-project/pull/76212

