[llvm] [AMDGPU][GFX12] Default component broadcast store (PR #76212)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 22 02:19:41 PST 2023
================
@@ -23,7 +23,8 @@ define amdgpu_ps void @image_store_1d_store_insert_zeros_at_end(<8 x i32> inreg
; GCN-NEXT: ret void
;
; GFX12-LABEL: @image_store_1d_store_insert_zeros_at_end(
-; GFX12-NEXT: call void @llvm.amdgcn.image.store.1d.f32.i32(float [[VDATA1:%.*]], i32 1, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+; GFX12-NEXT: [[NEWVDATA4:%.*]] = insertelement <4 x float> <float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, float [[VDATA1:%.*]], i64 0
+; GFX12-NEXT: call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> [[NEWVDATA4]], i32 15, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
----------------
jayfoad wrote:
Right. The behaviour changed in GFX12, so the old trailing-zeros optimization no longer applies (which is what this diff shows), but the new broadcast optimization applies instead.
https://github.com/llvm/llvm-project/pull/76212
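
As a rough illustration of the point above, here is a toy Python model of the two simplifications. It is only a sketch under assumptions, not the InstCombine code from the PR. It assumes that before GFX12 the components omitted from a store are written as zero, and that on GFX12 they take the value of the first component (my reading of "default component broadcast" in the PR title; the thread does not spell this out). The names `trim_trailing`, `dmask_for` and `simplify_store` are hypothetical, and the dmask is modelled simply as the low N bits set, which matches the values 1 and 15 in the diff.

```python
from typing import List, Sequence, Tuple


def trim_trailing(components: Sequence[float], fill: float) -> List[float]:
    """Drop trailing components equal to `fill`, always keeping the first one."""
    kept = list(components)
    while len(kept) > 1 and kept[-1] == fill:
        kept.pop()
    return kept


def dmask_for(kept_count: int) -> int:
    """Toy dmask: the low `kept_count` bits set (1 component -> 1, 4 -> 15)."""
    return (1 << kept_count) - 1


def simplify_store(components: Sequence[float],
                   broadcast_default: bool) -> Tuple[List[float], int]:
    """Return the components that still have to be passed, plus the dmask.

    Assumption: with broadcast_default=False (pre-GFX12) omitted components
    are stored as 0.0; with broadcast_default=True (GFX12) they are stored
    as a copy of the first component.
    """
    fill = components[0] if broadcast_default else 0.0
    kept = trim_trailing(components, fill)
    return kept, dmask_for(len(kept))


# The shape of the test in the diff: one real value followed by zeros.
zeros_at_end = [2.5, 0.0, 0.0, 0.0]
pre_gfx12 = simplify_store(zeros_at_end, broadcast_default=False)
gfx12 = simplify_store(zeros_at_end, broadcast_default=True)

# A store whose trailing components repeat the first one.
repeated = [2.5, 2.5, 2.5, 2.5]
gfx12_broadcast = simplify_store(repeated, broadcast_default=True)
```

In this model the zeros-at-end case shrinks to one component with dmask 1 before GFX12 but stays at four components with dmask 15 on GFX12, which is the shape of the change in the CHECK lines. The repeated-value case is the one that would shrink on GFX12 instead.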