[PATCH] D22675: AMDGPU: Stay in WQM for non-intrinsic stores
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 2 10:50:03 PDT 2016
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrFormats.td:45
@@ -44,1 +44,3 @@
+
+ // Whether WQM _must_ be enabled for this instruction
field bits<1> WQM = 0;
----------------
Period at the end of the comment.
================
Comment at: test/CodeGen/AMDGPU/skip-if-dead.ll:379
@@ -378,3 +378,3 @@
bb8: ; preds = %bb9, %bb4
- store volatile i32 9, i32 addrspace(1)* undef
+ call void @llvm.amdgcn.buffer.store.f32(float 9.0, <4 x i32> undef, i32 0, i32 0, i1 0, i1 0)
ret void
----------------
Why does this need to change? I think the point of this was just to have a volatile operation that won't be optimized in any way.
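For reference, the line being replaced relied on a volatile store, which LLVM's optimizers must preserve and may not reorder or delete; that is what kept the block from being emptied:

```llvm
; The original side effect: a volatile store is never deleted or
; reordered by the optimizer, so the block cannot be optimized away.
store volatile i32 9, i32 addrspace(1)* undef
```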
================
Comment at: test/CodeGen/AMDGPU/wqm.ll:49-53
@@ -48,7 +48,7 @@
%tex.2 = extractelement <4 x i32> %tex.1, i32 0
- %gep = getelementptr float, float addrspace(1)* %ptr, i32 %tex.2
- %wr = extractelement <4 x float> %tex, i32 1
- store float %wr, float addrspace(1)* %gep
+
+ call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %tex, <4 x i32> undef, i32 %tex.2, i32 0, i1 0, i1 0)
+
ret <4 x float> %tex
}
----------------
Should there be a copy of this test that uses the buffer.store intrinsic alongside one that keeps the regular store?
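One way to keep coverage for both paths would be a second test function performing both kinds of store; a sketch, where the function and value names are illustrative and not from the patch:

```llvm
; Hypothetical companion test exercising both the plain store and the
; buffer.store intrinsic in one function.
define amdgpu_ps void @test_both_stores(<4 x float> %tex, float addrspace(1)* %ptr, i32 %idx) {
  %gep = getelementptr float, float addrspace(1)* %ptr, i32 %idx
  %elt = extractelement <4 x float> %tex, i32 1
  store float %elt, float addrspace(1)* %gep
  call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %tex, <4 x i32> undef, i32 %idx, i32 0, i1 0, i1 0)
  ret void
}

declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1)
```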
================
Comment at: test/CodeGen/AMDGPU/wqm.ll:385
@@ +384,3 @@
+; CHECK: s_and_b64 exec, exec, [[LIVE]]
+; CHECK: buffer_store_dword
+; CHECK: s_wqm_b64 exec, exec
----------------
I think these should check the full buffer_store_dword line, including the offen operand, to make sure this is a scratch access.
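A sketch of such a stricter check; the exact register operands here are an assumption about the generated code, not taken from the patch:

```llvm
; Matching the whole line, including offen, distinguishes a scratch
; (per-lane offset) access from an ordinary buffer store.
; CHECK: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen
```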
================
Comment at: test/CodeGen/AMDGPU/wqm.ll:403
@@ +402,3 @@
+ %s.gep = getelementptr [32 x i32], [32 x i32]* %array, i32 0, i32 0
+ store i32 %a, i32* %s.gep, align 4
+
----------------
You should make these volatile. I'm surprised SROA isn't already killing this alloca as-is for you.
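A volatile variant of the quoted stack access (a sketch; only the volatile keyword differs from the lines above) would keep the alloca alive:

```llvm
; Marking the stack accesses volatile prevents SROA/mem2reg from
; promoting the alloca and deleting the scratch traffic the test needs.
%s.gep = getelementptr [32 x i32], [32 x i32]* %array, i32 0, i32 0
store volatile i32 %a, i32* %s.gep, align 4
```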
https://reviews.llvm.org/D22675