[PATCH] D22675: AMDGPU: Stay in WQM for non-intrinsic stores

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 2 10:50:03 PDT 2016


arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/SIInstrFormats.td:45
@@ -44,1 +44,3 @@
+
+  // Whether WQM _must_ be enabled for this instruction
   field bits<1> WQM = 0;
----------------
Period

================
Comment at: test/CodeGen/AMDGPU/skip-if-dead.ll:379
@@ -378,3 +378,3 @@
 bb8:                                              ; preds = %bb9, %bb4
-  store volatile i32 9, i32 addrspace(1)* undef
+  call void @llvm.amdgcn.buffer.store.f32(float 9.0, <4 x i32> undef, i32 0, i32 0, i1 0, i1 0)
   ret void
----------------
Why does this need to change? I think the point of this test was just to have a volatile operation that won't be optimized in any way.

================
Comment at: test/CodeGen/AMDGPU/wqm.ll:49-53
@@ -48,7 +48,7 @@
   %tex.2 = extractelement <4 x i32> %tex.1, i32 0
-  %gep = getelementptr float, float addrspace(1)* %ptr, i32 %tex.2
-  %wr = extractelement <4 x float> %tex, i32 1
-  store float %wr, float addrspace(1)* %gep
+
+  call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %tex, <4 x i32> undef, i32 %tex.2, i32 0, i1 0, i1 0)
+
   ret <4 x float> %tex
 }
 
----------------
Should there be a copy of this test that uses both the buffer.store and the regular store?
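A rough sketch of what such a companion test could look like, built from the lines removed in this hunk (function name and argument list are hypothetical, not from the patch):

```llvm
; Keep the original flat store next to the new intrinsic store so both
; paths are covered (hypothetical companion test, not part of the patch).
define amdgpu_ps void @both_stores(float addrspace(1)* inreg %ptr,
                                   <4 x float> %tex, i32 %idx) {
  %wr = extractelement <4 x float> %tex, i32 1
  %gep = getelementptr float, float addrspace(1)* %ptr, i32 %idx
  store float %wr, float addrspace(1)* %gep
  call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %tex, <4 x i32> undef,
                                            i32 %idx, i32 0, i1 0, i1 0)
  ret void
}

declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1)
```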

================
Comment at: test/CodeGen/AMDGPU/wqm.ll:385
@@ +384,3 @@
+; CHECK: s_and_b64 exec, exec, [[LIVE]]
+; CHECK: buffer_store_dword
+; CHECK: s_wqm_b64 exec, exec
----------------
I think these should check the full buffer instruction for the offen operand, to make sure this is a scratch access.
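For instance, the check could be tightened along these lines (exact MUBUF operand order is an assumption here; adjust to what the compiler actually emits):

```llvm
; Match the whole instruction so offen (i.e. a scratch-style, per-lane
; offset access) is actually verified, not just the mnemonic:
; CHECK: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen
```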

================
Comment at: test/CodeGen/AMDGPU/wqm.ll:403
@@ +402,3 @@
+  %s.gep = getelementptr [32 x i32], [32 x i32]* %array, i32 0, i32 0
+  store i32 %a, i32* %s.gep, align 4
+
----------------
You should make these volatile. I'm surprised SROA isn't killing this alloca as-is for you.
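i.e. something like this for the store shown above (volatile keeps SROA and other optimizations from touching the alloca):

```llvm
  %s.gep = getelementptr [32 x i32], [32 x i32]* %array, i32 0, i32 0
  store volatile i32 %a, i32* %s.gep, align 4
```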


https://reviews.llvm.org/D22675




