[PATCH] D24623: AMDGPU: Implement memory model

Konstantin Zhuravlyov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 12 16:14:47 PDT 2017


kzhuravl added inline comments.


================
Comment at: lib/Target/AMDGPU/SIInstructions.td:515
 
+def ATOMIC_FENCE : InstSI<
+  (outs), (ins i32imm:$ordering, i32imm:$scope),
----------------
arsenm wrote:
> There's an SPseudoInstSI or something like that which will avoid needing to set any of these bits.
Already committed separately with the comments addressed.


================
Comment at: lib/Target/AMDGPU/SIInstructions.td:519
+  [(atomic_fence (i32 imm:$ordering), (i32 imm:$scope))]> {
+  let hasSideEffects = 1;
+  let isCodeGenOnly = 1;
----------------
arsenm wrote:
> I don't think this is necessary, only mayLoad and mayStore
Already committed separately with the comments addressed.
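For reference, the suggestion might look roughly like the following sketch. This assumes the `SPseudoInstSI` class (which already sets the usual pseudo-instruction bits) and takes the comment at face value that only the memory flags need stating; it is an illustration, not the definition that was actually committed.

```tablegen
// Sketch only: SPseudoInstSI sets isPseudo/isCodeGenOnly and related
// defaults, so the definition only needs to state the memory behavior.
def ATOMIC_FENCE : SPseudoInstSI<
  (outs), (ins i32imm:$ordering, i32imm:$scope),
  [(atomic_fence (i32 imm:$ordering), (i32 imm:$scope))]> {
  let mayLoad = 1;
  let mayStore = 1;
}
```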


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:194-195
+    const MachineBasicBlock::iterator &MI) const {
+  if (!MI->hasOneMemOperand())
+    return false;
+
----------------
arsenm wrote:
> If there are no memory operands, this should still work and be handled as conservatively as possible
Do you see an issue with this approach?
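The conservative handling arsenm suggests could be sketched as follows. These are stand-in types for illustration (the real pass works on `MachineInstr`/`MachineMemOperand`); the idea is that an instruction with no memory operands tells us nothing about the access, so it should be treated as an atomic at the widest scope rather than skipped.

```cpp
#include <cassert>
#include <vector>

// Stand-in types for illustration only.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

struct MemOperandInfo {
  bool IsAtomic;
  Scope SyncScope;
};

// Conservative merge: no memory operands means we know nothing about the
// access, so assume an atomic at system scope. With multiple operands,
// take the widest scope, and atomic if any operand is atomic.
MemOperandInfo getMemOpInfo(const std::vector<MemOperandInfo> &MemOps) {
  if (MemOps.empty())
    return {true, Scope::System}; // worst case
  MemOperandInfo Result{false, Scope::SingleThread};
  for (const auto &MO : MemOps) {
    Result.IsAtomic |= MO.IsAtomic;
    if (MO.SyncScope > Result.SyncScope)
      Result.SyncScope = MO.SyncScope;
  }
  return Result;
}

int main() {
  // No operands: fall back to the most conservative assumption.
  assert(getMemOpInfo({}).IsAtomic);
  assert(getMemOpInfo({}).SyncScope == Scope::System);
  // Single operand: its own properties are used.
  assert(getMemOpInfo({{false, Scope::Workgroup}}).SyncScope ==
         Scope::Workgroup);
  return 0;
}
```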


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:277
+      ++MI;
+      Changed |= InsertWaitcntVmcnt0(MI);
+      Changed |= InsertBufferWbinvl1Vol(MI);
----------------
t-tye wrote:
> Is this needed only if the instruction is a VMEM or FLAT instruction, not if it is a DS instruction? It only ensures that the load has completed before doing the VMEM invalidate.
As discussed, this will be done in a separate change.
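t-tye's point can be sketched as below (stand-in types, not the pass's real interfaces; it assumes DS/LDS accesses do not go through the L1 vector cache, which is exactly why the wait/invalidate pair would be unnecessary for them).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Instruction classes in this sketch; the real pass would query SIInstrInfo.
enum class Kind { VMEM, FLAT, DS };

// Instructions inserted after an agent-scope acquire load. The wait plus
// invalidate pair matters only for VMEM and FLAT: DS (LDS) accesses do not
// go through the L1 vector cache, so there is nothing to wait on or
// invalidate for them.
std::vector<std::string> acquireSequence(Kind K) {
  if (K == Kind::DS)
    return {};
  return {"s_waitcnt vmcnt(0)", "buffer_wbinvl1_vol"};
}

int main() {
  assert(acquireSequence(Kind::DS).empty());
  assert(acquireSequence(Kind::VMEM).size() == 2);
  assert(acquireSequence(Kind::FLAT).front() == "s_waitcnt vmcnt(0)");
  return 0;
}
```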


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:347
+  case AMDGPUSynchronizationScope::Agent: {
+    Changed |= SetGLC(MI);
+
----------------
t-tye wrote:
> Should this be done? For an RMW, the glc bit controls whether the original value is returned, not whether the L1 cache is bypassed.
As discussed, this will be done in a separate change.


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:361
+      ++MI;
+      Changed |= InsertWaitcntVmcnt0(MI);
+      Changed |= InsertBufferWbinvl1Vol(MI);
----------------
t-tye wrote:
> Is this required for a DS memory operation? It seems to be required only for a VMEM or FLAT instruction, to ensure it has completed before invalidating the cache.
As discussed, this will be done in a separate change.


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:397
+  case AMDGPUSynchronizationScope::Agent: {
+    Changed |= SetGLC(MI);
+
----------------
t-tye wrote:
> Ditto.
As discussed, this will be done in a separate change.


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:408
+      ++MI;
+      Changed |= InsertWaitcntVmcnt0(MI);
+      Changed |= InsertBufferWbinvl1Vol(MI);
----------------
t-tye wrote:
> Is this required for a DS memory operation? It seems to be required only for a VMEM or FLAT instruction, to ensure it has completed before invalidating the cache.
As discussed, this will be done in a separate change.


================
Comment at: lib/Target/AMDGPU/SIMemoryLegalizer.cpp:419
+  case AMDGPUSynchronizationScope::SignalHandler: {
+    Changed |= SetGLC(MI);
+    break;
----------------
t-tye wrote:
> Ditto.
As discussed, this will be done in a separate change.


================
Comment at: test/CodeGen/AMDGPU/global_atomics.ll:1026
 ; FUNC-LABEL: {{^}}atomic_store_i32_addr64_offset:
-; SI: buffer_store_dword {{v[0-9]+}}, v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64 offset:16 glc{{$}}
-; VI: flat_store_dword v[{{[0-9]+}}:{{[0-9]+}}], {{v[0-9]+}} glc{{$}}
+; SI: buffer_store_dword {{v[0-9]+}}, v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64 offset:16{{$}}
+; VI: flat_store_dword v[{{[0-9]+}}:{{[0-9]+}}], {{v[0-9]+}}{{$}}
----------------
arsenm wrote:
> Why does this lose the glc bit?
I think it does not need to be set in this case.
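To spell out the reasoning, a hypothetical model of the decision (not the pass's actual code, and the scope threshold is an assumption): glc would be set to make loads bypass the L1 cache at agent scope or wider, while a relaxed atomic store needs no glc, which is what the updated CHECK lines reflect.

```cpp
#include <cassert>

enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

// Hypothetical model: glc is needed only to make a load bypass the L1
// cache at agent scope or wider; stores do not get glc for this purpose.
bool needsGLC(bool IsLoad, Scope S) {
  return IsLoad && S >= Scope::Agent;
}

int main() {
  assert(needsGLC(true, Scope::Agent));
  assert(!needsGLC(false, Scope::Agent));    // store: no glc
  assert(!needsGLC(true, Scope::Workgroup)); // narrower scope: no glc
  return 0;
}
```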


https://reviews.llvm.org/D24623




