[PATCH] D114351: [AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage

Tue Nov 23 01:32:36 PST 2021

critson added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1106
     case SIAtomicScope::AGENT:
+      // Request L1 MISS_EVICT.
       Changed |= enableGLCBit(MI);
----------------
t-tye wrote:
> MISS_EVICT is a policy that only applies when both SLC and GLC are set. Here only GLC is being set which specifies MISS_LRU and L2 LRU. How about:
> 
>   // Set the L1 cache policy to MISS_LRU. Note that there is no L2 cache bypass policy at the isa level.
> "The load intentionally misses the GPU L1 and reads from L2. If there was a line in the GPU L1 that matched, it is invalidated; L2 is reread." -- (CDNA1 Shader ISA, p67).
This sounds like MISS_EVICT to me.
As far as I know MISS_LRU only exists for stores and means write-combine, whereas MISS_EVICT is write-through.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1465-1467
+    // Request L0 and L1 HIT_EVICT for load instructions, and L2 STREAM for
+    // load and store instructions. L0 will still be MISS_LRU for store
+    // instructions unless GLC is set elsewhere.
----------------
t-tye wrote:
> critson wrote:
> > t-tye wrote:
> > > It appears that this should be setting GLC=1 for stores so that L0 will be HIT_EVICT instead of MISS_EVICT. This must not be done for loads as that would make Lo MISS_EVICT. How about:
> > > 
> > > // For loads setting GLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM.
> > Do you have MISS_EVICT and HIT_EVICT flipped in your description?
> > 
> > Do you mean:
> > // For loads setting **SLC** to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the **L0 cache policy** to MISS_EVICT and the L2 cache policy to STREAM. **L1 is always bypassed for stores.**
> > 
> > I can add the GLC bit for stores and this ceases to be NFC.
> I believe I have it right according to the hardware GFX10 memory model spec.
> 
>   // For loads setting SLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM.
> 
> We have to state the policy for L1 too even though the hardware documentation does not state it. The L1 MUST be evict or a subsequent load could see stale data.
> 
> Yes this changes ceases to be NFC and will need thorough testing.
Technically the L1 only has one policy described as "bypassed (but is coherent)" (RDNA1 Shader ISA p69), but on paper the behaviour of this looks the same as MISS_EVICT.  So I guess I can accept just calling it that.

Do you have any test cases which use non-temporal?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114351/new/

https://reviews.llvm.org/D114351