[PATCH] D114351: [AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage

Tue Nov 23 13:52:23 PST 2021

t-tye added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1106
     case SIAtomicScope::AGENT:
+      // Request L1 MISS_EVICT.
       Changed |= enableGLCBit(MI);
----------------
critson wrote:
> t-tye wrote:
> > MISS_EVICT is a policy that only applies when both SLC and GLC are set. Here only GLC is being set which specifies MISS_LRU and L2 LRU. How about:
> > 
> >   // Set the L1 cache policy to MISS_LRU. Note that there is no L2 cache bypass policy at the isa level.
> > "The load intentionally misses the GPU L1 and reads from L2. If there was a line in the GPU L1 that matched, it is invalidated; L2 is reread." -- (CDNA1 Shader ISA, p67).
> This sounds like MISS_EVICT to me.
> As far as I know MISS_LRU only exists for stores and means write-combine, whereas MISS_EVICT is write-through.
GFX90A cache policies are different to other GFX9 and GFX10. If GLC=1 then the L1 policy is MISS_LRU for loads: any existing line is invalidated, then the cache line is loaded and remains in the cache with LRU policy.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1465-1467
+    // Request L0 and L1 HIT_EVICT for load instructions, and L2 STREAM for
+    // load and store instructions. L0 will still be MISS_LRU for store
+    // instructions unless GLC is set elsewhere.
----------------
critson wrote:
> t-tye wrote:
> > critson wrote:
> > > t-tye wrote:
> > > > It appears that this should be setting GLC=1 for stores so that L0 will be HIT_EVICT instead of MISS_EVICT. This must not be done for loads as that would make Lo MISS_EVICT. How about:
> > > > 
> > > > // For loads setting GLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM.
> > > Do you have MISS_EVICT and HIT_EVICT flipped in your description?
> > > 
> > > Do you mean:
> > > // For loads setting **SLC** to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the **L0 cache policy** to MISS_EVICT and the L2 cache policy to STREAM. **L1 is always bypassed for stores.**
> > > 
> > > I can add the GLC bit for stores and this ceases to be NFC.
> > I believe I have it right according to the hardware GFX10 memory model spec.
> > 
> >   // For loads setting SLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM.
> > 
> > We have to state the policy for L1 too even though the hardware documentation does not state it. The L1 MUST be evict or a subsequent load could see stale data.
> > 
> > Yes this changes ceases to be NFC and will need thorough testing.
> Technically the L1 only has one policy described as "bypassed (but is coherent)" (RDNA1 Shader ISA p69), but on paper the behaviour of this looks the same as MISS_EVICT.  So I guess I can accept just calling it that.
> 
> Do you have any test cases which use non-temporal?
In the GFX10 memory model spec I have, it does not state the L1 policy for stores, but my understanding is that it behaves as MISS_EVICT. To me "bypass but coherent" is semantically the same thing as MISS_EVICT and it seems odd not to use the terminology that is already well defined.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114351/new/

https://reviews.llvm.org/D114351