[PATCH] D114351: [AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage (NFC)

Mon Nov 22 21:50:32 PST 2021

critson marked 2 inline comments as done.
critson added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:843
+  /// bypassed, and the GLC bit is instead used to indicate if they are
+  /// return or no-return.

----------------
t-tye wrote:
> Please add back:
> 
>   /// There is no bypass control for the L2 cache at the isa level.
> 
> The modified comment is only explaining the L1 cache and both caches are involved for system scope.
I deleted that text because there is a bypass for L2 stores and atomics on GFX10: SLC=0 DLC=1.
I can put it back but only contextualised for targets before GFX10?
(And the same for all the similar references in comments below.)

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:878

     return Changed;
   }
----------------
t-tye wrote:
> critson wrote:
> > foad wrote:
> > > Not really related to your patch, but why do we return here? Doesn't that mean that IsNonTemporal is effectively ignored if IsVolatile is true? Wouldn't it be both better and simpler to fall through to the IsNonTemporal handling here?
> > This is a good point.  From a bit setting perspective it would be fine.
> > Of course there is the question of what it semantically means to have a volatile nontemporal access when we seem to define volatile as "bypasses all caches".
> I suspect this is because at one time relaxed atomics were marked as volatile. This may have been because the C/C++/OpenCL standards defined them that way, or because LLVM back then did not fully support atomics so marking them all as volatile made the existing passes "do the right thing". So this code may have been an attempt not to pessimize normal relaxed atomics. LLVM does not support non-temporal atomics currently.
> 
> I am not sure if these reasons are still the case so would be good to investigate and potentially fix this code (or at least document why it is the way it is with a FIXME). That can be a separate review I think.
Yes, a separate investigation and review.

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1465-1467
+    // Request L0 and L1 HIT_EVICT for load instructions, and L2 STREAM for
+    // load and store instructions. L0 will still be MISS_LRU for store
+    // instructions unless GLC is set elsewhere.
----------------
t-tye wrote:
> It appears that this should be setting GLC=1 for stores so that L0 will be HIT_EVICT instead of MISS_EVICT. This must not be done for loads as that would make Lo MISS_EVICT. How about:
> 
> // For loads setting GLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM.
Do you have MISS_EVICT and HIT_EVICT flipped in your description?

Do you mean:
// For loads setting **SLC** to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the **L0 cache policy** to MISS_EVICT and the L2 cache policy to STREAM. **L1 is always bypassed for stores.**

I can add the GLC bit for stores and this ceases to be NFC.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114351/new/

https://reviews.llvm.org/D114351