[PATCH] D99128: [AMDGPU] Removed unnecessary cache invalidations.

Wed Mar 24 22:18:44 PDT 2021

t-tye added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:725
+  case SMEM_ACCESS:
+  case GDS_ACCESS:
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
----------------
s-perron wrote:
> t-tye wrote:
> > Is GDS needed here? Seems this is only dealing with VMEM. GDS is like LDS and not part of VMEM.
> > 
> > Should SMEM writes be considered? Some targets support that.
> > Is GDS needed here? 
> 
> Does a load from GDS go through the L1 cache?  I was under the impression that it does, but I'm not all that familiar with the architecture.  If it does not then it is not needed.
> 
> 
> > Should SMEM writes be considered? 
> 
> The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered.  See the comment by the definition of the enum.
> 
> > Is GDS needed here? 
> 
> Does a load from GDS go through the L1 cache?  I was under the impression that it does, but I'm not all that familiar with the architecture.  If it does not then it is not needed.

No it does not.

> > Should SMEM writes be considered? 
> 
> The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered.  See the comment by the definition of the enum.
> 

Still seems SMEM reads and writes should be separate.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:732
+  case EXP_PARAM_ACCESS:
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
+    break;
----------------
s-perron wrote:
> t-tye wrote:
> > LDS has no cache control so should if be considered?
> > 
> > Why only level 0 as loading into the L0 can also cause data to be loaded into the other caches.
> I guess I misunderstood how the arch is designed.  Would a read from LDS load data into the L1 cache?  I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS."  I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version.
> 
> I also want to ask about the export instructions.  I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.
> I guess I misunderstood how the arch is designed.  Would a read from LDS load data into the L1 cache?  I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS."  I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version.
> 
> I also want to ask about the export instructions.  I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.

It would probably be easier to just talk through how the architecture works so you can then make updates.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99128/new/

https://reviews.llvm.org/D99128