[PATCH] D99128: [AMDGPU] Removed unnecessary cache invalidations.

Tue Mar 23 13:57:38 PDT 2021

t-tye requested changes to this revision.
t-tye added inline comments.
This revision now requires changes to proceed.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:92
+  MEM_CACHE_LVL_0 = MEM_CACHE_LVL_BEGIN,
+  MEM_CACHE_LVL_1,
+  MEM_CACHE_LVL_END
----------------
gfx90a also has L2 cache control. See https://llvm.org/docs/AMDGPUUsage.html#memory-model-gfx90a .

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:198-212
+bool instructionInvalidatesL1Cache(unsigned OpCode) {
+  switch (OpCode) {
+  case AMDGPU::BUFFER_WBINVL1:
+  case AMDGPU::BUFFER_WBINVL1_SC:
+  case AMDGPU::BUFFER_WBINVL1_VOL:
+  case AMDGPU::BUFFER_GL1_INV:
+    return true;
----------------
Should this be a query on an instruction as the glc, et al bit can control the caches affected? There is also buffer_invl2. Should this be driven by a table gen property?

Should cache writeback also be tracked? GFX90A has L2 writeback controls.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:499
                             WaitcntBrackets &ScoreBrackets);
+  void clearDirtyCacheLevelsIfIsCacheInvalidationInstruction(
+      MachineInstr &MI, WaitcntBrackets &Brackets);
----------------
Invalidating is about removing stale data. Writeback is about removing dirty data. So suggest renaming these operations.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:502
+
+  bool removeUnnecessaryMemoryCacheInvalidationInstructions();
+  bool removeIfUnnecessaryMemoryCacheInvalidationInBlock(
----------------
Also have similar for removing unnecessary writeback instructions.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:725
+  case SMEM_ACCESS:
+  case GDS_ACCESS:
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
----------------
Is GDS needed here? Seems this is only dealing with VMEM. GDS is like LDS and not part of VMEM.

Should SMEM writes be considered? Some targets support that.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:726-727
+  case GDS_ACCESS:
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_1);
+    break;
----------------
Should the cache bypass of the instructions be considered? When the cache is bypassed the data is not left in the cache and so does not cause the need for an invalidate. But the cache bypass can specify which caches it is bypassing. For example, performing a series of relaxed atomics at system scope would require no invalidates.

Should the impact of writes be considered in deciding on the need for a cache writeback? Again there can be cache bypass, and some caches are readonly or write-through.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:732
+  case EXP_PARAM_ACCESS:
+    setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
+    break;
----------------
LDS has no cache control so should if be considered?

Why only level 0 as loading into the L0 can also cause data to be loaded into the other caches.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:794
+  for (MemoryCacheLevel Level : mem_cache_levels()) {
+    OS << "Has potentially dirty L" << Level
+       << " cache: " << hasPotentiallyDirtyCacheAtLevel(Level) << '\n';
----------------
Seems this is tracking possible stale data, not dirty data.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:938
+      instructionInvalidatesL1Cache(MI.getOpcode())) {
     Wait.VmCnt = 0;
   }
----------------
Is this correct? These instructions do not ensure the waicnt is 0. This is also missing GFX90A cache instructions.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1732

-      if (Brackets->hasPending()) {
+      if (Brackets->hasPending() || Brackets->hasPotentiallyDirtyCache()) {
         BlockInfo *MoveBracketsToSucc = nullptr;
----------------
Should hasPending be generalized to include all pending "things"? Seems merge has been generalized to merge all the "things".

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99128/new/

https://reviews.llvm.org/D99128