[libcxx] [flang] [llvm] [libc] [compiler-rt] [clang-tools-extra] [clang] [lld] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

Stanislav Mekhanoshin via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 15 12:32:15 PST 2024


================
@@ -1183,9 +1228,21 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
         // No need to wait before load from VMEM to LDS.
         if (TII->mayWriteLDSThroughDMA(MI))
           continue;
-        unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+
         // VM_CNT is only relevant to vgpr or LDS.
-        ScoreBrackets.determineWait(VM_CNT, RegNo, Wait);
+        unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+        bool FoundAliasingStore = false;
+        if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {
----------------
rampitec wrote:

I have added more comments to explain this. The code that fills the LDS DMA slots bails out if there is no scope info, so as not to waste the limited tracking slots. In that case the generic first slot is still used for the operation (it is always used, regardless of whether we can be more specific about the underlying object). Here AA cannot disambiguate aliasing without scope info, so this condition is simply a shortcut that avoids an expensive loop and AA query. I can remove this part of the condition and nothing will change, except that it will run slower. Note that not entering this 'if' statement always produces a conservatively correct wait using the first generic tracking slot, which always gets a score regardless of our ability to track a specific object. The condition wraps the relaxation code, whose purpose is to avoid the generic, conservative 'wait for everything' part below.
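
To make the slot scheme described above concrete, here is a minimal standalone sketch. It is not the code from the patch: `MemOp`, `Tracker`, `recordStore`, `slotsToWaitOn` and the toy `mayAlias` rule (two accesses may alias only if they carry the same scope id) are hypothetical names and simplifications invented for illustration. In particular, falling back to the generic slot when a scoped load matches no tracked store is an assumption of this sketch, not something stated in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory operand. Scope == 0 models
// "no alias scope info"; HasPtr models a non-null underlying pointer.
struct MemOp {
  bool HasPtr;
  int Scope;
};

// Toy alias rule (assumption): same scope id means "may alias".
inline bool mayAlias(int StoreScope, int LoadScope) {
  return StoreScope == LoadScope;
}

struct Tracker {
  // Slot 0 is the generic slot; every LDS DMA store is covered by it.
  // Slots 1..N are specific slots for stores that carry scope info.
  std::vector<int> SlotScope{0};

  void recordStore(const MemOp &Store) {
    // The generic slot always gets a score (implicit in this model).
    // Without scope info, bail out so no specific slot is wasted.
    if (!Store.HasPtr || Store.Scope == 0)
      return;
    SlotScope.push_back(Store.Scope);
  }

  // Returns the slot indices a load has to wait on.
  std::vector<std::size_t> slotsToWaitOn(const MemOp &Load) const {
    assert(!SlotScope.empty() && "generic slot must always exist");
    std::vector<std::size_t> Slots;
    // Shortcut: without scope info the alias query cannot help,
    // so skip the loop entirely.
    if (Load.HasPtr && Load.Scope != 0) {
      for (std::size_t I = 1; I < SlotScope.size(); ++I)
        if (mayAlias(SlotScope[I], Load.Scope))
          Slots.push_back(I); // relaxed wait on one specific slot
    }
    // Conservative fallback: wait on the generic slot.
    if (Slots.empty())
      Slots.push_back(0);
    return Slots;
  }
};
```

In this model, dropping the `Load.Scope != 0` test changes no result, because a load with scope 0 never matches a specific slot; it only costs the extra loop, which mirrors the "nothing will change except it will work slower" remark.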

https://github.com/llvm/llvm-project/pull/74537

