[PATCH] D99128: [AMDGPU] Removed unnecessary cache invalidations.
Tony Tye via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 24 22:18:44 PDT 2021
t-tye added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:725
+ case SMEM_ACCESS:
+ case GDS_ACCESS:
+ setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
----------------
s-perron wrote:
> t-tye wrote:
> > Is GDS needed here? Seems this is only dealing with VMEM. GDS is like LDS and not part of VMEM.
> >
> > Should SMEM writes be considered? Some targets support that.
> > Is GDS needed here?
>
> Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not then it is not needed.
>
>
> > Should SMEM writes be considered?
>
> The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum.
>
> > Is GDS needed here?
>
> Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not then it is not needed.
No it does not.
> > Should SMEM writes be considered?
>
> The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum.
>
Still seems SMEM reads and writes should be separate.
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:732
+ case EXP_PARAM_ACCESS:
+ setPotentiallyDirtyCacheAtLevel(MEM_CACHE_LVL_0);
+ break;
----------------
s-perron wrote:
> t-tye wrote:
> > LDS has no cache control so should if be considered?
> >
> > Why only level 0 as loading into the L0 can also cause data to be loaded into the other caches.
> I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS." I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version.
>
> I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.
> I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS." I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version.
>
> I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.
It would probably be easier to just talk through how the architecture works so you can then make updates.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99128/new/
https://reviews.llvm.org/D99128
More information about the llvm-commits
mailing list