[llvm] [AMDGPU] Set glc bit for nontemporal loads on GFX10/11 (PR #89739)

Sameer Sahasrabuddhe via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 25 01:38:54 PDT 2024


ssahasra wrote:

> From what I understand, HIT_EVICT means that an existing line is allowed to match, while MISS_EVICT means that no match is allowed. Why would HIT_EVICT be preferred for a non-temporal operation? Is it not natural to believe that non-temporal operations expect fresh data every time, because those values do not have temporal locality?

I am not sure this perfectly matches the streaming-data use case, but the performance of the RCCL low-latency protocols does depend on always getting the latest values from L2. The consumer thread polls a 128-bit location for a fresh flag, and hitting an existing line in L1 just makes the polling continue when there are in fact fresh flags in L2.

So, depending on what "non-temporal" means for different use cases, either HIT_EVICT on L1 needs to become MISS_EVICT for non-temporal data in general, or the specific "low-latency protocol" use case needs a new kind of builtin to set that policy.
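For readers unfamiliar with the polling pattern, here is a minimal host-side sketch of a consumer checking a 128-bit flagged line. It is not RCCL's code: the `FlagLine` layout (two 64-bit halves, each carrying 32 bits of payload and a 32-bit flag) and the helpers `loadHalf`, `writeLine`, and `tryReadLine` are hypothetical names chosen for illustration. `__builtin_nontemporal_load` is the Clang builtin; the sketch falls back to a volatile load where it is unavailable. On a CPU this cannot reproduce the GFX10/11 L1 behaviour, it only shows which load would need to miss in L1.

```cpp
#include <cstdint>

// Hypothetical 128-bit line: each 64-bit half holds 32 bits of payload in
// the low word and a 32-bit flag in the high word (an assumed layout).
struct alignas(16) FlagLine {
  uint64_t v[2];
};

// Load one 64-bit half, using the nontemporal builtin when the compiler has it.
static inline uint64_t loadHalf(const uint64_t *p) {
#if defined(__has_builtin)
#if __has_builtin(__builtin_nontemporal_load)
  return __builtin_nontemporal_load(p);
#else
  return *static_cast<const volatile uint64_t *>(p);
#endif
#else
  return *static_cast<const volatile uint64_t *>(p);
#endif
}

// Producer side: publish a 64-bit payload tagged with `flag` in both halves.
static inline void writeLine(FlagLine &line, uint64_t payload, uint32_t flag) {
  line.v[0] = (static_cast<uint64_t>(flag) << 32) |
              static_cast<uint32_t>(payload);
  line.v[1] = (static_cast<uint64_t>(flag) << 32) |
              static_cast<uint32_t>(payload >> 32);
}

// Consumer side: one polling step. Returns true and fills `payload` only when
// both flags match `expectedFlag`. If these loads were served from a stale L1
// line, the flags would never match and the caller would keep polling.
static inline bool tryReadLine(const FlagLine &line, uint32_t expectedFlag,
                               uint64_t &payload) {
  const uint64_t lo = loadHalf(&line.v[0]);
  const uint64_t hi = loadHalf(&line.v[1]);
  if (static_cast<uint32_t>(lo >> 32) != expectedFlag ||
      static_cast<uint32_t>(hi >> 32) != expectedFlag) {
    return false;
  }
  payload = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) |
            static_cast<uint32_t>(lo);
  return true;
}
```

A real consumer would call `tryReadLine` in a loop until it returns true, which is why each iteration has to observe L2 rather than an L1 hit.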

https://github.com/llvm/llvm-project/pull/89739

