[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Tue Oct 20 01:24:28 PDT 2020

t-tye added a comment.

In D89618#2341101 <https://reviews.llvm.org/D89618#2341101>, @rampitec wrote:

> In D89618#2341074 <https://reviews.llvm.org/D89618#2341074>, @t-tye wrote:
>
>> In D89618#2340966 <https://reviews.llvm.org/D89618#2340966>, @rampitec wrote:
>>
>>> JFYI, it is unclear how much this will help actual programs once it is fixed. It will likely change a lot of lit tests, but the actual effect on real programs would depend on the FE and the language rules. And on inlining of course, as usual.
>>
>> It did change 46 lit tests. I agree it is unclear how much it will help. But the GLOBAL and SCRATCH flat operations look like they may be able to avoid the pessimistic waitcnt 0.
>
> Right. Among these 46 lit tests I was looking for one very specific test, and I was going to ask you to write it if it did not exist. That test does exist, and it is failing.

Which test is failing? All the lit tests pass on my machine. Or are you questioning the way the CHECK lines have been updated? The original test marks the FLAT pointer as referencing the GLOBAL address space. I assume this is what the frontend did to match the CUDA language semantics, which say that kernel arguments can only reference global memory. So I believe the generated code is correct, unless I am missing something.
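To illustrate the reasoning in this thread, here is a minimal, hypothetical C++ model (not the actual SIInsertWaitcnts code) of how a waitcnt inserter might use the address spaces recorded on a FLAT instruction's memory operands. The names `AddrSpace`, `FlatCounters`, and `countersForFlatOp` are invented for this sketch. The assumptions are as follows. A FLAT access that may reach LDS has to be treated conservatively on lgkmcnt, which is where the pessimistic waitcnt 0 comes from. One that provably targets only GLOBAL or SCRATCH (private) memory only needs to be waited on through vmcnt. A FLAT op with no memory-operand information falls back to the worst case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified set of AMDGPU address spaces relevant here.
enum class AddrSpace { Flat, Global, Local, Constant, Private };

// Which counters (in this simplified model) a later use of the FLAT op's
// result must be guarded by.
struct FlatCounters {
  bool vmcnt;    // vector memory counter (global/scratch path)
  bool lgkmcnt;  // LDS/GDS/constant/message counter (LDS path)
};

// memOperandSpaces: address spaces attached to the instruction's memory
// operands. An empty list means "no information", so assume the worst case.
FlatCounters countersForFlatOp(const std::vector<AddrSpace>& memOperandSpaces) {
  if (memOperandSpaces.empty()) {
    return {true, true};  // unknown target: may hit LDS or memory
  }
  FlatCounters c{false, false};
  for (AddrSpace as : memOperandSpaces) {
    switch (as) {
      case AddrSpace::Flat:   // generic pointer: may resolve to LDS or memory
        c.vmcnt = true;
        c.lgkmcnt = true;
        break;
      case AddrSpace::Local:  // LDS reached through a FLAT instruction
        c.lgkmcnt = true;
        break;
      case AddrSpace::Global:
      case AddrSpace::Constant:
      case AddrSpace::Private:  // scratch
        c.vmcnt = true;         // goes through the vector memory path only
        break;
    }
  }
  return c;
}
```

Under this model, a FLAT load whose memory operand says GLOBAL (as in the test whose FLAT pointer the frontend marked as GLOBAL) yields `{vmcnt = true, lgkmcnt = false}`, so no lgkmcnt wait is needed for it. A plain generic pointer still forces both counters. Checking each memory operand and taking the union is a deliberately conservative choice: one "may be LDS" operand is enough to keep the pessimistic behavior.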

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D89618